'How to solve Feature Shape Mismatch error in XGBoost Feature Selection?
I'm trying to use the following code to get feature importance for feature selection in XGBOOST. But I keep getting an error saying that there's a "Feature shape mismatch, expected: 91, got 78".
"select_X_train" has 78 features (columns) as does "select_X_test" so I think the issue is that the thresholds array must still contain the 91 original variables? But, I can't figure out a way to fix it. Can you please advise?
Here is the code:
from numpy import loadtxt
from numpy import sort
from xgboost import XGBClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from sklearn.feature_selection import SelectFromModel
from sklearn.metrics import classification_report
# load data
dataset = df_train
# split data into X and y
X_train = df[df.columns.difference(['Date','IsDeceased','IsTotal','Deceased','Sick','Injured','Displaced','Homeless','MissingPeople','Other','Total'])]
y_train = df['IsDeceased'].values
X_test = df_test[df_test.columns.difference(['Date','IsDeceased','IsTotal','Deceased','Sick','Injured','Displaced','Homeless','MissingPeople','Other','Total'])]
y_test = df_test['IsDeceased'].values
# fit model on all training data
model = XGBClassifier()
model.fit(X_train, y_train)
# make predictions for test data and evaluate
print("Accuracy: %.2f%%" % (accuracy * 100.0))
# Fit model using each importance as a threshold
thresholds = sort(model.feature_importances_)
for thresh in thresholds:
# select features using threshold
selection = SelectFromModel(model, threshold=thresh, prefit=True)
select_X_train = selection.transform(X_train)
# train model
selection_model = XGBClassifier()
selection_model.fit(select_X_train, y_train)
print(thresh)
# eval model
select_X_test = selection.transform(X_test)
y_pred = model.predict(select_X_test)
report = classification_report(y_test,y_pred)
print("Thresh= {} , n= {}\n {}" .format(thresh,select_X_train.shape[1], report))
cm = confusion_matrix(y_test, y_pred)
print(cm)```
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|
