How to solve Feature Shape Mismatch error in XGBoost Feature Selection?

I'm trying to use the following code to get feature importances for feature selection in XGBoost, but I keep getting this error: "Feature shape mismatch, expected: 91, got 78".

"select_X_train" has 78 features (columns), as does "select_X_test", so I think the issue is that the thresholds array still contains the 91 original variables. However, I can't figure out a way to fix it. Can you please advise?

Here is the code:

from numpy import loadtxt
from numpy import sort
from xgboost import XGBClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from sklearn.feature_selection import SelectFromModel
from sklearn.metrics import classification_report, confusion_matrix
# load data
dataset = df_train
# split data into X and y

X_train = df[df.columns.difference(['Date','IsDeceased','IsTotal','Deceased','Sick','Injured','Displaced','Homeless','MissingPeople','Other','Total'])]

y_train = df['IsDeceased'].values


X_test = df_test[df_test.columns.difference(['Date','IsDeceased','IsTotal','Deceased','Sick','Injured','Displaced','Homeless','MissingPeople','Other','Total'])]

y_test = df_test['IsDeceased'].values


# fit model on all training data
model = XGBClassifier()
model.fit(X_train, y_train)
# make predictions for test data and evaluate
y_pred = model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy: %.2f%%" % (accuracy * 100.0))
# Fit model using each importance as a threshold
thresholds = sort(model.feature_importances_)
for thresh in thresholds:
    # select features using threshold
    selection = SelectFromModel(model, threshold=thresh, prefit=True)
    select_X_train = selection.transform(X_train)
    # train model
    selection_model = XGBClassifier()
    selection_model.fit(select_X_train, y_train)
    
    print(thresh)
    
    # eval model
    select_X_test = selection.transform(X_test)
    y_pred = model.predict(select_X_test)
    
    report = classification_report(y_test,y_pred)
    print("Thresh= {} , n= {}\n {}" .format(thresh,select_X_train.shape[1], report))
    cm = confusion_matrix(y_test, y_pred)
    print(cm)
    
    


Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow
