Why does multi-output XGBoost feature importance give different results from plot_importance vs. estimators_[0].feature_importances_?
I have a multi-output XGBoost model (an XGBoost regressor wrapped in scikit-learn's MultiOutputRegressor) and I am trying to plot the important features for each output. There are 23 outputs.
I have tried to do this in two ways.
First, extracting the important features as a DataFrame:
import pandas as pd

# Get the importances for the first output as a numpy array.
# Change the index to any value in [0, 22] for the other outputs.
features = multioutputregressor.estimators_[0].feature_importances_
# Put the importances in a DataFrame indexed by feature name.
# Note: DataFrame.columns is an attribute, not a method (no parentheses).
wo_interaction_terms = pd.DataFrame(features, index=list(X_train.columns),
                                    columns=['importance']).sort_values('importance', ascending=False)
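A quick sanity check, which already hints the two numbers are on different scales: these importances come back summing to 1 (the scikit-learn wrapper appears to normalize them), unlike the raw integer F-score split counts that the plots show.

# The importances are normalized; the F scores are raw split counts.
print(features.sum())  # ~1.0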
Second, plotting the important features as bar plots in a for loop, to get all 23 outputs:
import matplotlib.pyplot as plt
from xgboost import plot_importance

f = 0
fig, ax = plt.subplots(5, 5, figsize=(12, 18))
for i in range(5):
    for j in range(5):
        # Guard against the 5x5 grid having more cells (25) than outputs (23).
        if f < len(multioutputregressor.estimators_):
            plot_importance(multioutputregressor.estimators_[f], height=0.2,
                            ax=ax[i, j], title=output_cols[f])
        f += 1
fig.tight_layout()
The first approach gives a sorted table of importances for output 0, indexed by the real feature names (e.g. "Lead", "Gaseous CO2").
The plot from the second approach shows a different set of important features, and the values are also different from those in the first table. The names don't line up either: f22 is not "Lead", f0 is not "Gaseous CO2", and so on.
Questions:
1. plot_importance is labeled with the F score, but what criterion does .estimators_[0].feature_importances_ use? The numbers are obviously different.
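For what it's worth, my reading of the XGBoost docs (an assumption, not something I have confirmed across versions) is that the two APIs simply default to different importance types: plot_importance defaults to importance_type='weight' (the F score, i.e. how often a feature is used to split), while the scikit-learn wrapper's feature_importances_ uses the estimator's importance_type, which is 'gain' by default for tree boosters. Passing the same type to the plot should make the two comparable:

# Ask plot_importance for the same criterion feature_importances_ uses
# ('gain' here; adjust if the model was built with another importance_type).
plot_importance(multioutputregressor.estimators_[0], importance_type='gain')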
2. How do I add feature names to the plots? I saw other posts on this (like here), but they don't work for multi-output XGBoost. What are the options in this case?
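One workaround I have been trying (a sketch, assuming X_train is a pandas DataFrame whose column order matches what the model was fitted on): MultiOutputRegressor hands each estimator a plain numpy array, so XGBoost only ever sees the generic names f0, f1, .... Setting feature_names on the underlying booster before plotting restores the real names:

booster = multioutputregressor.estimators_[0].get_booster()
# Replace the generic f0, f1, ... names with the DataFrame's columns.
booster.feature_names = list(X_train.columns)
plot_importance(booster, height=0.2)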
Source: Stack Overflow, licensed under CC BY-SA 3.0.