How to average feature importances over several random forest runs?
Here's my code:
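```python
import pandas as pd
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from matplotlib import pyplot

# example data (assumption): replace with your own DataFrame X and target y;
# X must be a DataFrame so that X.columns exists
X_arr, y = make_regression(n_samples=1000, n_features=20, random_state=0)
X = pd.DataFrame(X_arr, columns=[f"feature_{i}" for i in range(X_arr.shape[1])])

model = RandomForestRegressor()
# fit the model
model.fit(X, y)
# get importance
importance = model.feature_importances_
# summarize feature importance: bar plot of the 10 largest
(pd.Series(importance, index=X.columns)
    .nlargest(10)
    .plot(kind='barh'))
pyplot.show()
```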
However, the results vary from run to run because the algorithm is stochastic (for example, each tree is trained on a bootstrap sample). I ran it several times and the importances do change; not by much, but they change.
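A quick way to see where the variation comes from: `RandomForestRegressor` takes its randomness from `random_state`, so fixing it makes a single fit reproducible, while different seeds give slightly different importances. The sketch below assumes synthetic data from `make_regression`, and `importances_for_seed` is a hypothetical helper; substitute your own `X` and `y`.

```python
import pandas as pd
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

# Hypothetical example data; substitute your own DataFrame X and target y.
X_arr, y = make_regression(n_samples=300, n_features=8, n_informative=4, random_state=0)
X = pd.DataFrame(X_arr, columns=[f"feature_{i}" for i in range(X_arr.shape[1])])


def importances_for_seed(seed):
    """Hypothetical helper: fit one forest with a fixed random_state and
    return its impurity-based importances as a Series indexed by feature name."""
    model = RandomForestRegressor(n_estimators=50, random_state=seed)
    model.fit(X, y)
    return pd.Series(model.feature_importances_, index=X.columns)


same_a = importances_for_seed(0)
same_b = importances_for_seed(0)
other = importances_for_seed(1)

# Same seed: identical importances. Different seed: slightly different values.
print("max diff, same seed:     ", (same_a - same_b).abs().max())
print("max diff, different seed:", (same_a - other).abs().max())
```

Fixing the seed only hides the variation, though; averaging over several seeds is what gives a more stable estimate.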
So I would like a loop (e.g., fitting the random forest 10 times) that produces the same bar plot, but showing the mean (or the median) of the feature importances across the runs.
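One way to write that loop, sketched under assumptions: refit the forest with 10 different `random_state` values, collect each run's `feature_importances_` as a column of a DataFrame, then plot the per-feature mean with the standard deviation as error bars. The synthetic data and the `n_runs` and `n_estimators` values are illustrative only.

```python
import pandas as pd
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

# Hypothetical example data; replace with your own DataFrame X and target y.
X_arr, y = make_regression(n_samples=300, n_features=15, n_informative=5, random_state=0)
X = pd.DataFrame(X_arr, columns=[f"feature_{i}" for i in range(X_arr.shape[1])])

n_runs = 10
# One column per run, one row per feature.
runs = pd.DataFrame(
    {
        f"run_{seed}": RandomForestRegressor(n_estimators=100, random_state=seed)
        .fit(X, y)
        .feature_importances_
        for seed in range(n_runs)
    },
    index=X.columns,
)

# Aggregate across runs; use runs.median(axis=1) for the median instead.
mean_imp = runs.mean(axis=1)
std_imp = runs.std(axis=1)

# Keep the 10 features with the largest average importance.
top = mean_imp.nlargest(10).index
ax = mean_imp[top].plot(kind="barh", xerr=std_imp[top])
ax.invert_yaxis()  # put the largest bar at the top
ax.set_xlabel(f"mean impurity-based importance over {n_runs} runs")
```

Call `pyplot.show()` from `matplotlib` if the figure does not appear on its own. Since a forest's `feature_importances_` is already an average over its trees, simply raising `n_estimators` on a single model also reduces the run-to-run variation.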
Something like cross-validation, but for feature importance.
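For the cross-validation flavour, a possible sketch (one approach among several) uses scikit-learn's `cross_validate` with `return_estimator=True`, which keeps the forest fitted on each training fold; the per-fold `feature_importances_` can then be summarized with mean, median and standard deviation. The fold count, seeds and synthetic data are assumptions.

```python
import pandas as pd
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import KFold, cross_validate

# Hypothetical example data; replace with your own DataFrame X and target y.
X_arr, y = make_regression(n_samples=300, n_features=15, n_informative=5, random_state=0)
X = pd.DataFrame(X_arr, columns=[f"feature_{i}" for i in range(X_arr.shape[1])])

cv = KFold(n_splits=5, shuffle=True, random_state=0)
# return_estimator=True returns the forest fitted on each training fold.
results = cross_validate(
    RandomForestRegressor(n_estimators=100, random_state=0),
    X, y, cv=cv, return_estimator=True,
)

# Stack the per-fold importances: one row per fold, one column per feature.
fold_imp = pd.DataFrame(
    [est.feature_importances_ for est in results["estimator"]],
    columns=X.columns,
)
summary = pd.DataFrame(
    {"mean": fold_imp.mean(), "median": fold_imp.median(), "std": fold_imp.std()}
)
print(summary.sort_values("mean", ascending=False).head(10))
# Default scoring for a regressor is R^2 on each held-out fold.
print("mean R^2 across folds:", results["test_score"].mean())
```

Here the forest seed is fixed, so the spread reflects the different training folds; passing a different `random_state` per fold would mix both sources of variation.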
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
