Evaluate feature importance after training a Logistic Regression model on a dataset
I am training a logistic regression (LR) model on the white-wine dataset ( https://archive.ics.uci.edu/ml/datasets/wine ). After training the model in Python, I printed model.coef_ to see the weight assigned to each feature, and noticed that "residual sugar" gets a fairly large weight (1.3). However, in the correlation matrix (image below), the correlation coefficient between this independent feature (residual sugar) and the dependent variable is quite low compared to the other independent features. So I wonder: are the assigned weights not a reliable indication of how important a feature is, and if they are not, how should I evaluate whether a feature is important? My code is below; if anything is wrong, please help me correct it, as I am new to this area.
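import matplotlib.pyplot as plt
import pandas as pd
import seaborn
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sqlalchemy import create_engine

# Load the wine table from MySQL and keep only the white wines
engine = create_engine("mysql+mysqlconnector://root:21041996@localhost/mydatabase")
con = engine.connect()
dataframe = pd.read_sql('select * from wine_quality', con)
df = dataframe[dataframe['type'] == 'white']

# Correlation heatmap; the non-numeric 'type' column is dropped first,
# because recent pandas versions raise an error on string columns in corr()
seaborn.heatmap(df.drop(columns=['type']).corr(), annot=True)
plt.show()

# Split into features and target (the column name includes literal quotes)
y = df['"quality"']
x = df.drop(columns=['type', '"quality"'])
x = x.to_numpy()
y = y.to_numpy()
X_train, X_test, y_train, y_test = train_test_split(x, y, test_size=0.2)

# Standardize the features: fit on the training set only, then apply to both
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

# Binarize the target: quality above 6 is the positive class
y_train = y_train > 6
y_test = y_test > 6

model = LogisticRegression(solver='liblinear', max_iter=2000)
model.fit(X_train, y_train)
print(model.score(X_train, y_train))  # training accuracy
print(model.score(X_test, y_test))    # test accuracy
print(model.coef_)                    # one weight per standardized feature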
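
To make the question concrete, here is a minimal sketch on synthetic data (not the wine dataset) of how a coefficient and a pairwise correlation can disagree. It assumes two strongly correlated features, `a` and `b`, whose effects on the target partly cancel, plus an independent feature `c`. All names and numbers are made up for illustration. It also computes scikit-learn's `permutation_importance` as one possible alternative measure, which is my assumption about what might be useful here, not an established answer.

```python
import numpy as np
from sklearn.inspection import permutation_importance
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
n = 5000

# Hypothetical features: b is strongly correlated with a, c is independent.
a = rng.normal(size=n)
b = 0.9 * a + np.sqrt(1 - 0.9**2) * rng.normal(size=n)
c = rng.normal(size=n)

# The target depends on a positively and on b negatively, so the
# marginal (pairwise) effect of a on the target is small.
logit = 2.0 * a - 2.0 * b + 1.0 * c
y = rng.random(n) < 1.0 / (1.0 + np.exp(-logit))

names = ["a", "b", "c"]
X = StandardScaler().fit_transform(np.column_stack([a, b, c]))

model = LogisticRegression(max_iter=2000).fit(X, y)

# Weight of each standardized feature, holding the others fixed.
coefs = dict(zip(names, model.coef_[0]))

# Pairwise correlation of each feature with the binary target.
corrs = {
    name: np.corrcoef(X[:, i], y.astype(float))[0, 1]
    for i, name in enumerate(names)
}

# Drop in accuracy when one feature column is shuffled.
perm = permutation_importance(model, X, y, n_repeats=10, random_state=0)
perm_means = dict(zip(names, perm.importances_mean))

for name in names:
    print(
        f"{name}: coef={coefs[name]:+.2f}  "
        f"corr={corrs[name]:+.2f}  perm={perm_means[name]:.3f}"
    )
```

In this sketch, `a` gets a larger weight than `c` even though its correlation with the target is much weaker. The coefficient measures the effect of a feature with the others held fixed, while the correlation looks at each feature in isolation. Whether something similar explains the "residual sugar" weight in my data would need to be checked against the correlations among the wine features themselves.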
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow

