VIF vs Mutual Info

I was searching for the best ways to do feature selection in a regression problem and came across a post suggesting mutual information for regression, so I tried it on the Boston housing dataset. The results were as follows:


# feature selection with mutual information
from sklearn.feature_selection import SelectKBest, mutual_info_regression

# k='all' keeps every feature; we only want the scores
f_selector = SelectKBest(score_func=mutual_info_regression, k='all')

# learn the relationship from the training data
f_selector.fit(X_train, y_train)

# transform the train input data
X_train_fs = f_selector.transform(X_train)

# transform the test input data
X_test_fs = f_selector.transform(X_test)

The scores were as follows:

Features    Scores
12  LSTAT   0.651934
5   RM  0.591762
2   INDUS   0.532980
10  PTRATIO 0.490199
4   NOX 0.444421
9   TAX 0.362777
0   CRIM    0.335882
6   AGE 0.334989
7   DIS 0.308023
8   RAD 0.206662
1   ZN  0.197742
11  B   0.172348
3   CHAS    0.027097
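For context, a table like the one above can be pulled straight out of the fitted selector's `scores_` attribute. A minimal self-contained sketch (the toy data below stands in for `X_train`/`y_train`, which are not shown in the question):

```python
import numpy as np
import pandas as pd
from sklearn.feature_selection import SelectKBest, mutual_info_regression

# toy stand-in for the Boston training data: LSTAT strongly drives the
# target, CHAS is pure noise, so their scores should separate clearly
rng = np.random.default_rng(0)
X_train = pd.DataFrame({
    "LSTAT": rng.normal(size=200),
    "RM": rng.normal(size=200),
    "CHAS": rng.integers(0, 2, size=200).astype(float),
})
y_train = 3 * X_train["LSTAT"] + rng.normal(scale=0.1, size=200)

f_selector = SelectKBest(score_func=mutual_info_regression, k="all")
f_selector.fit(X_train, y_train)

# pair feature names with their mutual information scores, highest first
scores = (
    pd.DataFrame({"Features": X_train.columns, "Scores": f_selector.scores_})
    .sort_values("Scores", ascending=False)
    .reset_index(drop=True)
)
print(scores)
```

Mutual information scores are non-negative; higher means the feature shares more information with the target, which is why LSTAT tops the list.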

Out of curiosity, I also mapped the VIF alongside the scores, and I see that several of the features with high mutual information scores also have a very high VIF.

Features    Scores  VIF_Factor
12  LSTAT   0.651934    11.102025
5   RM  0.591762    77.948283
2   INDUS   0.532980    14.485758
10  PTRATIO 0.490199    85.029547
4   NOX 0.444421    73.894947
9   TAX 0.362777    61.227274
0   CRIM    0.335882    2.100373
6   AGE 0.334989    21.386850
7   DIS 0.308023    14.699652
8   RAD 0.206662    15.167725
1   ZN  0.197742    2.844013
11  B   0.172348    20.104943
3   CHAS    0.027097    1.152952

Could you please help me understand how to select the best features from this list?

Thanks in advance!



Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow
