Obtaining p-values from categorical variables
I want to perform backward elimination, but my dataset contains categorical variables, which makes it difficult to extract p-values.
I know there are other packages such as statsmodels and SciPy, but as far as I can tell the only one that can handle this type of variable is scikit-learn. The solution that came to mind was to drop each column in turn and calculate the percentage change in MSE. However, I don't think this is very practical or scientifically sound. How can I overcome this issue?
I'm new to Python.
import numpy as np

# Inputs is a pandas DataFrame loaded earlier (loading code not shown).
Inputs.dropna(axis=0, inplace=True)
Targets = Inputs.ERE_mm
Inputs = Inputs.dropna(axis=1)
Inputs.drop(labels="ERE_mm", inplace=True, axis=1)
from sklearn.preprocessing import LabelEncoder
station_lblencoder = LabelEncoder()
Inputs.Station = station_lblencoder.fit_transform(Inputs.Station)
# *** SENSITIVITY ANALYSIS ***
from sklearn.model_selection import train_test_split as tts
output = {"col": Inputs.columns, "MSE": np.zeros(shape=(Inputs.shape[1] + 1, 1))}
from sklearn.linear_model import LinearRegression
lin_model = LinearRegression()
from sklearn.metrics import mean_squared_error
_Inputs_train, _Inputs_test, _Targets_train, _Targets_test = tts(
    Inputs, Targets, train_size=0.8, test_size=0.2, random_state=123
)
output["MSE"][0] = mean_squared_error(
    _Targets_test, lin_model.fit(_Inputs_train, _Targets_train).predict(_Inputs_test)
)
# Inputs.drop(labels=["Percepitation_mm", "normal_wind_value", "normal_wind_dir", "max_wind_value"], axis=1, inplace=True)
# Drop one column at a time, refit, and record the percentage change in MSE
# relative to the baseline stored in output["MSE"][0].
for _iterator, _col in enumerate(Inputs.columns):
    _inputs = Inputs.drop(labels=_col, axis=1, inplace=False)
    Inputs_train, Inputs_test, Targets_train, Targets_test = tts(
        _inputs, Targets, train_size=0.8, test_size=0.2, random_state=123
    )
    lin_model.fit(Inputs_train, Targets_train)
    predictions = lin_model.predict(Inputs_test)
    output["MSE"][_iterator + 1] = (
        (mean_squared_error(Targets_test, predictions) - output["MSE"][0])
        * 100
        / output["MSE"][0]
    )
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
