Obtaining p-values from categorical variables

I'm going to do backward elimination, but my dataset contains categorical variables, which makes it difficult to extract p-values.

I know there are other packages such as statsmodels and SciPy, but the only one I have found that can handle this type of variable is scikit-learn. The solution that came to my mind was to drop each column in turn and calculate the percentage change in MSE. However, I don't think this is very practical or scientifically sound. How can I overcome this issue?

I'm new to Python.

import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split as tts
from sklearn.preprocessing import LabelEncoder

# Inputs is a pandas DataFrame loaded earlier (not shown).
# Drop rows with missing values, then separate the target column.
Inputs.dropna(axis=0, inplace=True)
Targets = Inputs.ERE_mm
Inputs = Inputs.dropna(axis=1)
Inputs.drop(labels="ERE_mm", inplace=True, axis=1)

# Encode the categorical Station column as integer codes.
# Note: a linear model treats these codes as ordered numbers.
station_lblencoder = LabelEncoder()
Inputs["Station"] = station_lblencoder.fit_transform(Inputs["Station"])

#                   *** SENSITIVITY ANALYSIS ***

# Row 0 holds the baseline MSE with all columns; each following row holds
# the percentage change in MSE after dropping the corresponding column.
output = {
    "col": ["(all columns)"] + list(Inputs.columns),
    "MSE": np.zeros(Inputs.shape[1] + 1),
}

lin_model = LinearRegression()

# Baseline: fit on all columns.
_Inputs_train, _Inputs_test, _Targets_train, _Targets_test = tts(
    Inputs, Targets, train_size=0.8, test_size=0.2, random_state=123
)

output["MSE"][0] = mean_squared_error(
    _Targets_test, lin_model.fit(_Inputs_train, _Targets_train).predict(_Inputs_test)
)

# Inputs.drop(labels=["Percepitation_mm", "normal_wind_value", "normal_wind_dir", "max_wind_value"], axis=1, inplace=True)

# Refit with one column removed at a time, using the same split seed.
for _iterator, _col in enumerate(Inputs.columns):
    _inputs = Inputs.drop(labels=_col, axis=1, inplace=False)

    Inputs_train, Inputs_test, Targets_train, Targets_test = tts(
        _inputs, Targets, train_size=0.8, test_size=0.2, random_state=123
    )

    lin_model.fit(Inputs_train, Targets_train)

    predictions = lin_model.predict(Inputs_test)

    # Percentage change in MSE relative to the baseline.
    output["MSE"][_iterator + 1] = (
        (mean_squared_error(Targets_test, predictions) - output["MSE"][0])
        * 100
        / output["MSE"][0]
    )


Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow
