How to impute missing values for label and features with sklearn.impute.IterativeImputer
I have a dataset with 11 numerical features and 1 numerical label. Both the features and the label contain missing values. How can I impute the missing values in both with sklearn.impute.IterativeImputer? This is what I'm doing now:
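from sklearn.experimental import enable_iterative_imputer  # noqa: F401, must come before importing IterativeImputer
from sklearn.impute import IterativeImputer
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)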
Xy_train = X_train.join(y_train)
# The first imputer I'm using to fit and transform training data
iterative_imputer_1 = IterativeImputer(missing_values=-200, estimator=LinearRegression(), max_iter=100, min_value=min_values)
iterative_imputer_1.fit(Xy_train)
transformed_result_1 = iterative_imputer_1.transform(Xy_train) # This is training data transformed
# The second imputer is used to fit on X_train
iterative_imputer_2 = IterativeImputer(missing_values=-200, estimator=LinearRegression(), max_iter=100, min_value=min_values)
iterative_imputer_2.fit(X_train)
# I'm using the second one to transform X_test
transformed_result_2 = iterative_imputer_2.transform(X_test)
But I'm not sure whether this is correct. Even if it is, it doesn't look like the best way to implement it. In particular, I'm not sure whether I need to create two IterativeImputer instances to transform the same data.
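For reference, here is a minimal sketch of one way to get by with a single imputer, under some assumptions: the data below is synthetic stand-in data (3 features instead of 11), the column names `f1`, `f2`, `f3` and `label` are made up for illustration, and `min_value` is left out because `min_values` is not defined in the snippet above. The idea is to fit once on the training features and label together, then give the test features a placeholder label column filled with the missing marker so the column layout matches what the imputer was fitted on. Whether the label should be imputed at all is a separate modelling question that this sketch does not settle.

```python
import numpy as np
import pandas as pd
from sklearn.experimental import enable_iterative_imputer  # noqa: F401, must come before importing IterativeImputer
from sklearn.impute import IterativeImputer
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

MISSING = -200  # missing-value marker used in the question

# Synthetic stand-in data (assumption): 3 features and a label, all well away from -200.
rng = np.random.default_rng(0)
X = pd.DataFrame(rng.uniform(1, 10, size=(200, 3)), columns=["f1", "f2", "f3"])
y = pd.Series(X.sum(axis=1) + rng.normal(0, 0.1, 200), name="label")

# Replace roughly 10% of the entries with the missing marker.
X = X.mask(rng.random(X.shape) < 0.1, MISSING)
y = y.mask(rng.random(len(y)) < 0.1, MISSING)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Fit a single imputer on the training features and label together.
Xy_train = X_train.join(y_train)
imputer = IterativeImputer(
    missing_values=MISSING,
    estimator=LinearRegression(),
    max_iter=100,
    random_state=0,
)
Xy_train_imputed = pd.DataFrame(
    imputer.fit_transform(Xy_train), columns=Xy_train.columns, index=Xy_train.index
)

# Reuse the same fitted imputer on the test features: add a label column that is
# entirely "missing" so the columns match those seen during fit, then drop it again.
Xy_test = X_test.assign(label=MISSING)
Xy_test_imputed = pd.DataFrame(
    imputer.transform(Xy_test), columns=Xy_test.columns, index=Xy_test.index
)
X_test_imputed = Xy_test_imputed.drop(columns="label")
```

The imputer is fitted on the training split only, so nothing from the test split leaks into the fitted estimators. The placeholder label column in the test frame gets filled with values predicted from the features, and it is discarded at the end.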
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
