How to impute missing values for label and features with sklearn.impute.IterativeImputer

I have a dataset with 11 numerical features and 1 numerical label. Both the features and the label contain missing values. How can I impute the missing values in both with sklearn.impute.IterativeImputer? This is what I'm doing at the moment:

# IterativeImputer is experimental: the enable_* import must come before it
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
Xy_train = X_train.join(y_train)

# First imputer: fit on the joined training features and label, then transform them
iterative_imputer_1 = IterativeImputer(missing_values=-200, estimator=LinearRegression(), max_iter=100, min_value=min_values)
iterative_imputer_1.fit(Xy_train)
transformed_result_1 = iterative_imputer_1.transform(Xy_train)  # imputed training features and label

# Second imputer: fit on the training features only (no label column)
iterative_imputer_2 = IterativeImputer(missing_values=-200, estimator=LinearRegression(), max_iter=100, min_value=min_values)
iterative_imputer_2.fit(X_train)

# Use the second imputer to transform the test features
transformed_result_2 = iterative_imputer_2.transform(X_test)

But I'm not sure whether this is correct. Even if it is, it doesn't look like the best way to implement it. In particular, I doubt that I should need two IterativeImputer instances to transform the same data.
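For comparison, here is a minimal sketch of one possible alternative, not a confirmed answer: a single IterativeImputer is fitted on the joined training features and label, and the same fitted instance then transforms the joined test features and label. The synthetic data, the column names (f0 to f10, label), and random_state=0 are assumptions made only so the example runs on its own; min_value is left out because min_values is not defined in the question. Whether the label should take part when imputing test rows depends on the use case, since at prediction time the label is usually unknown.

```python
import numpy as np
import pandas as pd
from sklearn.experimental import enable_iterative_imputer  # noqa: F401 (must precede IterativeImputer)
from sklearn.impute import IterativeImputer
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the real dataset (assumption): 11 features and 1 label.
rng = np.random.default_rng(0)
data = rng.normal(size=(200, 12))

# Mark roughly 10% of all cells as missing with the -200 sentinel from the question.
data[rng.random(data.shape) < 0.1] = -200

columns = [f"f{i}" for i in range(11)] + ["label"]  # hypothetical column names
df = pd.DataFrame(data, columns=columns)

X, y = df.drop(columns="label"), df["label"]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Join features and label so that both take part in the imputation.
Xy_train = X_train.join(y_train)
Xy_test = X_test.join(y_test)

# One imputer, fitted on the training rows only.
imputer = IterativeImputer(
    missing_values=-200,
    estimator=LinearRegression(),
    max_iter=100,
    random_state=0,
)
Xy_train_imputed = pd.DataFrame(
    imputer.fit_transform(Xy_train), columns=columns, index=Xy_train.index
)

# The same fitted imputer transforms the test rows; it is not refitted on them.
Xy_test_imputed = pd.DataFrame(
    imputer.transform(Xy_test), columns=columns, index=Xy_test.index
)

# Split the imputed frames back into features and label.
X_train_imp, y_train_imp = Xy_train_imputed.drop(columns="label"), Xy_train_imputed["label"]
X_test_imp, y_test_imp = Xy_test_imputed.drop(columns="label"), Xy_test_imputed["label"]
```

In this sketch the test rows are only passed through transform, so the regressions learned on the training rows are reused and nothing is fitted on the test data.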

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow
