How to cross-validate different filling methods for missing values?

I have a dataset with missing values that I would like to fill. I want to try several filling methods and compare them to see which one performs best. I am new to this kind of problem, and my idea was to make the comparison with training and test data using sklearn. I would like to obtain statistically meaningful metrics so that I can make an educated decision about which method to choose for my data.

My original data has over 60'000 rows and looks as follows:

datetime               | A   |
-----------------------|-----|
07/12/2014  01:00:00   | 102 |
07/12/2014  02:00:00   |  Na |
07/12/2014  03:00:00   |  12 |
07/12/2014  04:00:00   |  98 |
07/12/2014  05:00:00   |  Na |
07/12/2014  06:00:00   |  34 |

My code so far looks something like this:

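from sklearn.experimental import enable_iterative_imputer  # noqa: F401, must come before importing IterativeImputer
from sklearn.impute import IterativeImputer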
from sklearn.model_selection import train_test_split
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np

df = load_data(data_dir, file_gen_test)
df_only_values = df[~df['A'].isna()].copy()  # copy so the new columns below do not trigger SettingWithCopyWarning
df_only_values['month'] = df_only_values['Datetime'].dt.month
df_only_values['hour']  = df_only_values['Datetime'].dt.hour
df_only_values['year']  = df_only_values['Datetime'].dt.year

TargetVariable = ['A']
Predictors = ['month','hour','year']
X = df_only_values[Predictors].values
y = df_only_values[TargetVariable].values
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.4, random_state=0)

df["forward"] = df['A'].ffill(axis=0)
df["backward"] = df['A'].bfill(axis=0)
df["linear"] = df['A'].interpolate()
df["barycentric"]  = df['A'].interpolate(method='barycentric')

imp = IterativeImputer(max_iter=10, random_state=0)
imp.fit(X_test, y_test)

My question is twofold. First, how should I pass the values to these different methods, given that some act only on "Na" entries and others cannot accept "Na" at all? Second, how can I best compare these approaches? I am aware that I have probably made some stupid mistakes, and I would be really glad if you could point them out and suggest another approach, as I am still a newbie.
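To make the comparison I have in mind concrete, here is a minimal sketch of one possible setup, not a definitive method: hide a random share of the values that are actually known, fill them with each method, and score the fills against the hidden truth, repeating over several random masks to get a mean and a spread. The synthetic series, the helper names `fill_iterative` and `score_once`, the 20% hold-out fraction and the five repetitions are all assumptions made for illustration. Note that in this sketch `IterativeImputer` receives the "Na" entries inside the feature matrix, while the pandas methods act directly on the series.

```python
import numpy as np
import pandas as pd
from sklearn.experimental import enable_iterative_imputer  # noqa: F401 (must precede the next import)
from sklearn.impute import IterativeImputer
from sklearn.metrics import mean_absolute_error, mean_squared_error

# Synthetic stand-in for the real data (assumption): hourly values with a daily cycle.
rng = np.random.default_rng(0)
stamps = pd.Series(pd.date_range("2014-07-12 01:00", periods=2000, freq=pd.Timedelta(hours=1)))
values = 60 + 30 * np.sin(2 * np.pi * stamps.dt.hour.to_numpy() / 24) + rng.normal(0, 5, len(stamps))
df = pd.DataFrame({"Datetime": stamps, "A": values})
df.loc[rng.random(len(df)) < 0.1, "A"] = np.nan  # gaps whose true values are unknown


def fill_iterative(series, timestamps):
    """Impute 'A' from calendar features; the NaNs stay inside the matrix passed in."""
    features = pd.DataFrame({
        "month": timestamps.dt.month,
        "hour": timestamps.dt.hour,
        "year": timestamps.dt.year,
        "A": series,
    })
    imputer = IterativeImputer(max_iter=10, random_state=0)
    return pd.Series(imputer.fit_transform(features)[:, -1], index=series.index)


def score_once(frame, seed, holdout_fraction=0.2):
    """Hide a random share of the known values, refill them, and score each method."""
    local_rng = np.random.default_rng(seed)
    known_pos = np.flatnonzero(frame["A"].notna().to_numpy())
    held_out = local_rng.choice(
        known_pos, size=int(holdout_fraction * known_pos.size), replace=False
    )
    truth = frame["A"].to_numpy()[held_out]
    masked = frame["A"].copy()
    masked.iloc[held_out] = np.nan  # artificially created gaps with known truth

    candidates = {
        "forward": masked.ffill().bfill(),   # bfill only covers a leading gap
        "backward": masked.bfill().ffill(),  # ffill only covers a trailing gap
        "linear": masked.interpolate(method="linear", limit_direction="both"),
        "iterative": fill_iterative(masked, frame["Datetime"]),
    }
    rows = []
    for name, filled in candidates.items():
        pred = filled.to_numpy()[held_out]
        rows.append({
            "method": name,
            "MAE": mean_absolute_error(truth, pred),
            "RMSE": np.sqrt(mean_squared_error(truth, pred)),
        })
    return pd.DataFrame(rows)


# Repeat with different random masks so each metric comes with a mean and a spread.
runs = pd.concat([score_once(df, seed) for seed in range(5)])
summary = (
    runs.groupby("method")[["MAE", "RMSE"]]
    .agg(["mean", "std"])
    .sort_values(("RMSE", "mean"))
)
print(summary)
```

Hiding isolated random points probably flatters the interpolation-style methods. If the real gaps come in long runs, it may be fairer to hide contiguous blocks of a similar length instead.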

Many thanks in advance for all your help.

Best,
fidu13



Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow
