My k-fold cross-validation gives an error on my dataframe with deleted rows

I hope this message finds you well. I have been working with a dataframe and had to remove the rows that contained any null values. I used the following command to delete them:

df.dropna(axis=0,how="any",inplace=True)

Then when I apply k-fold cross validation like this:

#Using kfold cross validation
from sklearn.model_selection import KFold, cross_val_predict
kf = KFold(shuffle=True, random_state=42, n_splits=5)
for train_index, test_index in kf.split(X):
    X_train, X_test, y_train, y_test = (X.iloc[train_index, :], 
                                        X.iloc[test_index, :], 
                                        y[train_index], 
                                        y[test_index])

I face the following error:

KeyError: "Passing list-likes to .loc or [] with any missing labels is no longer supported. The following labels were missing: Int64Index([    0,   149,   151,   156,   157,\n            ...\n            26474, 26987, 27075, 27157, 27345],\n           dtype='int64', length=1764). See https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#deprecate-loc-reindex-listlike"

I do not know how to fix this. It's probably raising an error because those rows no longer exist, and I probably have to reindex the dataframe so that the index starts from zero again. I do not know how to do that. Can anyone recommend a fix? Thanks



Solution 1:[1]

What I think you want is:

for train_index, test_index in kf.split(X):
    
    X_train, X_test, y_train, y_test = (X.iloc[train_index], 
                                        X.iloc[test_index], 
                                        y.iloc[train_index], 
                                        y.iloc[test_index])

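I think your problem comes from the fact that you are using the positional indices generated by kf.split(X) as index labels in y[train_index] and y[test_index]. After dropna, the index of y has gaps, so some of those positions no longer exist as labels, which is what the KeyError reports. Your original code could work, by chance, if the indexes of X and y still ran from 0 to n-1 without gaps.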
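As a rough illustration of the point above, here is a minimal sketch on a small made-up dataframe (the column names `feature` and `target` and the values are hypothetical, not from the question). It shows the positional `.iloc` approach, and also the alternative the question asks about: resetting the index with `reset_index(drop=True)` so that labels and positions agree again.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import KFold

# Hypothetical toy data standing in for the asker's dataframe.
df = pd.DataFrame({
    "feature": [1.0, np.nan, 3.0, 4.0, np.nan, 6.0, 7.0, 8.0],
    "target": [0, 1, 0, 1, 0, 1, 0, 1],
})

# Dropping rows keeps the original labels, so the index now has gaps.
df = df.dropna(axis=0, how="any")
index_after_dropna = list(df.index)

X = df[["feature"]]
y = df["target"]

kf = KFold(n_splits=3, shuffle=True, random_state=42)

# Option 1: kf.split returns positions, so select by position with .iloc.
fold_sizes_iloc = []
for train_index, test_index in kf.split(X):
    X_train, X_test = X.iloc[train_index], X.iloc[test_index]
    y_train, y_test = y.iloc[train_index], y.iloc[test_index]
    fold_sizes_iloc.append((len(X_train), len(X_test), len(y_train), len(y_test)))

# Option 2: rebuild a gap-free 0..n-1 index, after which labels equal positions.
X_reset = X.reset_index(drop=True)
y_reset = y.reset_index(drop=True)

fold_sizes_reset = []
for train_index, test_index in kf.split(X_reset):
    # Label-based selection is safe here because every label 0..n-1 exists.
    y_train, y_test = y_reset.loc[train_index], y_reset.loc[test_index]
    fold_sizes_reset.append((len(y_train), len(y_test)))
```

Either option should avoid the KeyError. Using `.iloc` leaves the original row labels intact, which can help when tracing rows back to the source data, while `reset_index(drop=True)` discards the old labels.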

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 jch