My k-fold cross validation technique is giving an error on my dataframe with deleted rows
I hope this message finds you well. I have been working with a dataframe and had to remove the rows that contained any null values. I used the following command to delete those rows:
df.dropna(axis=0,how="any",inplace=True)
Then when I apply k-fold cross validation like this:
# Using k-fold cross validation
from sklearn.model_selection import KFold, cross_val_predict
kf = KFold(shuffle=True, random_state=42, n_splits=5)
for train_index, test_index in kf.split(X):
    X_train, X_test, y_train, y_test = (X.iloc[train_index, :],
                                        X.iloc[test_index, :],
                                        y[train_index],
                                        y[test_index])
I face the following error:
KeyError: "Passing list-likes to .loc or [] with any missing labels is no longer supported. The following labels were missing: Int64Index([ 0, 149, 151, 156, 157,\n ...\n 26474, 26987, 27075, 27157, 27345],\n dtype='int64', length=1764). See https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#deprecate-loc-reindex-listlike"
I do not know how to fix this. It is probably giving me an error because those rows no longer exist, so I probably have to reindex the dataframe so that the index starts from zero again without gaps. I do not know how to do that. Can anyone suggest a good way to handle this? Thanks
Solution 1:[1]
What I think you want is:
for train_index, test_index in kf.split(X):
    X_train, X_test, y_train, y_test = (X.iloc[train_index],
                                        X.iloc[test_index],
                                        y.iloc[train_index],
                                        y.iloc[test_index])
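I think your problem comes from the fact that you are using the positional indices generated by kf.split(X) as index labels in y[train_index] and y[test_index]. Your original code could work, by chance, only if the index labels of y happened to match those positions (for example, a default index running from 0 to n-1), which is no longer the case once dropna has removed rows.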
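To make the difference concrete, here is a minimal sketch of both ways out: indexing by position with .iloc, and rebuilding the index with reset_index(drop=True), which is the "reindex from zero" idea from the question. The toy dataframe and its column names (a, b, target) are made up for illustration and are not from the original post.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import KFold

# Hypothetical toy data: two rows contain NaN and get dropped,
# which leaves gaps in the index labels, as in the question.
df = pd.DataFrame({
    "a": [1.0, np.nan, 3.0, 4.0, 5.0, np.nan, 7.0, 8.0, 9.0, 10.0, 11.0, 12.0],
    "b": np.arange(12, dtype=float),
    "target": [0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1],
})
df.dropna(axis=0, how="any", inplace=True)  # labels 1 and 5 are now missing

X = df[["a", "b"]]
y = df["target"]

kf = KFold(n_splits=5, shuffle=True, random_state=42)

# Option 1: kf.split returns positions, so select by position on both X and y.
folds = []
for train_index, test_index in kf.split(X):
    X_train, X_test = X.iloc[train_index], X.iloc[test_index]
    y_train, y_test = y.iloc[train_index], y.iloc[test_index]
    folds.append((X_train, X_test, y_train, y_test))

# Option 2: rebuild a gap-free 0..n-1 index after dropping rows,
# so that labels and positions coincide again.
df_reset = df.reset_index(drop=True)
X2, y2 = df_reset[["a", "b"]], df_reset["target"]
for train_index, test_index in kf.split(X2):
    # Label-based lookup now succeeds because every label 0..n-1 exists.
    y_train2, y_test2 = y2[train_index], y2[test_index]
```

Option 1 is usually the safer habit, since it does not depend on what the index looks like. Option 2 is handy if other code downstream also assumes a default index.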
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | jch |
