How to fill missing values with KNN in Python
I'm trying to fill missing values with KNN in Python, so I wrote this code, but it doesn't work. I get this error: "ValueError: could not convert string to float: 'normal'". What should I do?
import pandas as pd
df = pd.read_csv(r'df.csv')
from sklearn.impute import KNNImputer
imputer = KNNImputer(n_neighbors=5)
df = pd.DataFrame(imputer.fit_transform(df),columns = df.columns)
Solution 1:[1]
To replace NaN values, you can use sklearn.impute.SimpleImputer, which fills them with a value of your choice (the mean or median of the column, the most frequent value, or a constant).
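import numpy as np
from sklearn.impute import SimpleImputer
# strategy='mean' works only on numeric columns
imp = SimpleImputer(missing_values=np.nan, strategy='mean')
# fit_transform returns a NumPy array, not a DataFrame
df = imp.fit_transform(df)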
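Since the question's data contains strings such as 'normal', the mean strategy alone would fail on it as well. A minimal sketch of one way around that, assuming a hypothetical toy frame with one numeric column and one text column (the column names and values here are made up for illustration): impute numeric columns with the mean and text columns with the most frequent value.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

# Hypothetical toy data standing in for the question's df.csv
df = pd.DataFrame({
    "age": [25.0, np.nan, 40.0, 31.0],
    "status": ["normal", "high", np.nan, "normal"],
})

# Split columns by dtype: numeric vs. everything else (text here)
num_cols = df.select_dtypes(include="number").columns
cat_cols = df.columns.difference(num_cols)

filled = df.copy()
# Numeric columns: replace NaN with the column mean
filled[num_cols] = SimpleImputer(strategy="mean").fit_transform(df[num_cols])
# Text columns: replace NaN with the most frequent value
filled[cat_cols] = SimpleImputer(strategy="most_frequent").fit_transform(df[cat_cols])

print(filled)
```

This does not use neighbors at all; it is only a simple baseline that tolerates text columns.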
Solution 2:[2]
I don't know what your df looks like, but I guess you may have to use an ordinal or label encoder, since KNNImputer does not work with text data.
Here is a guide for you:
https://medium.com/@kyawsawhtoon/a-guide-to-knn-imputation-95e2dc496e
Solution 3:[3]
The KNN method computes distances between vectors, so if your data is categorical, you should convert it to numerical form first. For example, if the strings represent labels, you could one-hot encode them.
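A minimal sketch of the one-hot approach, assuming a hypothetical frame in which the text column is fully observed and only a numeric column has gaps (the names and values are made up for illustration). pandas.get_dummies turns the labels into 0/1 columns so that KNNImputer can use them when computing distances.

```python
import numpy as np
import pandas as pd
from sklearn.impute import KNNImputer

# Hypothetical toy data: 'status' is complete, 'age' has one missing value
df = pd.DataFrame({
    "age": [25.0, np.nan, 40.0, 41.0, 26.0],
    "status": ["normal", "normal", "high", "high", "normal"],
})

# One-hot encode the text column into float 0/1 indicator columns
encoded = pd.get_dummies(df, columns=["status"], dtype=float)

# Every column is numeric now, so KNNImputer no longer raises ValueError
imputer = KNNImputer(n_neighbors=2)
imputed = pd.DataFrame(imputer.fit_transform(encoded), columns=encoded.columns)

print(imputed)
```

Note that get_dummies encodes a missing label as all zeros by default, so a text column that itself contains NaN would need separate handling before this step.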
There is another Python package that implements the KNN imputation method: impyte
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | AlexTorx |
| Solution 2 | giraycoskun |
| Solution 3 | LittleHealth |
