How to drop an entire record if more than 90% of features have missing values in pandas
I have a pandas dataframe called df with 500 columns and 2 million records.
I am able to drop columns in which more than 90% of the values are missing.
But how can I drop an entire record (row) in pandas if 90% or more of its columns have missing values?
I have seen a similar post for "R" but I am coding in python at the moment.
Solution 1:[1]
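You can use df.dropna() with the thresh parameter, which is the minimum number of non-NA values a row needs in order to be kept. With 500 columns, 10% is 50, so thresh=50 drops every row with more than 90% missing values. To also drop rows that are exactly 90% missing, use thresh=51.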
df.dropna(axis=0, thresh=50, inplace=True)
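The hard-coded threshold depends on the frame having exactly 500 columns. As a minimal sketch, the value can be derived from the column count instead. The helper name min_non_na_to_keep and the 10-column toy frame are assumptions made for illustration, and the rule implemented here is "drop rows that are 90% or more missing" as worded in the question (remove the + 1 to keep rows that sit exactly at 90%).

```python
import numpy as np
import pandas as pd


def min_non_na_to_keep(n_cols: int, drop_at_pct: int = 90) -> int:
    """Smallest non-NA count of a row that is less than drop_at_pct% missing.

    Hypothetical helper, not part of pandas. Integer arithmetic avoids
    floating-point surprises such as 10 * (1 - 0.9) evaluating below 1.
    """
    return (n_cols * (100 - drop_at_pct)) // 100 + 1


# Toy stand-in for the real frame: 10 columns instead of 500.
df = pd.DataFrame(np.ones((4, 10)))
df.iloc[1, :9] = np.nan  # 9 of 10 missing (exactly 90%)
df.iloc[2, :] = np.nan   # all 10 missing
df.iloc[3, :8] = np.nan  # 8 of 10 missing (80%)

thresh = min_non_na_to_keep(df.shape[1])  # 2 for 10 columns, 51 for 500
kept = df.dropna(axis=0, thresh=thresh)   # returns a new frame; df is unchanged
print(kept.index.tolist())                # rows 1 and 2 are gone
```

Returning a new frame rather than passing inplace=True keeps the original available for comparison, which can be handy while checking the threshold on a sample.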
Solution 2:[2]
You could use isna + mean on axis=1 to find the fraction of NaN values in each row. Then select the rows where it's less than 0.9 (i.e. 90%) using loc, which drops every row that is 90% or more missing:
out = df.loc[df.isna().mean(axis=1)<0.9]
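To see what the mask does at the 90% boundary, here is a small sketch on a toy frame. The 10-column data and the names missing_frac, keep and n_dropped are assumptions for illustration, not part of the original answer.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the real frame: 10 columns instead of 500.
df = pd.DataFrame(np.ones((4, 10)))
df.iloc[1, :9] = np.nan  # 9 of 10 missing (exactly 90%)
df.iloc[2, :] = np.nan   # all 10 missing
df.iloc[3, :8] = np.nan  # 8 of 10 missing (80%)

# isna() gives a boolean frame; the row-wise mean of booleans is the
# fraction of missing values per row, a Series indexed like df.
missing_frac = df.isna().mean(axis=1)

keep = missing_frac < 0.9        # strict: a row at exactly 90% is dropped
out = df.loc[keep]
n_dropped = int((~keep).sum())   # number of rows removed by the filter

print(missing_frac.tolist())
print(out.index.tolist(), n_dropped)
```

Switching the comparison to <= 0.9 would keep rows that are exactly 90% missing, if that is the intended rule.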
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Niels Henkens |
| Solution 2 | |
