How to check missing values in each row of a dataframe
I have a dataframe with hundreds of columns and millions of rows, and I would like to check the missing values in each row of the dataframe.
Code:
df.isna().sum()
Currently, I'm analyzing the data with the code above, which gives me the number of missing values in each column. How can I get the number of missing values for each row?
I would also like a distribution plot of [column of rows] vs [number of missing values].
Solution 1:[1]
As a first step, you can try:
df_nan=pd.DataFrame(df.isna().mean().reset_index()).rename(columns={"index": "columns", 0: "nan_pourcentage"}).sort_values(by='nan_pourcentage',ascending=False)
This shows which columns have the most or the fewest NaN values, and you can plot the result.
You can get the overall fraction of NaN values in your dataframe (multiply by 100 for a percentage) using: df.isna().mean().mean()
And now, if you want the fraction of NaN values per row:
for index in range(len(df.index)):
    print("NaN in row", index, ":", df.iloc[index].isna().mean())
Instead of printing, you can store the result in a dataframe.
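With millions of rows, a Python-level loop over df.iloc can be slow. As a rough sketch of the "store the result in a dataframe" idea, the per-row fraction can be computed in one vectorized call with axis=1. The sample data and the names row_nan_fraction, df_row_nan, "row" and "nan_fraction" are illustrative assumptions, not part of the original answer.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data standing in for the real dataframe.
df = pd.DataFrame({
    "a": [1.0, np.nan, 3.0, np.nan],
    "b": [np.nan, np.nan, 6.0, 7.0],
    "c": [8.0, 9.0, 10.0, np.nan],
})

# Fraction of missing values in each row, computed without a Python loop.
row_nan_fraction = df.isna().mean(axis=1)

# Store the result in a dataframe, most incomplete rows first.
df_row_nan = (
    row_nan_fraction
    .rename("nan_fraction")
    .rename_axis("row")
    .reset_index()
    .sort_values(by="nan_fraction", ascending=False)
)
print(df_row_nan)
```

Sorting is optional here; it only mirrors the column-level example above so that the rows with the most missing values appear first.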
Solution 2:[2]
How we can get the missing values w.r.t each row.
You can sum across the columns by passing axis=1:
df.isna().sum(axis=1)
distribution plot of [column of rows] vs [number of missing values].
If you mean the number of missing values in each column, df.isna().sum() already gives the result.
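If the plot is instead meant to show how many rows have a given number of missing values, one possible sketch is to count the row-wise totals and chart them. The sample data and the names missing_per_row, distribution and plot_distribution are assumptions made for illustration. The plotting helper needs matplotlib and is only defined here, not called.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data standing in for the real dataframe.
df = pd.DataFrame({
    "a": [1.0, np.nan, 3.0, np.nan],
    "b": [np.nan, np.nan, 6.0, 7.0],
    "c": [8.0, 9.0, 10.0, np.nan],
})

# Number of missing values in each row.
missing_per_row = df.isna().sum(axis=1)

# How many rows have 0, 1, 2, ... missing values.
distribution = missing_per_row.value_counts().sort_index()
print(distribution)


def plot_distribution(dist):
    """Draw a bar chart of rows per missing-value count (requires matplotlib)."""
    import matplotlib.pyplot as plt

    ax = dist.plot(kind="bar")
    ax.set_xlabel("number of missing values in a row")
    ax.set_ylabel("number of rows")
    plt.show()
```

Calling plot_distribution(distribution) would then draw the chart. Because the distribution holds at most one entry per possible count (the number of columns plus one), the plot stays small even when the dataframe has millions of rows.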
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | |
| Solution 2 | Ynjxsjmh |
