'Different results with df.dropna(axis='rows') and df.drop(index=np.where(df.isnull().sum()!=0)[0], axis='index')
I am trying to remove all rows that contain NaN values from a dataframe. However, I have realized that using df.dropna(axis='rows'), but using df.drop(index=np.where(df.isnull().sum()!=0)[0], axis='index') gives different result. The former removes fewer rows than the later.
For example, my initial dataframe has 80 cols and 91713 rows.
- If I use
dropna()the resulting dataframe has 80 cols and 91639 rows (e.g. 74 rows were dropped). - If I instead use
drop()the new shape is 80 cols and 56935 rows (e.g. 34778 were dropped).
Is there something wrong with how I am getting the indices to input to df.drop()? I do get 74 columns if I just look at the number of indices I am dropping with that method. E.g with df_nulls = df.iloc[np.where(df.isnull().sum()!=0)[0]], df_nulls.shape[0] is 74.
Update:
I know there is definitely something wrong with the df.drop() method because when I try to run further processing on the data I get errors related to the there still being NaNs. But why would np.where(df.isnull().sum()!=0) not find all the NaN values?
Update 2: it is certainly just something wrong with my indexing (see below), but shouldn't iloc give the rows?
indices_rows_with_nulls = np.where(df.isnull().sum()!=0)[0]
df_nulls = df.iloc[indices_rows_with_nulls]
print('df.shape: '+ str(df.shape)+' df_nulls.shape: '+ str(df_nulls.shape))
indices_rows_without_nulls = np.where(df.isnull().sum()==0)[0]
df_no_nulls = df.iloc[indices_rows_without_nulls]
print('df.shape: '+ str(df.shape)+' df_no_nulls.shape: '+ str(df_no_nulls.shape))
gives
df.shape: (91713, 80) df_nulls.shape: (74, 80)
df.shape: (91713, 80) df_no_nulls.shape: (6, 80)
Solution 1:[1]
You need to sum on columns
df.isnull().sum(axis=1)!=0
# or
df.isnull().sum(axis='columns')!=0
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Ynjxsjmh |
