'At each NaN value, drop the row and column it's located in from pandas DataFrame
I have some unknown DataFrame that can be of any size and shape, for example:
first1 first2 first3 first4
a NaN 22 56.0 65
c 380.0 40 NaN 66
b 390.0 50 80.0 64
My objective is to delete all columns and rows at which there is a NaN value. In this specific case, the output should be:
first2 first4
b 50 64
Also, I need to preserve the option to use "all" like in pandas.DataFrame.dropna, meaning when an argument "all" passed, a column or a row must be dropped only if all its values are missing.
When I tried the following code:
def dropna_mta_style(df, how='any'):
new_df = df.dropna(axis=0, how = how).dropna(axis=1, how = how)
return new_df
It obviously didn't work, because it drops firstly the rows, and then searches for columns with Nan's, but it was already dropped.
Thanks in advance!
P.S: for and while loops, python built-in functions that act on iterables (all, any, map, ...), list and dictionary comprehensions shouldn't be used.
Solution 1:[1]
Would something like this work ?
df.dropna(axis=1,how='any').loc[df.dropna(axis=0,how='any').index]
(Meaning we take the indexes of all rows for which we dont have NaNs in any row df.dropna(axis=0,how='any').index - then use that to locate the rows we want from the original df for which we drop all columns having at least one NaN)
Solution 2:[2]
This should remove all rows and columns dynamically
df['Check'] = df.isin([np.nan]).any(axis=1)
df = df.dropna(axis = 1)
df = df.loc[df['Check'] == False]
df.drop('Check', axis = 1, inplace = True)
df
Solution 3:[3]
Solution intended for readability:
rows = df.dropna(axis=0).index
cols = df.dropna(axis=1).columns
df = df.loc[rows, cols]
Solution 4:[4]
def dropna_mta_style(df, how='any'):
if (how == 'all'):
null_col =df.isna().all(axis=0).to_frame(name='col')
col_names = null_col[null_col['col'] == True].index
null_row =df.isna().all(axis=1).to_frame(name='row')
row_index = null_row[null_row['row'] == True].index
if len(row_names) > 0:
new_df=df.drop(axis=1, columns=col_names)
else:
new_df = df.dropna(axis=0, how = how).dropna(axis=1, how = how)
return new_df
here is a breakdown of the change made to the function
BEFORE;
first1 first2 first3 first4 first5
a NaN 22.0 NaN 65.0 NaN
c 380.0 40.0 NaN 66.0 NaN
b 390.0 50.0 NaN 64.0 NaN
3 NaN NaN NaN NaN NaN
4 NaN NaN NaN NaN NaN
5 NaN NaN NaN NaN NaN
6 NaN NaN NaN NaN NaN
find the null columns
null_col =df.isna().all(axis=0).to_frame(name='col')
col_names = null_col[null_col['col'] == True].index
col_names
Index(['first3', 'first5'], dtype='object')
find the rows with all null rows
null_row =df.isna().all(axis=1).to_frame(name='row')
row_index = null_row[null_row['row'] == True].index
row_index
Index([3, 4, 5, 6], dtype='object')
if len(row_names) > 0:
df2=df.drop(axis=1, columns=col_names)
df2
AFTER:
first1 first2 first4
a NaN 22.0 65.0
c 380.0 40.0 66.0
b 390.0 50.0 64.0
3 NaN NaN NaN
4 NaN NaN NaN
5 NaN NaN NaN
6 NaN NaN NaN
incorporating in your method
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Daniel Weigel |
| Solution 2 | ArchAngelPwn |
| Solution 3 | Peter Leimbigler |
| Solution 4 | DharmanBot |
