'At each NaN value, drop the row and column it's located in from pandas DataFrame

I have some unknown DataFrame that can be of any size and shape, for example:

   first1  first2  first3  first4
a     NaN      22    56.0      65
c   380.0      40     NaN      66
b   390.0      50    80.0      64

My objective is to delete all columns and rows at which there is a NaN value. In this specific case, the output should be:

   first2  first4
b      50      64

Also, I need to preserve the option to use "all" like in pandas.DataFrame.dropna, meaning when an argument "all" passed, a column or a row must be dropped only if all its values are missing.

When I tried the following code:

def dropna_mta_style(df, how='any'):
  new_df = df.dropna(axis=0, how = how).dropna(axis=1, how = how)
  return new_df

It obviously didn't work, because it drops firstly the rows, and then searches for columns with Nan's, but it was already dropped.

Thanks in advance!

P.S: for and while loops, python built-in functions that act on iterables (all, any, map, ...), list and dictionary comprehensions shouldn't be used.



Solution 1:[1]

Would something like this work ?

df.dropna(axis=1,how='any').loc[df.dropna(axis=0,how='any').index]

(Meaning we take the indexes of all rows for which we dont have NaNs in any row df.dropna(axis=0,how='any').index - then use that to locate the rows we want from the original df for which we drop all columns having at least one NaN)

Solution 2:[2]

This should remove all rows and columns dynamically

df['Check'] = df.isin([np.nan]).any(axis=1)
df = df.dropna(axis = 1)
df = df.loc[df['Check'] == False]
df.drop('Check', axis = 1, inplace = True)
df

Solution 3:[3]

Solution intended for readability:

rows = df.dropna(axis=0).index
cols = df.dropna(axis=1).columns
df = df.loc[rows, cols]

Solution 4:[4]

def dropna_mta_style(df, how='any'):
    if (how == 'all'):
        null_col =df.isna().all(axis=0).to_frame(name='col')
        col_names = null_col[null_col['col'] == True].index
        
        null_row =df.isna().all(axis=1).to_frame(name='row')
        row_index = null_row[null_row['row'] == True].index

        if len(row_names) > 0:
            new_df=df.drop(axis=1, columns=col_names)
        
    else:
        new_df = df.dropna(axis=0, how = how).dropna(axis=1, how = how)
    return new_df

here is a breakdown of the change made to the function

BEFORE;

    first1  first2  first3  first4  first5
a   NaN     22.0    NaN     65.0    NaN
c   380.0   40.0    NaN     66.0    NaN
b   390.0   50.0    NaN     64.0    NaN
3   NaN     NaN     NaN     NaN     NaN
4   NaN     NaN     NaN     NaN     NaN
5   NaN     NaN     NaN     NaN     NaN
6   NaN     NaN     NaN     NaN     NaN

find the null columns

null_col =df.isna().all(axis=0).to_frame(name='col')
col_names = null_col[null_col['col'] == True].index
col_names
Index(['first3', 'first5'], dtype='object')

find the rows with all null rows

null_row =df.isna().all(axis=1).to_frame(name='row')
row_index = null_row[null_row['row'] == True].index
row_index
Index([3, 4, 5, 6], dtype='object')
if len(row_names) > 0:
    df2=df.drop(axis=1, columns=col_names)
df2

AFTER:

    first1  first2  first4
a   NaN     22.0    65.0
c   380.0   40.0    66.0
b   390.0   50.0    64.0
3   NaN     NaN     NaN
4   NaN     NaN     NaN
5   NaN     NaN     NaN
6   NaN     NaN     NaN

incorporating in your method

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Daniel Weigel
Solution 2 ArchAngelPwn
Solution 3 Peter Leimbigler
Solution 4 DharmanBot