thresh in dropna for DataFrame in pandas in Python

import numpy as np
import pandas as pd

df1 = pd.DataFrame(np.arange(15).reshape(5, 3))
df1.iloc[:4, 1] = np.nan
df1.iloc[:2, 2] = np.nan
df1.dropna(thresh=1, axis=1)

It seems that nothing has been dropped; all the NaN values are still there:

    0     1     2
0   0   NaN   NaN
1   3   NaN   NaN
2   6   NaN   8.0
3   9   NaN  11.0
4  12  13.0  14.0

If I run

df1.dropna(thresh=2, axis=1)

why does it give the following?

    0     2
0   0   NaN
1   3   NaN
2   6   8.0
3   9  11.0
4  12  14.0

I just don't understand what thresh is doing here. If a column has more than one NaN value, shouldn't the column be deleted?



Solution 1:[1]

thresh=N requires that a column has at least N non-NaNs to survive. In the first example, all three columns have at least one non-NaN, so all of them survive. In the second example, columns 0 and 2 have at least two non-NaNs, so they survive, but column 1 has only one non-NaN and is dropped.

Try setting thresh to 4 to get a better sense of what's happening.
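To make that experiment concrete, here is a small sketch that rebuilds the frame from the question and loops over several thresh values. The float dtype and the names non_nan_counts and surviving are my own choices for illustration, not part of the original question.

```python
import numpy as np
import pandas as pd

# Rebuild the frame from the question. The float dtype is an assumption made
# here so that NaN can be assigned without any dtype conversion.
df1 = pd.DataFrame(np.arange(15, dtype=float).reshape(5, 3))
df1.iloc[:4, 1] = np.nan
df1.iloc[:2, 2] = np.nan

# Number of non-NaN values in each column.
non_nan_counts = df1.count().to_dict()
print(non_nan_counts)  # {0: 5, 1: 1, 2: 3}

# For each thresh value, record which columns survive dropna.
surviving = {}
for n in range(1, 6):
    surviving[n] = list(df1.dropna(thresh=n, axis=1).columns)
    print(n, surviving[n])
```

Column 1 disappears as soon as thresh exceeds 1, and column 2 disappears once thresh exceeds 3, which matches the per-column counts printed first.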

Solution 2:[2]

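The thresh parameter sets the minimum number of non-NaN values a row needs in order not to be dropped. That is the default axis=0 case; with axis=1, as in the question, the same minimum applies to each column instead.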
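A quick way to see the row versus column distinction is to run the same thresh with and without axis=1. This is only a sketch on the question's data; the float dtype and the names rows_kept and cols_kept are assumptions added for illustration.

```python
import numpy as np
import pandas as pd

# Same data as in the question; float dtype is an assumption so that
# assigning NaN needs no dtype conversion.
df1 = pd.DataFrame(np.arange(15, dtype=float).reshape(5, 3))
df1.iloc[:4, 1] = np.nan
df1.iloc[:2, 2] = np.nan

# Default axis=0: rows with fewer than 2 non-NaN values are dropped.
rows_kept = df1.dropna(thresh=2)

# axis=1: columns with fewer than 2 non-NaN values are dropped.
cols_kept = df1.dropna(thresh=2, axis=1)

print(list(rows_kept.index))    # [2, 3, 4]
print(list(cols_kept.columns))  # [0, 2]
```

Rows 0 and 1 each hold a single non-NaN value, so they go in the row-wise call, while the column-wise call removes only column 1.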

Solution 3:[3]

This will search along each column and check whether the column has at least 1 non-NaN value:

df1.dropna(thresh=1, axis=1)

Column 1 has only one non-NaN value, i.e. 13, but thresh=2 needs at least 2 non-NaN values, so this column fails the check and is dropped:

df1.dropna(thresh=2, axis=1)
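If it helps, the same check can be written by hand with a boolean mask. This is just an illustrative equivalent on the question's data, not how pandas implements dropna internally; the float dtype and the names mask and manual are assumptions.

```python
import numpy as np
import pandas as pd

# Same data as in the question; float dtype is an assumption so that
# assigning NaN needs no dtype conversion.
df1 = pd.DataFrame(np.arange(15, dtype=float).reshape(5, 3))
df1.iloc[:4, 1] = np.nan
df1.iloc[:2, 2] = np.nan

# True for every column that holds at least 2 non-NaN values.
mask = df1.notna().sum() >= 2

# Keep only the columns that pass the check.
manual = df1.loc[:, mask]

# On this frame the hand-written filter matches dropna(thresh=2, axis=1).
same = manual.equals(df1.dropna(thresh=2, axis=1))
print(mask.to_dict())
print(same)
```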

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1
Solution 2 Akhil teja
Solution 3 Skatox