How to simplify a pandas DataFrame based on a threshold value
Here's my dataframe
A B C D
1 0 0.41 0.35 0.61
2 0 0.41 0.35 0
3 0 0.21 0 0
4 0.11 0.4 0.53 0
I want to display only the rows and columns that contain at least one value greater than 0.5, like this:
C D
1 0.35 0.61
4 0.53 0
How should I do that?
Solution 1:[1]
Use DataFrame.gt to test for values greater than the threshold, DataFrame.any to check whether at least one value matches along each axis, and DataFrame.loc to filter the rows and columns:
m = df.gt(0.5)
df1 = df.loc[m.any(axis=1), m.any()]
print(df1)
C D
1 0.35 0.61
4 0.53 0.00
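For anyone who wants to reproduce this, here is a minimal, self-contained sketch. The DataFrame constructor call is an assumption reconstructed from the table in the question (index 1 to 4, columns A to D); the filtering lines are the ones from the solution above.

```python
import pandas as pd

# Assumed reconstruction of the frame shown in the question.
df = pd.DataFrame(
    {
        "A": [0, 0, 0, 0.11],
        "B": [0.41, 0.41, 0.21, 0.4],
        "C": [0.35, 0.35, 0, 0.53],
        "D": [0.61, 0, 0, 0],
    },
    index=[1, 2, 3, 4],
)

# Boolean mask: True wherever a value is strictly greater than 0.5.
m = df.gt(0.5)

# Keep rows with at least one True, and columns with at least one True.
df1 = df.loc[m.any(axis=1), m.any(axis=0)]
print(df1)
```

Note that the selected rows keep all of their values in the surviving columns, including values at or below the threshold (such as 0.35 in row 1), because the masks select whole rows and columns rather than individual cells.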
Solution 2:[2]
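You can use double boolean indexing. Compute a boolean mask with gt, then check whether any value is True along each axis, and use the two results to select rows and columns with loc. Pass axis by keyword, since pandas 2.0 and later no longer accept it positionally in any:
m = df.gt(0.5)
df.loc[m.any(axis=1), m.any(axis=0)]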
output:
C D
1 0.35 0.61
4 0.53 0.00
Intermediates
m:
A B C D
1 False False False True
2 False False False False
3 False False False False
4 False False True False
m.any(axis=1):
1 True
2 False
3 False
4 True
dtype: bool
m.any(axis=0):
A False
B False
C True
D True
dtype: bool
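If you need this for several thresholds, the two-mask selection shown in the intermediates above could be wrapped in a small helper. This is only a sketch: the function name keep_above and the sample frame are hypothetical, introduced here for illustration, and the sketch assumes all columns are numeric.

```python
import pandas as pd


def keep_above(df: pd.DataFrame, threshold: float) -> pd.DataFrame:
    """Return the rows and columns holding at least one value > threshold.

    Hypothetical helper; assumes every column of df is numeric.
    """
    mask = df.gt(threshold)          # element-wise comparison
    rows = mask.any(axis=1)          # one boolean per row
    cols = mask.any(axis=0)          # one boolean per column
    return df.loc[rows, cols]


# Small illustrative frame (not the one from the question).
sample = pd.DataFrame(
    {"A": [0.0, 0.11], "C": [0.35, 0.53], "D": [0.61, 0.0]},
    index=[1, 4],
)

result = keep_above(sample, 0.5)     # drops column A, keeps both rows
strict = keep_above(sample, 0.6)     # only row 1 and column D remain
empty = keep_above(sample, 1.0)      # nothing exceeds 1.0: empty frame
```

When no value exceeds the threshold, both masks are all False and the result is an empty DataFrame, so callers may want to check result.empty before using it.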
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | jezrael |
| Solution 2 | |
