Python DataFrame: delete rows after comparing multiple column values with a value
I have a DataFrame with many columns of float values. I want to delete a row if any of its columns has a value below 20.
Code:
import numpy as np
import pandas as pd

# Five random floats per column, each drawn from a different range
xdf = pd.DataFrame({
    'A': np.random.uniform(low=-50, high=53.3, size=5),
    'B': np.random.uniform(low=10, high=130, size=5),
    'C': np.random.uniform(low=-50, high=130, size=5),
    'D': np.random.uniform(low=-100, high=200, size=5),
})
xdf =
A B C D
0 -9.270533 42.098425 91.125009 148.350655
1 17.771411 55.564825 106.396381 -89.082831
2 -22.602563 99.330643 17.590466 73.985202
3 15.890920 76.011631 52.366311 194.023063
4 35.202379 41.973846 32.576890 100.523902
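# my code
cols = ['A', 'B', 'C', 'D']  # columns to check (not defined in the original post; assumed to be all four)
xdf[xdf[cols].ge(20).all(axis=1)]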
Out[17]:
A B C D
4 35.202379 41.973846 32.57689 100.523902
Expected output: drop a row if any column has a value below 20
xdf =
A B C D
4 35.202379 41.973846 32.576890 100.523902
Is this the best way of doing it?
Solution 1:[1]
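NumPy arrays carry less overhead than DataFrames for purely numeric work, so filtering the raw array may be faster. Note that each sample has to become a column of the array, so that a row is one record. Try this:
# Stack the four samples as columns: `a` has shape (5, 4), one record per row.
a = np.column_stack([np.random.uniform(low=-50, high=53.3, size=5),
                     np.random.uniform(low=10, high=130, size=5),
                     np.random.uniform(low=-50, high=130, size=5),
                     np.random.uniform(low=-100, high=200, size=5)])
# Keep only the rows in which every value is at least 20.
print(a[np.all(a >= 20, axis=1)])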
If you want to stick with pandas, another option is to compare each column explicitly (using >= so that a value of exactly 20 is kept):
xdfFiltered = xdf.loc[(xdf["A"] >= 20) & (xdf["B"] >= 20) & (xdf["C"] >= 20) & (xdf["D"] >= 20)]
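As a rough check, the column-by-column filter can be compared with the one-line `.ge(...).all(axis=1)` filter from the question. The sketch below is only an illustration: it uses the fixed values printed in the question instead of random data, the names `threshold`, `chained` and `vectorized` are made up for the example, and it uses `>=` on the assumption that a value of exactly 20 should be kept.

```python
import pandas as pd

# Fixed values copied from the sample output in the question,
# so the result is reproducible (the original data is random).
xdf = pd.DataFrame({
    "A": [-9.270533, 17.771411, -22.602563, 15.890920, 35.202379],
    "B": [42.098425, 55.564825, 99.330643, 76.011631, 41.973846],
    "C": [91.125009, 106.396381, 17.590466, 52.366311, 32.576890],
    "D": [148.350655, -89.082831, 73.985202, 194.023063, 100.523902],
})

threshold = 20  # hypothetical name; rows with any value below this are dropped

# One comparison per column, combined with &.
chained = xdf.loc[
    (xdf["A"] >= threshold)
    & (xdf["B"] >= threshold)
    & (xdf["C"] >= threshold)
    & (xdf["D"] >= threshold)
]

# The same mask built for all columns at once.
vectorized = xdf.loc[xdf.ge(threshold).all(axis=1)]

print(chained.equals(vectorized))
print(vectorized)
```

Both filters keep only the row with index 4. The vectorized form does not need to be edited when columns are added or renamed, which is probably the main reason to prefer it for a frame with many columns.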
Solution 2:[2]
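You can use the NumPy equivalent of .ge, np.greater_equal, instead (np.greater would correspond to .gt and would also drop values equal to 20):
xdf.loc[np.greater_equal(xdf, 20).all(axis=1)]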
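The choice between `np.greater` and `np.greater_equal` only matters when a value sits exactly on the threshold. A minimal sketch of the difference, using a small made-up frame (`df`, `strict` and `inclusive` are names invented for this example):

```python
import numpy as np
import pandas as pd

# Made-up data: the first row has a value exactly equal to the threshold.
df = pd.DataFrame({"A": [20.0, 25.0, 19.9], "B": [30.0, 40.0, 50.0]})

# Strictly greater than 20: the row containing 20.0 is dropped as well.
strict = df.loc[np.greater(df, 20).all(axis=1)]

# Greater than or equal to 20: only rows with a value below 20 are dropped.
inclusive = df.loc[np.greater_equal(df, 20).all(axis=1)]

print(list(strict.index))
print(list(inclusive.index))
```

Here `strict` keeps only index 1, while `inclusive` keeps indices 0 and 1. Since the question asks to drop values "below 20", the inclusive comparison seems to be the closer match.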
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Daniel Seger |
| Solution 2 |
