Drop the entire row if a particular sub-row does not fulfill a condition
I have a pandas DataFrame with a two-level MultiIndex (`entry`, `subentry`). I would like to apply a condition to one particular subentry and, if the condition is not fulfilled, drop the entire entry (all of its subentries) so that I can update the df.
For example, I would like to check subentry 0 of every entry and, if its pt < 120, drop the entire entry.
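| entry | subentry | pt |
|---|---|---|
| 0 | 0 | 100 |
| 0 | 1 | 200 |
| 0 | 2 | 300 |
| 1 | 0 | 200 |
| 1 | 1 | 300 |
| 2 | 0 | 80 |
| 2 | 1 | 300 |
| 2 | 3 | 400 |
| 2 | 4 | 300 |
| ... | ... | ... |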
So entries 0 and 2 (with all of their subentries) should be deleted, leaving:
| entry | subentry | pt |
|---|---|---|
| 1 | 0 | 200 |
| 1 | 1 | 300 |
| ... | ... | ... |
I tried using:
df.loc[(slice(None), 0), :]["pt"]>100
but it returns a new boolean Series indexed only by the (entry, 0) pairs, so I cannot use it to filter the original df: its index does not match the full set of entries/subentries. Thank you.
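For reference, the attempt above can be made to work by selecting the subentry-0 rows with `DataFrame.xs` and then mapping the passing entries back onto every row through the `entry` index level. This is a minimal sketch, not taken from either answer below; it assumes the frame has the (`entry`, `subentry`) MultiIndex shown in the question and uses the question's 120 threshold (keep an entry when its subentry-0 `pt` is at least 120).

```python
import pandas as pd

# Sample data from the question, with (entry, subentry) as a two-level index.
df = pd.DataFrame(
    {"pt": [100, 200, 300, 200, 300, 80, 300, 400, 300]},
    index=pd.MultiIndex.from_tuples(
        [(0, 0), (0, 1), (0, 2), (1, 0), (1, 1), (2, 0), (2, 1), (2, 3), (2, 4)],
        names=["entry", "subentry"],
    ),
)

# pt of subentry 0 for every entry; xs drops the subentry level,
# so this Series is indexed by entry alone.
first_pt = df.xs(0, level="subentry")["pt"]

# Entry labels whose subentry-0 pt passes the condition.
keep = first_pt[first_pt >= 120].index

# Broadcast the decision to all rows by testing each row's entry label.
result = df[df.index.get_level_values("entry").isin(keep)]
print(result)
```

Because the test is done on entry labels rather than on the subentry-0 Series itself, the index mismatch described above no longer matters. One side effect of this design: an entry that has no subentry 0 at all never appears in `keep`, so it is dropped as well.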
Solution 1:[1]
Try this:
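# Assumes `entry` is a regular column; with the (entry, subentry) MultiIndex from the question, run df = df.reset_index() first
# Note: this flags pt < 120 in any subentry, not only subentry 0
# Count the number of invalid `pt` values (pt < 120) per `entry`
invalid = df['pt'].lt(120).groupby(df['entry']).sum()
# Valid `entry` values are those whose `invalid` count is 0
df[df['entry'].isin(invalid[invalid == 0].index)]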
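Solution 1 reads `entry` as a column. The same counting idea can also be applied directly to the question's MultiIndex by grouping on the index level instead. The sketch below assumes the index levels are named `entry` and `subentry`; like Solution 1, it checks every subentry rather than only subentry 0.

```python
import pandas as pd

# Sample data from the question, with (entry, subentry) as a two-level index.
df = pd.DataFrame(
    {"pt": [100, 200, 300, 200, 300, 80, 300, 400, 300]},
    index=pd.MultiIndex.from_tuples(
        [(0, 0), (0, 1), (0, 2), (1, 0), (1, 1), (2, 0), (2, 1), (2, 3), (2, 4)],
        names=["entry", "subentry"],
    ),
)

# Count rows with pt < 120 per entry, grouping by the index level.
invalid = df["pt"].lt(120).groupby(level="entry").sum()

# Valid entries are those with zero invalid rows.
valid = invalid[invalid == 0].index

# Keep every row whose entry label is valid.
result = df[df.index.get_level_values("entry").isin(valid)]
print(result)
```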
Solution 2:[2]
One solution is to group by `entry` and use `transform('min')` to broadcast each entry's minimum `pt` back to all of its rows; comparing that against the threshold gives a boolean mask that can be passed to `loc` to select the correct rows.
import pandas as pd
df = pd.DataFrame({'entry': [0, 0, 1, 1, 2, 2],
                   'subentry': [1, 2, 1, 2, 1, 2],
                   'pt': [100, 300, 200, 300, 80, 300]})
Initial df:
|   | entry | subentry | pt |
|---|---|---|---|
| 0 | 0 | 1 | 100 |
| 1 | 0 | 2 | 300 |
| 2 | 1 | 1 | 200 |
| 3 | 1 | 2 | 300 |
| 4 | 2 | 1 | 80 |
| 5 | 2 | 2 | 300 |
Use `loc` to select only the rows of entries whose minimum `pt` is at least 120 (i.e. drop any entry containing pt < 120):
df.loc[df.groupby('entry')['pt'].transform('min') >= 120]
Output:
|   | entry | subentry | pt |
|---|---|---|---|
| 2 | 1 | 1 | 200 |
| 3 | 1 | 2 | 300 |
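A related way to express the same "keep an entry only if its minimum `pt` passes" rule is `DataFrameGroupBy.filter`, which keeps or drops whole groups based on a function of each group. This is an alternative technique, not the answer's own code; the sketch reuses the answer's example frame (where `entry` is an ordinary column) and the question's 120 threshold.

```python
import pandas as pd

# Example frame from this answer: entry and subentry are ordinary columns.
df = pd.DataFrame({'entry': [0, 0, 1, 1, 2, 2],
                   'subentry': [1, 2, 1, 2, 1, 2],
                   'pt': [100, 300, 200, 300, 80, 300]})

# Keep all rows of an entry only if the smallest pt in that entry is >= 120;
# the original row index is preserved in the result.
result = df.groupby('entry').filter(lambda g: g['pt'].min() >= 120)
print(result)
```

`filter` calls a Python function once per group, so on frames with many entries the `transform('min')` mask above is typically the faster option; `filter` mainly reads more directly.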
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Code Different |
| Solution 2 | bkeesey |
