'Find duplicate rows in a group, where another row exists in the same group that matches some condition
I want to find rows which are duplicates according to a certain subset of column values, where those rows also have a zero value in a certain column C and another row exists in the same group with a nonzero value for C.
Test data:
df = pd.DataFrame([
# Should be marked as the combination is available in another
# non-zero price.
{
'colour': 'blue',
'flavour': 'strawberry',
'price': 0
},
{
'colour': 'blue',
'flavour': 'strawberry',
'price': 40,
},
{
'colour': 'blue',
'flavour': 'vanilla',
'price': 50,
},
# Should not be marked as it's the only way to get this
# combination.
{
'colour': 'red',
'flavour': 'strawberry',
'price': 0,
}
]
)
Here the subset of columns is ['colour', 'flavour'] and the column is price.
What I've tried: I have a working version that uses apply() with a function, but I would like a vectorized solution.
What makes intuitive sense to me would be something like this:
nonzero_agg = df[df['price'] != 0].groupby(['colour', 'flavour']).agg({'price': 'max'})
df['marked'] = (df['price'] == 0) & (df[['colour', 'flavour']].isin(nonzero_agg))
But I think this use of isin() doesn't work, I get ValueError: Buffer dtype mismatch, expected 'Python object' but got 'long'.
I want the resulting dataframe to look like this:
colour flavour price marked
0 blue strawberry 0 True
1 blue strawberry 40 False
2 blue vanilla 50 False
3 red strawberry 0 False
Solution 1:[1]
Test if price is 0 and at least one value non 0 per groups by GroupBy.transform with GroupBy.any and chain conditions:
m = df['price'] == 0
df['new'] = df.assign(tmp=~m).groupby(['colour', 'flavour'])['tmp'].transform('any') & m
print (df)
colour flavour price new
0 blue strawberry 0 True
1 blue strawberry 40 False
2 blue vanilla 50 False
3 red strawberry 0 False
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | jezrael |
