'Collapse pandas dataframe based on values of row
I have a dataframe like below.
user cat state
u1 cat1 Y
u1 cat1 N
u1 cat2 Y
u1 cat2 Y
u1 cat3 N
u1 cat3 N
u2 cat1 Y
u2 cat1 N
How would I collapse the rows so to the state 'Y' for a (user, cat) pair as long as one of those pairs is 'Y', and 'N' otherwise?
Result I'm looking for:
user cat state
u1 cat1 Y
u1 cat2 Y
u1 cat3 N
u2 cat1 Y
Solution 1:[1]
You can drop_duplicates and then groupby to ensure at most one Y row and one N row for each (user, cat) subgroup. Then you can check this subgroup to see if a 'Y' exists. The reset_index lets you rename the new column from the groupby back to 'state'.
new_df = df.drop_duplicates().groupby(['user', 'cat']).apply(lambda x: 'Y' if x['state'].eq('Y').any() else 'N').reset_index(name='state')
This produces the expected output:
>>> print(new_df)
user cat state
0 u1 cat1 Y
1 u1 cat2 Y
2 u1 cat3 N
3 u2 cat1 Y
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Ani |
