'Collapse pandas dataframe based on values of row

I have a dataframe like below.

user cat  state
u1   cat1 Y
u1   cat1 N
u1   cat2 Y
u1   cat2 Y
u1   cat3 N
u1   cat3 N
u2   cat1 Y
u2   cat1 N

How would I collapse the rows so to the state 'Y' for a (user, cat) pair as long as one of those pairs is 'Y', and 'N' otherwise?

Result I'm looking for:

user cat  state
u1   cat1 Y
u1   cat2 Y
u1   cat3 N
u2   cat1 Y


Solution 1:[1]

You can drop_duplicates and then groupby to ensure at most one Y row and one N row for each (user, cat) subgroup. Then you can check this subgroup to see if a 'Y' exists. The reset_index lets you rename the new column from the groupby back to 'state'.

new_df = df.drop_duplicates().groupby(['user', 'cat']).apply(lambda x: 'Y' if x['state'].eq('Y').any() else 'N').reset_index(name='state')

This produces the expected output:

>>> print(new_df)
  user   cat state
0   u1  cat1     Y
1   u1  cat2     Y
2   u1  cat3     N
3   u2  cat1     Y

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Ani