Pandas - new column based on `max` of grouped values

I have a Pandas DataFrame containing several groups (A, B and C). Each group has multiple counts associated with it, and I want to create a new column that normalises each count to the maximum value of its own group.

For example:

index, group, year, count
0, A, 2015, 1
1, A, 2016, 2
2, A, 2017, 3
3, B, 2012, 10
4, B, 2013, 14
5, B, 2014, 18
6, C, 2014, 55
7, C, 2015, 59
8, C, 2016, 58

...becomes

index, group, year, count, normalised
0, A, 2015, 1,  0.333
1, A, 2016, 2,  0.667
2, A, 2017, 3,  1.000
3, B, 2012, 10, 0.556
4, B, 2013, 14, 0.778
5, B, 2014, 18, 1.000
6, C, 2014, 55, 0.932
7, C, 2015, 59, 1.000
8, C, 2016, 58, 0.983
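To experiment with the example, one option (an assumption for convenience, not part of the question) is to paste the input table into `pd.read_csv` via `io.StringIO`. The name `raw` is hypothetical; `skipinitialspace=True` drops the blank after each comma so the column names and group labels come out without a leading space.

```python
import io

import pandas as pd

# The question's input table, pasted as comma-separated text (hypothetical name)
raw = """index, group, year, count
0, A, 2015, 1
1, A, 2016, 2
2, A, 2017, 3
3, B, 2012, 10
4, B, 2013, 14
5, B, 2014, 18
6, C, 2014, 55
7, C, 2015, 59
8, C, 2016, 58
"""

# skipinitialspace strips the space after each comma, so the columns are
# "group", "year", "count" rather than " group", " year", " count"
df = pd.read_csv(io.StringIO(raw), skipinitialspace=True, index_col="index")
print(df)
```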

If I try something like...

df.assign(normalised=lambda x: x['count']/df[df['group'] == x['group']]['count'].max())

then `max` returns 59 (the maximum of the whole `count` column) rather than the largest value within each group.
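A likely explanation: `assign` calls the lambda once with the entire DataFrame, so `x['group']` is the full column and `df['group'] == x['group']` is True on every row; the filter keeps everything and `.max()` is the global maximum. The sketch below reproduces that and then swaps in a per-group maximum via `groupby(...).transform('max')`, which keeps the `assign` style. The frame is rebuilt by hand and the names `attempt` and `fixed` are illustrative only.

```python
import pandas as pd

# Example data from the question
df = pd.DataFrame({
    "group": ["A", "A", "A", "B", "B", "B", "C", "C", "C"],
    "year": [2015, 2016, 2017, 2012, 2013, 2014, 2014, 2015, 2016],
    "count": [1, 2, 3, 10, 14, 18, 55, 59, 58],
})

# The question's attempt: x is the whole frame, so the boolean mask is all
# True and .max() returns the maximum of the entire 'count' column (59).
attempt = df.assign(
    normalised=lambda x: x["count"] / df[df["group"] == x["group"]]["count"].max()
)

# A corrected variant: transform broadcasts each group's max back onto every
# row of that group, so the division lines up row by row.
fixed = df.assign(
    normalised=lambda x: x["count"] / x.groupby("group")["count"].transform("max")
)

print(fixed["normalised"].round(3).tolist())
# [0.333, 0.667, 1.0, 0.556, 0.778, 1.0, 0.932, 1.0, 0.983]
```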



Solution 1:[1]

Similar to Psidom's answer, but it avoids the lambda and is therefore faster:

df['normalised'] = df['count']/df.groupby('group')['count'].transform('max')
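The reason this one-liner works is that `transform('max')` returns a Series with the same index as `df`, holding each row's group maximum, whereas a plain aggregation such as `groupby(...).max()` returns only one value per group. A minimal sketch contrasting the two on the example data (the names `per_group` and `group_max` are illustrative):

```python
import pandas as pd

# Example data from the question
df = pd.DataFrame({
    "group": ["A", "A", "A", "B", "B", "B", "C", "C", "C"],
    "year": [2015, 2016, 2017, 2012, 2013, 2014, 2014, 2015, 2016],
    "count": [1, 2, 3, 10, 14, 18, 55, 59, 58],
})

# Aggregation collapses each group to a single value: 3 rows, indexed by group
per_group = df.groupby("group")["count"].max()

# transform broadcasts each group's max back to every row: 9 rows, same index as df
group_max = df.groupby("group")["count"].transform("max")

# Element-wise division works because both Series share df's index
df["normalised"] = df["count"] / group_max
print(df)
```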

Timings

>>> %timeit df['normalised'] = df['count']/df.groupby('group')['count'].transform('max')
1.16 ms ± 79.2 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
>>>
>>> %timeit df['normalised'] = df['count'].groupby(df.group).transform(lambda x: x / x.max())
1.86 ms ± 114 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
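`%timeit` is an IPython magic. To repeat the comparison in a plain Python script, a rough sketch using the standard `timeit` module might look like the following; it also checks that both approaches produce the same values. The function names `builtin_max` and `lambda_max` are hypothetical, and absolute timings will depend on the machine, the pandas version and the size of the frame, so no particular numbers are implied.

```python
import timeit

import pandas as pd

# Example data from the question
df = pd.DataFrame({
    "group": ["A", "A", "A", "B", "B", "B", "C", "C", "C"],
    "year": [2015, 2016, 2017, 2012, 2013, 2014, 2014, 2015, 2016],
    "count": [1, 2, 3, 10, 14, 18, 55, 59, 58],
})


def builtin_max(frame):
    # pandas' built-in 'max' reduction, broadcast back to each row by transform
    return frame["count"] / frame.groupby("group")["count"].transform("max")


def lambda_max(frame):
    # A Python-level lambda, called once per group
    return frame["count"].groupby(frame["group"]).transform(lambda s: s / s.max())


# Both approaches should give identical results
pd.testing.assert_series_equal(builtin_max(df), lambda_max(df), check_names=False)

runs = 200
for fn in (builtin_max, lambda_max):
    total = timeit.timeit(lambda: fn(df), number=runs)
    print(f"{fn.__name__}: {total / runs * 1e3:.3f} ms per call")
```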

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 timgeb