'Pandas - new column based on `max` of grouped values
I have a Pandas dataframe with multiple groups in it, A, B, C. Each group has multiple counts associated with it and I want to create a new column that is normalised to the max value of each group.
i.e.
index, group, year, count
0, A, 2015, 1
1, A, 2016, 2
2, A, 2017, 3
3, B, 2012, 10
4, B, 2013, 14
5, B, 2014, 18
6, C, 2014, 55
7, C, 2015, 59
8, C, 2016, 58
...becomes
index, group, year, count, normalised
0, A, 2015, 1, 0.333
1, A, 2016, 2, 0.667
2, A, 2017, 3, 1.000
3, B, 2012, 10, 0.557
4, B, 2013, 14, 0.778
5, B, 2014, 18, 1.000
6, C, 2014, 55, 0.932
7, C, 2015, 59, 1.000
8, C, 2016, 58, 0.983
If I try something like...
df.assign(normalised=lambda x: x['count']/df[df['group'] == x['group']]['count'].max()
then max will return 59 rather than the largest number within the category
Solution 1:[1]
Similar to Psidom's answer, but avoiding the the lambda and therefore faster:
df['normalised'] = df['count']/df.groupby('group')['count'].transform('max')
Timings
>>> %timeit df['normalised'] = df['count']/df.groupby('group')['count'].transform('max')
1.16 ms ± 79.2 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
>>>
>>> %timeit df['normalised'] = df['count'].groupby(df.group).transform(lambda x: x / x.max())
1.86 ms ± 114 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | timgeb |
