'Subtracting unique id rows in pandas
I have a dataset like this, I need help with a code that will calculate difference for each id based on type a
so for id 222, value for type a - type a, value for type b - type a, value for type c - type a. and this will go for each unique IDs.
id type value difference
222 a 10.5
222 b 70.0
222 c 148.5
333 a 40.5
333 b 60.2
333 c 180.5
444 a 60.8
444 b 140.2
444 c 210.5
Solution 1:[1]
Solution with unstack
You can start by unstacking on type:
df2 = df.set_index(['id','type']).unstack()
df2 looks like:
value
type a b c
id
222 10.5 70.0 148.5
333 40.5 60.2 180.5
444 60.8 140.2 210.5
Then we subtract column a and stack back into the original shape:
df3 = df2.subtract(df2[('value','a')], axis=0).stack().reset_index()
df3 looks like this:
id type value
0 222 a 0.0
1 222 b 59.5
2 222 c 138.0
3 333 a 0.0
4 333 b 19.7
5 333 c 140.0
6 444 a 0.0
7 444 b 79.4
8 444 c 149.7
Then you can stick it back into the original df:
df['difference'] = df3['value']
and df now looks like this
id type value difference
0 222 a 10.5 0.0
1 222 b 70.0 59.5
2 222 c 148.5 138.0
3 333 a 40.5 0.0
4 333 b 60.2 19.7
5 333 c 180.5 140.0
6 444 a 60.8 0.0
7 444 b 140.2 79.4
8 444 c 210.5 149.7
Solution with groupby
df['difference'] = df.groupby('id', group_keys = False).apply(lambda g: g - g.set_index('type').loc['a'])['value']
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 |
