'Subtracting unique id rows in pandas

outputI have a dataset like this, I need help with a code that will calculate difference for each id based on type a

so for id 222, value for type a - type a, value for type b - type a, value for type c - type a. and this will go for each unique IDs.

id   type   value   difference
222   a      10.5
222   b      70.0
222   c      148.5
333   a      40.5
333   b      60.2
333   c      180.5
444   a      60.8
444   b      140.2
444   c      210.5


Solution 1:[1]

Solution with unstack

You can start by unstacking on type:

df2 = df.set_index(['id','type']).unstack()

df2 looks like:

        value
type    a   b       c
id          
222    10.5 70.0    148.5
333    40.5 60.2    180.5
444    60.8 140.2   210.5

Then we subtract column a and stack back into the original shape:

df3 = df2.subtract(df2[('value','a')], axis=0).stack().reset_index()

df3 looks like this:

    id  type value
0   222 a    0.0
1   222 b    59.5
2   222 c    138.0
3   333 a    0.0
4   333 b    19.7
5   333 c    140.0
6   444 a    0.0
7   444 b    79.4
8   444 c    149.7

Then you can stick it back into the original df:

df['difference'] = df3['value']

and df now looks like this


    id  type value  difference
0   222 a    10.5   0.0
1   222 b    70.0   59.5
2   222 c    148.5  138.0
3   333 a    40.5   0.0
4   333 b    60.2   19.7
5   333 c    180.5  140.0
6   444 a    60.8   0.0
7   444 b    140.2  79.4
8   444 c    210.5  149.7

Solution with groupby

df['difference'] = df.groupby('id', group_keys = False).apply(lambda g: g - g.set_index('type').loc['a'])['value']

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1