'Add sum columns back to pandas dataframe through chain style

I am trying to add a list of column sums back to my pandas dataframe through assign(). But I am not so sure how to do it when there's more than one column. What's the best way to do it or any other way to do it in a chain style given I have had other steps before?

data2.assign(data2[rate_name].abs() / data2.groupby(level = 'date')[rate_name].transform('sum'))

                   rate_water  rate_fire  rate_wood
id     date                                        
apple  2019-01-01   -0.500000  -0.500000   0.000000
orange 2019-01-01   -0.636364  -0.963636   3.000000
melon  2019-01-01   -0.333333   5.666667  27.888889
apple  2020-01-01   -0.100000   7.900000  76.000000
orange 2020-01-01    0.363636  -0.963636  26.500000
melon  2020-01-01    0.166667   6.166667  27.235043
apple  2021-01-01    0.328571  26.261702  84.220779
orange 2021-01-01    0.363636  28.036364  28.683673
melon  2021-01-01    0.611111  39.944444  27.679487

Reproducible:

from pandas import Timestamp
data2 = pd.DataFrame.from_dict({'rate_water': {('apple', Timestamp('2019-01-01 00:00:00')): -0.5, ('orange', Timestamp('2019-01-01 00:00:00')): -0.6363636363636364, ('melon', Timestamp('2019-01-01 00:00:00')): -0.33333333333333337, ('apple', Timestamp('2020-01-01 00:00:00')): -0.10000000000000009, ('orange', Timestamp('2020-01-01 00:00:00')): 0.36363636363636365, ('melon', Timestamp('2020-01-01 00:00:00')): 0.16666666666666663, ('apple', Timestamp('2021-01-01 00:00:00')): 0.3285714285714285, ('orange', Timestamp('2021-01-01 00:00:00')): 0.36363636363636365, ('melon', Timestamp('2021-01-01 00:00:00')): 0.611111111111111}, 'rate_fire': {('apple', Timestamp('2019-01-01 00:00:00')): -0.5, ('orange', Timestamp('2019-01-01 00:00:00')): -0.9636363636363636, ('melon', Timestamp('2019-01-01 00:00:00')): 5.666666666666667, ('apple', Timestamp('2020-01-01 00:00:00')): 7.9, ('orange', Timestamp('2020-01-01 00:00:00')): -0.9636363636363636, ('melon', Timestamp('2020-01-01 00:00:00')): 6.166666666666667, ('apple', Timestamp('2021-01-01 00:00:00')): 26.261702127659575, ('orange', Timestamp('2021-01-01 00:00:00')): 28.036363636363635, ('melon', Timestamp('2021-01-01 00:00:00')): 39.94444444444444}, 'rate_wood': {('apple', Timestamp('2019-01-01 00:00:00')): 0.0, ('orange', Timestamp('2019-01-01 00:00:00')): 3.0, ('melon', Timestamp('2019-01-01 00:00:00')): 27.88888888888889, ('apple', Timestamp('2020-01-01 00:00:00')): 76.0, ('orange', Timestamp('2020-01-01 00:00:00')): 26.5, ('melon', Timestamp('2020-01-01 00:00:00')): 27.235042735042736, ('apple', Timestamp('2021-01-01 00:00:00')): 84.22077922077922, ('orange', Timestamp('2021-01-01 00:00:00')): 28.683673469387756, ('melon', Timestamp('2021-01-01 00:00:00')): 27.67948717948718}})

                   rate_water  rate_fire  rate_wood  sum_water  sum_fire    sum_wood
id     date                                                                         
apple  2019-01-01   -0.500000  -0.500000   0.000000  -1.469697   4.20303   30.888889
orange 2019-01-01   -0.636364  -0.963636   3.000000  -1.469697   4.20303   30.888889
melon  2019-01-01   -0.333333   5.666667  27.888889  -1.469697   4.20303   30.888889
apple  2020-01-01   -0.100000   7.900000  76.000000   0.430303  13.10303  129.735043
orange 2020-01-01    0.363636  -0.963636  26.500000   0.430303  13.10303  129.735043
melon  2020-01-01    0.166667   6.166667  27.235043   0.430303  13.10303  129.735043
apple  2021-01-01    0.328571  26.261702  84.220779   1.303319  94.24251  140.583940
orange 2021-01-01    0.363636  28.036364  28.683673   1.303319  94.24251  140.583940
melon  2021-01-01    0.611111  39.944444  27.679487   1.303319  94.24251  140.583940

python pandas

Solution 1:^[1]

Use dictionary comprehension for dict of Series and add it to dataFrame with unpack **:

from pandas import Timestamp
data2 = pd.DataFrame.from_dict({'rate_water': {('apple', Timestamp('2019-01-01 00:00:00')): -0.5, ('orange', Timestamp('2019-01-01 00:00:00')): -0.6363636363636364, ('melon', Timestamp('2019-01-01 00:00:00')): -0.33333333333333337, ('apple', Timestamp('2020-01-01 00:00:00')): -0.10000000000000009, ('orange', Timestamp('2020-01-01 00:00:00')): 0.36363636363636365, ('melon', Timestamp('2020-01-01 00:00:00')): 0.16666666666666663, ('apple', Timestamp('2021-01-01 00:00:00')): 0.3285714285714285, ('orange', Timestamp('2021-01-01 00:00:00')): 0.36363636363636365, ('melon', Timestamp('2021-01-01 00:00:00')): 0.611111111111111}, 'rate_fire': {('apple', Timestamp('2019-01-01 00:00:00')): -0.5, ('orange', Timestamp('2019-01-01 00:00:00')): -0.9636363636363636, ('melon', Timestamp('2019-01-01 00:00:00')): 5.666666666666667, ('apple', Timestamp('2020-01-01 00:00:00')): 7.9, ('orange', Timestamp('2020-01-01 00:00:00')): -0.9636363636363636, ('melon', Timestamp('2020-01-01 00:00:00')): 6.166666666666667, ('apple', Timestamp('2021-01-01 00:00:00')): 26.261702127659575, ('orange', Timestamp('2021-01-01 00:00:00')): 28.036363636363635, ('melon', Timestamp('2021-01-01 00:00:00')): 39.94444444444444}, 'rate_wood': {('apple', Timestamp('2019-01-01 00:00:00')): 0.0, ('orange', Timestamp('2019-01-01 00:00:00')): 3.0, ('melon', Timestamp('2019-01-01 00:00:00')): 27.88888888888889, ('apple', Timestamp('2020-01-01 00:00:00')): 76.0, ('orange', Timestamp('2020-01-01 00:00:00')): 26.5, ('melon', Timestamp('2020-01-01 00:00:00')): 27.235042735042736, ('apple', Timestamp('2021-01-01 00:00:00')): 84.22077922077922, ('orange', Timestamp('2021-01-01 00:00:00')): 28.683673469387756, ('melon', Timestamp('2021-01-01 00:00:00')): 27.67948717948718}})
data2.index.names=['id','date']

cols = ['rate_water','rate_fire','rate_wood']
data2 = data2.assign(**{rate_name.replace('rate','sum'): 
                        data2[rate_name].abs() / data2.groupby(level = 'date')[rate_name].transform('sum') 
                        for rate_name in cols})
print (data2)
                   rate_water  rate_fire  rate_wood  sum_water  sum_fire  \
id     date                                                                
apple  2019-01-01   -0.500000  -0.500000   0.000000  -0.340206  0.118962   
orange 2019-01-01   -0.636364  -0.963636   3.000000  -0.432990  0.229272   
melon  2019-01-01   -0.333333   5.666667  27.888889  -0.226804  1.348234   
apple  2020-01-01   -0.100000   7.900000  76.000000   0.232394  0.602914   
orange 2020-01-01    0.363636  -0.963636  26.500000   0.845070  0.073543   
melon  2020-01-01    0.166667   6.166667  27.235043   0.387324  0.470629   
apple  2021-01-01    0.328571  26.261702  84.220779   0.252104  0.278661   
orange 2021-01-01    0.363636  28.036364  28.683673   0.279008  0.297492   
melon  2021-01-01    0.611111  39.944444  27.679487   0.468888  0.423847   

                   sum_wood  
id     date                  
apple  2019-01-01  0.000000  
orange 2019-01-01  0.097122  
melon  2019-01-01  0.902878  
apple  2020-01-01  0.585809  
orange 2020-01-01  0.204262  
melon  2020-01-01  0.209928  
apple  2021-01-01  0.599078  
orange 2021-01-01  0.204032  
melon  2021-01-01  0.196889

Another way is processing all columns together:

cols = ['rate_water','rate_fire','rate_wood']
data2 = data2.join(data2[cols].abs().div(data2.groupby(level = 'date')[cols].transform('sum') )
                     .rename(columns=lambda x: x.replace('rate','sum')))

cols = ['rate_water','rate_fire','rate_wood']
data2 = data2.assign(**data2[cols].abs().div(data2.groupby(level = 'date')[cols].transform('sum') )
                       .rename(columns=lambda x: x.replace('rate','sum')))
print (data2)
                   rate_water  rate_fire  rate_wood  sum_water  sum_fire  \
id     date                                                                
apple  2019-01-01   -0.500000  -0.500000   0.000000  -0.340206  0.118962   
orange 2019-01-01   -0.636364  -0.963636   3.000000  -0.432990  0.229272   
melon  2019-01-01   -0.333333   5.666667  27.888889  -0.226804  1.348234   
apple  2020-01-01   -0.100000   7.900000  76.000000   0.232394  0.602914   
orange 2020-01-01    0.363636  -0.963636  26.500000   0.845070  0.073543   
melon  2020-01-01    0.166667   6.166667  27.235043   0.387324  0.470629   
apple  2021-01-01    0.328571  26.261702  84.220779   0.252104  0.278661   
orange 2021-01-01    0.363636  28.036364  28.683673   0.279008  0.297492   
melon  2021-01-01    0.611111  39.944444  27.679487   0.468888  0.423847   

                   sum_wood  
id     date                  
apple  2019-01-01  0.000000  
orange 2019-01-01  0.097122  
melon  2019-01-01  0.902878  
apple  2020-01-01  0.585809  
orange 2020-01-01  0.204262  
melon  2020-01-01  0.209928  
apple  2021-01-01  0.599078  
orange 2021-01-01  0.204032  
melon  2021-01-01  0.196889

Solution 2:^[2]

One option is with the assign method and unpacking:

 data2.assign(**data2
                .abs()
                .div(
                data2.groupby('date')
                .transform('sum'))
                .rename(columns = lambda df: df.removeprefix('rate_'))
                .add_prefix('sum_'))

                    rate_water  rate_fire  rate_wood  sum_water  sum_fire  sum_wood
id     date
apple  2019-01-01   -0.500000  -0.500000   0.000000  -0.340206  0.118962  0.000000
orange 2019-01-01   -0.636364  -0.963636   3.000000  -0.432990  0.229272  0.097122
melon  2019-01-01   -0.333333   5.666667  27.888889  -0.226804  1.348234  0.902878
apple  2020-01-01   -0.100000   7.900000  76.000000   0.232394  0.602914  0.585809
orange 2020-01-01    0.363636  -0.963636  26.500000   0.845070  0.073543  0.204262
melon  2020-01-01    0.166667   6.166667  27.235043   0.387324  0.470629  0.209928
apple  2021-01-01    0.328571  26.261702  84.220779   0.252104  0.278661  0.599078
orange 2021-01-01    0.363636  28.036364  28.683673   0.279008  0.297492  0.204032
melon  2021-01-01    0.611111  39.944444  27.679487   0.468888  0.423847  0.196889

Another option would be to concatenate along axis=1:

 pd.concat([data2,
           data2.abs()
                .div(groupby('date')
                .transform('sum'))
                .rename(columns = lambda df: df.removeprefix('rate_'))
                .add_prefix('sum_')], 
           axis = 1)
Out[103]:
                    rate_water  rate_fire  rate_wood  sum_water  sum_fire  sum_wood
id     date
apple  2019-01-01   -0.500000  -0.500000   0.000000  -0.340206  0.118962  0.000000
orange 2019-01-01   -0.636364  -0.963636   3.000000  -0.432990  0.229272  0.097122
melon  2019-01-01   -0.333333   5.666667  27.888889  -0.226804  1.348234  0.902878
apple  2020-01-01   -0.100000   7.900000  76.000000   0.232394  0.602914  0.585809
orange 2020-01-01    0.363636  -0.963636  26.500000   0.845070  0.073543  0.204262
melon  2020-01-01    0.166667   6.166667  27.235043   0.387324  0.470629  0.209928
apple  2021-01-01    0.328571  26.261702  84.220779   0.252104  0.278661  0.599078
orange 2021-01-01    0.363636  28.036364  28.683673   0.279008  0.297492  0.204032
melon  2021-01-01    0.611111  39.944444  27.679487   0.468888  0.423847  0.196889

I feel though that it would probably be cleaner to create temporary variables instead, especially if this is going to be in production code

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution	Source
Solution 1
Solution 2

'Add sum columns back to pandas dataframe through chain style

Solution 1:[1]

Solution 2:[2]

Sources

Related Questions

Solution 1:^[1]

Solution 2:^[2]