'Sum values based on two conditions
I have the following DataFrame:
df = pd.DataFrame()
df['account'] = [1,1,1,1,1,2,2,2,2,2]
df['type'] = ['deposit', 'deposit', 'deposit', 'trade', 'trade','deposit', 'deposit', 'trade', 'trade', 'trade']
df['asset'] = ['A','A','B','A','B','A','C','A','A','C']
df['amount'] = [100,110,200,150,230,120,310,130,125,320]
df['price'] = [10,8, 20, 12, 23,15,5, 18, 15, 7]
df:
account type asset amount price
0 1 deposit A 100 10
1 1 deposit A 110 8
2 1 deposit B 200 20
3 1 trade A 150 12
4 1 trade B 230 23
5 2 deposit A 120 15
6 2 deposit C 310 5
7 2 trade A 130 18
8 2 trade A 125 15
9 2 trade C 320 7
I want to create an extra column called holdPortfolio which is basically a buy and hold strategy. So the final DataFrame should look like this:
df_final:
account type asset amount price holdPortfolio
0 1 deposit A 100 10 1000
1 1 deposit A 110 8 1880
2 1 deposit B 200 20 4000
3 1 trade A 150 12 2520
4 1 trade B 230 23 4600
5 2 deposit A 120 15 1800
6 2 deposit C 310 5 1550
7 2 trade A 130 18 2160
8 2 trade A 125 15 1800
9 2 trade C 320 7 2170
Explanation: In index 0, the 1000 comes from 100*10.
In index 1, the 1880 comes from (100*10)+(110*8). I sum them up because they are both asset A.
In index 2, the 4000 comes from 200*20.
In index 3, the 2520 comes from (100+110)*12. This is the total amount of asset A that I have deposited multiplied by the price at that index.
In index 4, the 4600 comes from 200*23. This is the total amount of asset B that I have deposited multiplied by the price at that index. and so on...
Is there a way to get these results without writing multiple for loops for each account and each asset? I basically want a cumulative sum of my deposits for each asset in each account, multiplied by the current price of that asset.
Solution 1:[1]
Compute 2 Series, one for the cumulated deposits and one for the trades, then combine:
s = df['amount'].mask(df['type'].eq('trade'))
s2 = s.fillna(0).groupby([df['account'], df['asset']]).cumsum()
df['holdPortfolio'] = s.mul(df['price']).fillna(s2.mul(df['price']))
Output:
account type asset amount price holdPortfolio
0 1 deposit A 100 10 1000
1 1 deposit A 110 8 880
2 1 deposit B 200 20 4000
3 1 trade A 150 12 2520
4 1 trade B 230 23 4600
5 2 deposit A 120 15 1800
6 2 deposit C 310 5 1550
7 2 trade A 130 18 2160
8 2 trade A 125 15 1800
9 2 trade C 320 7 2170
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 |
