'How to make a custom sum in pandas?

I'm trying to create a running sum on a Dataframe that follows some conditions, sum all previous values with same 'first index current day' value and condiction is true, the pseudocode would be something like

df_ = df[first index current day:current index] #Delimiter dataframe range from begin of day until current row
df_ = df_[df_['toSum'] == True] #drop False
result = sum(df_['sales']) #Sum all sales until now with 'toSum' == True

This is very easy using a loop or even a apply but take too much time, my dataframe have more than 1000k rows, the best I could do until now is use threads but if it's possible to vectorize that algorithm it would be a lot fastter

The result is something like this:

Actual Id	First ID Actual Day	Sales	toSum	AcumulativeDay
0	0	1	True	1
1	0	1	True	2
2	0	1	False	2
3	0	1	False	2
4	0	1	True	3
5	5	1	True	1
6	5	1	True	2
7	5	1	False	2
8	5	1	False	2
9	5	1	True	3

python pandas

Solution 1:^[1]

Use groupby over First ID Actual Day to get all previous values with same first day ID. Then, multiply the boolean column toSum with Sales so that cumsum can interpret the False values as zeros. Remove the indices with droplevel and make the attribution to the variable.

import pandas as pd

d= {'Actual Id': {0: 0, 1: 1, 2: 2, 3: 3, 4: 4, 5: 5, 6: 6, 7: 7, 8: 8, 9: 9},
 'First ID Actual Day': {0: 0,1: 0,2: 0,3: 0,4: 0,5: 5,6: 5,7: 5,8: 5,9: 5},
 'Sales': {0: 1.0,1: 11.33,2: 1.0,3: 31.2,4: 1.0,5: 241.14,6: 1.0,7: 1.0,8: 1.0,9: 54.22},
 'toSum': {0: True,1: True,2: False,3: False,4: True,5: True,6: True,7: False,8: False,9: True}
}
df = pd.DataFrame(d)

df['AcumulativeDay'] = df.groupby(['First ID Actual Day'])\
                        .apply(lambda x: (x['Sales']*x['toSum'])\
                        .cumsum()).droplevel(0)

   Actual Id  First ID Actual Day   Sales  toSum  AcumulativeDay
0          0                    0    1.00   True            1.00
1          1                    0   11.33   True           12.33
2          2                    0    1.00  False           12.33
3          3                    0   31.20  False           12.33
4          4                    0    1.00   True           13.33
5          5                    5  241.14   True          241.14
6          6                    5    1.00   True          242.14
7          7                    5    1.00  False          242.14
8          8                    5    1.00  False          242.14
9          9                    5   54.22   True          296.36

Solution 2:^[2]

After having the table filtered as you did use:

df['cumsum'] = df['sales'].cumsum()

You may want to have the dataframe sorted by date.

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution	Source
Solution 1	n1colas.m
Solution 2	Ziur Olpa

'How to make a custom sum in pandas?

Solution 1:[1]

Solution 2:[2]

Sources

Related Questions

Solution 1:^[1]

Solution 2:^[2]