'Pandas sum rows between boolean values of another column
I am trying to sum the values of all the rows (for multiple columns in reality around 50), between the True values of a bool column in a pandas df, and fill the True rows with the sum number. I had a column with names and based on whether the names contained a substring, I made a mask boolean column. Let me give a visual example.
Current df:
| mask | col1 | col2 | col3 |
|---|---|---|---|
| False | 10 | 3 | 5 |
| False | 5 | 2 | 4 |
| True | |||
| False | 5 | 1 | 10 |
| False | 1 | 7 | 6 |
| False | 8 | 2 | 4 |
| True |
Desired/Target df:
| mask | col1 | col2 | col3 |
|---|---|---|---|
| False | 10 | 3 | 5 |
| False | 5 | 2 | 4 |
| True | 15 | 5 | 9 |
| False | 5 | 1 | 10 |
| False | 1 | 7 | 6 |
| False | 8 | 2 | 4 |
| True | 14 | 10 | 20 |
I made it with for loops iterating through the df (more "classic programming") but it takes forever cause the df usually is millions of lines. I am looking for a way to do this with pandas since it's very fast.
Also I tried this which I found in another post: (though I am not familiar with groupby and transform so I don't actually know what I did)
col_list = [x for x in df.columns if 'col' in x]
df[col_list] = df.groupby(df['mask'].cumsum()).transfrom('sum').where(df.mask)
But I got an error "TypeError: can only concatenate str (not "float") to str"
(the actual numbers in the dataframe are floats but I used integers in the example just to make things easy)
Any ideas? Thanks!
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|
