Groupby conditional sum of adjacent rows in pandas
I have a DataFrame that has already been sorted by user and by time:
import pandas as pd
df = pd.DataFrame({'user' : ['A', 'A', 'A', 'B', 'B', 'B', 'B'],
                   'location' : ['house', 'house', 'gym', 'gym', 'shop', 'gym', 'gym'],
                   'duration' : [10, 5, 5, 4, 10, 4, 6]})
duration location user
0 10 house A
1 5 house A
2 5 gym A
3 4 gym B
4 10 shop B
5 4 gym B
6 6 gym B
I only want to sum duration when the location is the same across adjacent rows for a given user, so it is not just df.groupby(['user','location']).duration.sum(). The desired output is shown below. The row order also matters and must be preserved.
duration location user
15 house A
5 gym A
4 gym B
10 shop B
10 gym B
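To make the required semantics concrete, here is a minimal pure-Python sketch without pandas. It assumes the rows are held as plain (user, location, duration) tuples in the already-sorted order; the name `rows` is illustrative only. itertools.groupby merges only consecutive items with equal keys, which matches the "adjacent rows" condition.

```python
from itertools import groupby

# Rows as (user, location, duration), already sorted by user and time.
rows = [('A', 'house', 10), ('A', 'house', 5), ('A', 'gym', 5),
        ('B', 'gym', 4), ('B', 'shop', 10), ('B', 'gym', 4), ('B', 'gym', 6)]

# groupby only groups *consecutive* items with an equal key, so a later
# run of the same (user, location) pair starts a new group.
collapsed = [
    (user, location, sum(duration for _, _, duration in run))
    for (user, location), run in groupby(rows, key=lambda r: (r[0], r[1]))
]

for user, location, total in collapsed:
    print(total, location, user)
```

This prints the five rows of the desired output in their original order.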
Thank you!
Solution 1:[1]
Pass sort=False to groupby so that the groups keep the order in which they first appear in the original DataFrame. Then compute the grouped sum of the duration column.
adj_check = (df.location != df.location.shift()).cumsum()
df.groupby(['user', 'location', adj_check], as_index=False, sort=False)['duration'].sum()
The only change needed relative to your attempt is this extra grouping key, which gives each run of consecutive rows with the same location its own group number:
(df.location != df.location.shift()).cumsum()
0 1
1 1
2 2
3 2
4 3
5 4
6 4
Name: location, dtype: int32
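A slightly different variant, offered as a sketch rather than as the answer's method: build the run label from both user and location, so a run can never span two users, and then group by that single label. Because the label increases monotonically, the default sorted groupby already keeps the original order. The names `keys` and `run_id` are illustrative, and named aggregation assumes pandas 0.25 or later.

```python
import pandas as pd

df = pd.DataFrame({'user': ['A', 'A', 'A', 'B', 'B', 'B', 'B'],
                   'location': ['house', 'house', 'gym', 'gym', 'shop', 'gym', 'gym'],
                   'duration': [10, 5, 5, 4, 10, 4, 6]})

# A new run starts whenever user OR location differs from the previous row
# (the first row compares against NaN, so it always starts run 1).
keys = df[['user', 'location']]
run_id = keys.ne(keys.shift()).any(axis=1).cumsum()

# user and location are constant within a run, so 'first' recovers them;
# duration is summed per run.
result = (df.groupby(run_id)
            .agg(user=('user', 'first'),
                 location=('location', 'first'),
                 duration=('duration', 'sum'))
            .reset_index(drop=True))
print(result)
```

Here run_id is 1, 1, 2, 3, 4, 5, 5, so rows 2 and 3 (gym for A, then gym for B) land in separate runs even though their locations match.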
Solution 2:[2]
For files larger than the FILE_UPLOAD_MAX_MEMORY_SIZE value in your settings, uploaded files are automatically stored as TemporaryUploadedFile objects. To get the path of obj, use obj.file.path.
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | |
| Solution 2 | Byzantius |

