Groupby conditional sum of adjacent rows pandas

I have a DataFrame that has been sorted by user and by time:

import pandas as pd

df = pd.DataFrame({'user': ['A', 'A', 'A', 'B', 'B', 'B', 'B'],
                   'location': ['house', 'house', 'gym', 'gym', 'shop', 'gym', 'gym'],
                   'duration': [10, 5, 5, 4, 10, 4, 6]})


   duration location user
0        10    house    A
1         5    house    A
2         5      gym    A
3         4      gym    B
4        10     shop    B
5         4      gym    B
6         6      gym    B

I only want to sum duration when the 'location' values are the same across adjacent rows for a given user, so a plain df.groupby(['user','location']).duration.sum() is not enough. The desired output looks like the following, and the row order must be preserved.

duration location user
      15    house    A
       5      gym    A
       4      gym    B
      10     shop    B
      10      gym    B

Thank you!



Solution 1:[1]

Pass sort=False to groupby so that the groups keep the order in which they appear in the original DataFrame. Then compute the grouped sum of the duration column.

adj_check = (df.location != df.location.shift()).cumsum()
df.groupby(['user', 'location', adj_check], as_index=False, sort=False)['duration'].sum()


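The only change needed to what you tried before is the extra grouping key below. It compares each location with the one in the previous row, marks True wherever the value changes, and takes the cumulative sum, so every run of identical successive rows gets its own group number: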

(df.location != df.location.shift()).cumsum()
0    1
1    1
2    2
3    2
4    3
5    4
6    4
Name: location, dtype: int32
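
For reference, here is a self-contained sketch that puts the pieces together on the sample data from the question. It differs from the snippet above in one respect, which is my own choice and not part of the original answer: the run counter is renamed to a hypothetical label, run, so that it cannot clash with the location column, and it is dropped again after the sum.

```python
import pandas as pd

df = pd.DataFrame({
    'user': ['A', 'A', 'A', 'B', 'B', 'B', 'B'],
    'location': ['house', 'house', 'gym', 'gym', 'shop', 'gym', 'gym'],
    'duration': [10, 5, 5, 4, 10, 4, 6],
})

# True wherever the location differs from the previous row; the cumulative
# sum turns those markers into a counter that is constant within each run.
# 'run' is an illustrative name, not something from the original answer.
run = (df['location'] != df['location'].shift()).cumsum().rename('run')

result = (
    df.groupby(['user', 'location', run], sort=False)['duration']
      .sum()
      .reset_index()          # bring user, location and run back as columns
      .drop(columns='run')    # the counter was only needed for grouping
)

print(result)
```

Grouping by user as well as by the run counter is what keeps row 2 (gym, user A) apart from row 3 (gym, user B), even though both fall into the same run of identical locations.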

Solution 2:[2]

For files larger than your local settings' FILE_UPLOAD_MAX_MEMORY_SIZE, uploaded files are automatically stored as TemporaryUploadedFile objects. To get the path of obj, one must enter obj.file.path accordingly.

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1
Solution 2 Byzantius