'Group by based on repeating column values of panda df to sum/average other columns

I have a df looking like this:

    Activity    Count   angle_1frame_abs
87  11.2454 4   1.9863239600400613
88  14.3124 4   1.633204419481332
89  7.15621 4   1.7235925045363631
90  1.02232 4   1.4205234792290875
172 6.13389 1   1.9096280055821166
187 1.02232 3   1.7052938965382456
188 7.15621 3   1.708833899378485
189 2.04463 3   1.2728507985832682
233 4.08926 1   1.554572584797844
265 4.08926 2   1.512615236089327
266 5.11157 2   1.4850900583919704
281 6.13389 1   1.162132169753371
305 3.06694 2   2.3605660470439824
306 3.06694 2   1.5685525010916657
385 5.11157 2   1.6579646804948973
386 2.04463 2   2.121520877298791
407 5.11157 4   1.1528498264361269
408 12.2678 4   1.7986876725933032
409 9.20082 4   1.5502484587771188
410 2.04463 4   1.6302871732665316

The column "Count" has stretches of repeating values that I would like to use to group the df.

In the above example I would have 8 groups.

I struggle keeping repeating values as separate groups.

My final aim would be to either average (Activity) or sum (angle_1frame_abs) the other columns based on these groups.

My desired output would be something like this (note: the activity and angle_1frames_abs values are made up):

Activity Count angle_1_frames_abs
9   4   1.7
6   1   1.9 
4   3   1.7
4   1   1.5
4   2   1.4
6   1   1.1
4   2   1.5
8   4   1.5

I have been trying something like this but this only gave me groups based on unique values in the "count" column.

df.groupby(["Count"]).angle_1frame_abs.sum().reset_index()

Solution 1:^[1]

You could define every column you want to keep on your aggregation function:

df.groupby(["Count"]).agg({k: ['sum', 'mean'] for k in df.columns}).reset_index()

This here code calculates the sum and average for every column per group, hence this will work only with a dataframe that has only numeric columns.
Here is a version that will work for streches of groups:

# firstly, create a helper column called "groups" (this col will signify
# when a value is changed in the Count column):
df['groups'] = None
group = 0
for i, j in df.iterrows():
    if i == 0:
        df.loc[i, 'groups'] = group
        valueBefore = df.loc[i, 'Count']
        continue
    if j.loc['Count'] != valueBefore:
        group += 1
    df.loc[i, 'groups'] = group
    valueBefore = df.loc[i, 'Count']
# then you can use it to groupby all other columns, i.e.:
df.groupby(["groups"]).agg({k: ['sum', 'mean'] for k in df.columns}).reset_index()

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution	Source
Solution 1

'Group by based on repeating column values of panda df to sum/average other columns

Solution 1:[1]

Sources

Related Questions

Solution 1:^[1]