'Group by based on repeating column values of panda df to sum/average other columns
I have a df looking like this:
Activity Count angle_1frame_abs
87 11.2454 4 1.9863239600400613
88 14.3124 4 1.633204419481332
89 7.15621 4 1.7235925045363631
90 1.02232 4 1.4205234792290875
172 6.13389 1 1.9096280055821166
187 1.02232 3 1.7052938965382456
188 7.15621 3 1.708833899378485
189 2.04463 3 1.2728507985832682
233 4.08926 1 1.554572584797844
265 4.08926 2 1.512615236089327
266 5.11157 2 1.4850900583919704
281 6.13389 1 1.162132169753371
305 3.06694 2 2.3605660470439824
306 3.06694 2 1.5685525010916657
385 5.11157 2 1.6579646804948973
386 2.04463 2 2.121520877298791
407 5.11157 4 1.1528498264361269
408 12.2678 4 1.7986876725933032
409 9.20082 4 1.5502484587771188
410 2.04463 4 1.6302871732665316
The column "Count" has stretches of repeating values that I would like to use to group the df.
In the above example I would have 8 groups.
I struggle keeping repeating values as separate groups.
My final aim would be to either average (Activity) or sum (angle_1frame_abs) the other columns based on these groups.
My desired output would be something like this (note: the activity and angle_1frames_abs values are made up):
Activity Count angle_1_frames_abs
9 4 1.7
6 1 1.9
4 3 1.7
4 1 1.5
4 2 1.4
6 1 1.1
4 2 1.5
8 4 1.5
I have been trying something like this but this only gave me groups based on unique values in the "count" column.
df.groupby(["Count"]).angle_1frame_abs.sum().reset_index()
Solution 1:[1]
You could define every column you want to keep on your aggregation function:
df.groupby(["Count"]).agg({k: ['sum', 'mean'] for k in df.columns}).reset_index()
This here code calculates the sum and average for every column per group, hence this will work only with a dataframe that has only numeric columns.
Here is a version that will work for streches of groups:
# firstly, create a helper column called "groups" (this col will signify
# when a value is changed in the Count column):
df['groups'] = None
group = 0
for i, j in df.iterrows():
if i == 0:
df.loc[i, 'groups'] = group
valueBefore = df.loc[i, 'Count']
continue
if j.loc['Count'] != valueBefore:
group += 1
df.loc[i, 'groups'] = group
valueBefore = df.loc[i, 'Count']
# then you can use it to groupby all other columns, i.e.:
df.groupby(["groups"]).agg({k: ['sum', 'mean'] for k in df.columns}).reset_index()
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 |
