'Pandas average every ith row of dataframe subset based on column value
I have a dataframe that looks like the following
id value
11 0
11 3
11 1
11 2
4 3
4 1
4 8
4 2
5 0
5 3
5 1
5 2
4 2
4 0
4 1
4 3
11 2
11 1
11 0
11 3
I'm hoping to average the i_th rows of the N (=4) subsets, that have id that are equal; then, end up with the following
id value
11 1
11 2
11 0.5
11 2.5
4 2.5
4 0.5
4 4.5
4 2.5
5 0
5 3
5 1
5 2
e.g. for id=11:
[0+2, 3+1, 1+0, 2+3]/2 = [1, 2, 0.5, 2.5]
Solution 1:[1]
You can create a grouper that will group each consecutive group of identical numbers, and then for each group, get a cumcount for each group. Then, group by id and that cumcount:
consecutive_id_grouper = df['id'].ne(df['id'].shift(1)).cumsum()
cumcount_grouper = df['id'].groupby(consecutive_id_grouper).cumcount()
avg = df.groupby([df['id'], cumcount_grouper], as_index=False, sort=False)['value'].mean()
Output:
>>> avg
id value
0 11 1.0
1 11 2.0
2 11 0.5
3 11 2.5
4 4 2.5
5 4 0.5
6 4 4.5
7 4 2.5
8 5 0.0
9 5 3.0
10 5 1.0
11 5 2.0
Solution 2:[2]
Another possible approach by reshaping and taking the mean. Would break if there are multiple levels of replicating groups.
(df.groupby('id', sort=False)
.apply(lambda x: x['value'].values.reshape(2,-1).T.mean(1) if len(x)==df.id.value_counts().max() else x['value'].values)
.explode()
.reset_index(name='value'))
Output
id value
0 11 1.0
1 11 2.0
2 11 0.5
3 11 2.5
4 4 2.5
5 4 0.5
6 4 4.5
7 4 2.5
8 5 0
9 5 3
10 5 1
11 5 2
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | |
| Solution 2 | Chris |
