Faster data reformatting with groupby
So I have a DataFrame that looks something along these lines:
import pandas as pd

ddd = {
    'a': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    'b': [22, 25, 18, 53, 19, 8, 75, 11, 49, 64],
    'c': [1, 1, 1, 2, 2, 3, 4, 4, 4, 5]
}
df = pd.DataFrame(ddd)
What I need is to group the data by the 'c' column and apply some data transformations. At the moment I'm doing this:
def do_stuff(d: pd.DataFrame):
    if d.shape[0] >= 2:
        return pd.DataFrame(
            {
                'start': [d.a.values[0]],
                'end': [d.a.values[d.shape[0] - 1]],
                'foo': [d.a.sum()],
                'bar': [d.b.mean()]
            }
        )
    else:
        return pd.DataFrame()

r = df.groupby('c').apply(lambda x: do_stuff(x))
Which gives the correct result:
     start  end   foo        bar
c
1 0    1.0  3.0   6.0  21.666667
2 0    4.0  5.0   9.0  36.000000
4 0    7.0  9.0  24.0  45.000000
The problem is that this approach is too slow. On my actual data it takes around 0.7 seconds, and I need it to be much faster.
Is there any way I can do this faster? Or maybe there's some other faster method not involving groupby that I could use?
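One direction that might be worth trying, offered as a sketch rather than a definitive answer: replace the per-group Python function with pandas' built-in reductions via named aggregation, then filter out the small groups afterwards. This assumes pandas 0.25 or later and that column 'a' has no missing values (the 'first' and 'last' reductions skip nulls, unlike positional indexing). The helper column name 'size' is only illustrative. Whether this is fast enough on the real data is not something I can claim without measuring.

```python
import pandas as pd

df = pd.DataFrame({
    'a': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    'b': [22, 25, 18, 53, 19, 8, 75, 11, 49, 64],
    'c': [1, 1, 1, 2, 2, 3, 4, 4, 4, 5],
})

# Named aggregation: each output column is (source column, reduction).
# The built-in reductions avoid building a small DataFrame per group.
agg = df.groupby('c').agg(
    start=('a', 'first'),   # first non-null value of 'a' in the group
    end=('a', 'last'),      # last non-null value of 'a' in the group
    foo=('a', 'sum'),
    bar=('b', 'mean'),
    size=('a', 'size'),     # illustrative helper column: rows per group
)

# Keep only groups with at least two rows, then drop the helper column.
r = agg[agg['size'] >= 2].drop(columns='size')
print(r)
```

Note that the result differs slightly in shape from the apply version: the index is just 'c' (no extra inner level of zeros), and 'start', 'end' and 'foo' keep their integer dtype instead of being upcast to float.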
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
