Faster data reformatting with groupby

So I have a DataFrame that looks something along these lines:

import pandas as pd

ddd = {
    'a': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    'b': [22, 25, 18, 53, 19, 8, 75, 11, 49, 64],
    'c': [1, 1, 1, 2, 2, 3, 4, 4, 4, 5]
}
df = pd.DataFrame(ddd)

What I need is to group the data by the 'c' column and apply some data transformations. At the moment I'm doing this:

def do_stuff(d: pd.DataFrame):
    if d.shape[0] >= 2:
        return pd.DataFrame(
            {
                'start': [d.a.values[0]],
                'end': [d.a.values[d.shape[0] - 1]],
                'foo': [d.a.sum()],
                'bar': [d.b.mean()]
            }
        )
    else:
        return pd.DataFrame()

r = df.groupby('c').apply(do_stuff)

Which gives the correct result:

     start  end   foo        bar
c                               
1 0    1.0  3.0   6.0  21.666667
2 0    4.0  5.0   9.0  36.000000
4 0    7.0  9.0  24.0  45.000000

The problem is that this approach is too slow. On my actual data it takes around 0.7 seconds, which is too long; ideally it needs to be much faster.

Is there any way I can do this faster? Or maybe there's some other faster method not involving groupby that I could use?
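
One direction that might be worth trying, as a rough sketch rather than a definitive answer: replace the per-group Python function with built-in aggregations, then drop the small groups afterwards. This assumes pandas 0.25 or newer (for named aggregation) and that the rows within each group are already in the desired order. The names `agg`, `n` and `r_fast` are made up for this illustration. Note two differences from the original: the result is indexed by `c` only (no extra inner level), and `'first'`/`'last'` skip missing values, whereas `d.a.values[0]` does not.

```python
import pandas as pd

df = pd.DataFrame({
    'a': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    'b': [22, 25, 18, 53, 19, 8, 75, 11, 49, 64],
    'c': [1, 1, 1, 2, 2, 3, 4, 4, 4, 5],
})

# Compute every statistic in a single pass using built-in aggregations,
# instead of building a small DataFrame per group inside apply().
agg = df.groupby('c').agg(
    start=('a', 'first'),  # first non-null 'a' in the group
    end=('a', 'last'),     # last non-null 'a' in the group
    foo=('a', 'sum'),
    bar=('b', 'mean'),
    n=('a', 'size'),       # helper column (hypothetical name): rows per group
)

# Keep only groups with at least two rows, then drop the helper column.
r_fast = agg.loc[agg['n'] >= 2].drop(columns='n')
print(r_fast)
```

Whether this is actually faster on the real data would need to be measured, for example with `timeit`; the gain usually depends on the number of groups, since `apply` pays a Python-level cost for each one.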



Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow
