Calculate pandas column on subset of data

Background

I have a dataframe with 3 columns and 40,000 rows.

index     Time                             wind speed   sector
0         2019-12-01 10:00:00+10:00        4.970969     120
1         2019-12-01 11:00:00+10:00        5.076307     30
2         2019-12-01 12:00:00+10:00        5.248692     90
3         2019-12-01 13:00:00+10:00        5.242391     60
4         2019-12-01 14:00:00+10:00        5.266173     30
...

What I'm trying to do

I need to create a new column holding the standard deviation of wind speed by sector, i.e. across all 40,000 rows, group the wind speed values that share the same sector and calculate their standard deviation.

What I've tried

I've looked across Stack Overflow and I know I need to use 'groupby', but in all the other posts, people were not adding this column to the existing df.

I've tried the following, with no success:

df['std'] = df.groupby(['sector'])['wind speed'].std()
df['std'] = df['wind speed'].apply(lambda x: df.groupby('sector')[x].std())

Help Requested

Does anyone know how I need to write the groupby call to make this work?



Solution 1:[1]

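So I think I figured it out: use transform, which returns one value per row rather than one value per group, so the result lines up with the existing df.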

df['std'] = (df.groupby(['sector'])['wind speed'].transform('std'))

Big thanks to this post.
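
For anyone wondering why the plain .std() attempt does not work, here is a minimal sketch on made-up data (the six rows and the names per_sector and std_via_map are only for illustration, not from my real dataset). As far as I understand it, the aggregated result is indexed by sector, so assigning it straight to a column tries to align sector values against row labels; transform instead returns a result indexed like the original df.

```python
import pandas as pd

# Toy data, invented for illustration only
df = pd.DataFrame({
    "wind speed": [4.0, 5.0, 6.0, 7.0, 9.0, 3.0],
    "sector": [30, 30, 60, 60, 30, 90],
})

# Aggregation: one value per sector, indexed by the sector values (30, 60, 90)
per_sector = df.groupby("sector")["wind speed"].std()

# transform: one value per original row, indexed like df, so it can be assigned directly
df["std"] = df.groupby("sector")["wind speed"].transform("std")

# Equivalent alternative: look up each row's sector in the aggregated Series
df["std_via_map"] = df["sector"].map(per_sector)

print(per_sector)
print(df)
```

Note that a sector with only one row (sector 90 here) gets NaN, because pandas uses the sample standard deviation (ddof=1) by default.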

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Bobby Heyer