Calculate pandas column on subset of data
Background
I have a dataframe with 3 columns and 40,000 rows.
index Time wind speed sector
0 2019-12-01 10:00:00+10:00 4.970969 120
1 2019-12-01 11:00:00+10:00 5.076307 30
2 2019-12-01 12:00:00+10:00 5.248692 90
3 2019-12-01 13:00:00+10:00 5.242391 60
4 2019-12-01 14:00:00+10:00 5.266173 30
...
What I'm trying to do
I need to create a new column with the standard deviation of wind speed by sector, i.e. for all 40,000 rows, group the wind speed values that share the same sector and calculate their standard deviation, so that every row carries the value for its own sector.
What I've tried
I've looked across Stack Overflow and I know I need to use 'groupby', but in all the other posts, people were not adding this column to the existing df.
I've tried the following, but with no success:
df['std'] = df.groupby(['sector'])['wind speed'].std()
df['std'] = df['wind speed'].apply(lambda x: df.groupby('sector')[x].std())
Help Requested
Does anyone know how I need to write the groupby call to make this work?
Solution 1:[1]
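So I think I figured it out. Using transform('std') instead of std() returns one value per row of the original dataframe, so the result can be assigned directly as a new column: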
df['std'] = (df.groupby(['sector'])['wind speed'].transform('std'))
Big thanks to this post.
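To illustrate why transform works where the first attempt did not, here is a minimal sketch on made-up data (the five rows below are invented for the example, not taken from the real 40,000-row dataframe). Calling std() on the groupby gives one value per sector, indexed by sector, so assigning it to df['std'] aligns on the wrong index. transform('std') instead broadcasts each sector's value back onto every row of that sector.

```python
import pandas as pd

# Hypothetical sample data standing in for the real dataframe.
df = pd.DataFrame({
    "wind speed": [4.0, 6.0, 5.0, 9.0, 7.0],
    "sector": [30, 30, 60, 60, 90],
})

# std() aggregates: the result has one row per sector and is indexed
# by the sector values (30, 60, 90), not by the original row index.
per_sector = df.groupby("sector")["wind speed"].std()

# transform("std") keeps the original row index, repeating each
# sector's standard deviation on every row belonging to that sector.
df["std"] = df.groupby("sector")["wind speed"].transform("std")

print(per_sector)
print(df)
```

Note that pandas uses the sample standard deviation (ddof=1) by default, so a sector with a single row, like sector 90 here, gets NaN.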
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Bobby Heyer |
