Find standard deviation of a column based on values from another column and group by

I have a data frame looking like this:

classid  grade  haveTeacher
0        99     1
1        40     1
1        50     0
1        70     1
2        50     0
3        34     0

I'd like to know how to compute, in pandas, the standard deviation of "grade" for each classid, using only the rows that have a teacher (a haveTeacher value of 1 means there is a teacher). I know I would have to group by "classid", but what would go inside the .apply and lambda function to satisfy these conditions?



Solution 1:[1]

You don't need .apply or a lambda here. First filter the dataframe down to the records that have a teacher: df[df['haveTeacher'] == 1]. Note that the column name is case-sensitive, so it has to be 'haveTeacher', not 'haveteacher'. Then group by classid and aggregate grade with the standard deviation. Passing the string 'std' to agg uses pandas' own std, which computes the sample standard deviation (ddof=1):

>>> df[df['haveTeacher'] == 1].groupby(['classid']).agg({'grade': 'std'})

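The output is:

             grade
classid           
0              NaN
1        21.213203

classid 0 shows NaN because only one of its rows has a teacher, and the sample standard deviation of a single value is undefined. classids 2 and 3 do not appear because none of their rows have a teacher.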
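
For anyone who wants to reproduce this end to end, here is a minimal sketch that rebuilds the sample frame from the question and applies the same filter-then-group approach. The variable names (with_teacher, sample_std, population_std) are just illustrative and not part of the original answer. The ddof=0 variant is an assumption about what you might want if you need the population standard deviation instead.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "classid": [0, 1, 1, 1, 2, 3],
    "grade": [99, 40, 50, 70, 50, 34],
    "haveTeacher": [1, 1, 0, 1, 0, 0],
})

# Step 1: keep only the rows that have a teacher.
with_teacher = df[df["haveTeacher"] == 1]

# Step 2: group by classid and take the standard deviation of grade.
# GroupBy.std defaults to ddof=1 (sample standard deviation).
sample_std = with_teacher.groupby("classid")["grade"].std()

# Optional: population standard deviation (ddof=0), which gives 0.0
# rather than NaN for a group with a single row.
population_std = with_teacher.groupby("classid")["grade"].std(ddof=0)

print(sample_std)
print(population_std)
```

Selecting the "grade" column before calling std returns a Series indexed by classid, which is usually easier to work with than the one-column dataframe that agg with a dict produces.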

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Rajarshi Ghosh