Calculating mean grades for deciles within a dataset with Python, grouped by another field

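import pandas as pd

# Load the data and keep only rows with a positive 'number'
df_orig = pd.read_csv('test_sample.csv')
df_orig = df_orig[df_orig['number'] > 0]

# q=5 splits the values into five equal-sized bins (quintiles); use q=10 for deciles
decile_stats = df_orig.groupby(pd.qcut(df_orig['number'], 5))['number'].mean()

print(decile_stats)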

I'm trying to use Python to calculate statistics for deciles of my dataset. I can calculate the mean of each decile using qcut, but I also want to group the numbers by the values in a second column, so that the deciles are calculated and reported separately for each value in the family column.

   family  number
0    1000    0.04
1    1000    0.20
2    1000    0.04
3    1000    0.16
4    1000    0.08
5    1000    0.02
6    1000    0.02
7    1000    0.02
8    1000    0.64
9    1000    0.04

My desired output would be:

Q1 1000 0.028617
Q2 1000 0.105060
Q3 1000 0.452467
Q4 1000 2.644886
Q5 1000 141.749797...

and so on for each 'family' value (1000, 2000 and 3000 in this case).
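
One possible way to get per-family bins is to compute the bin number inside each family with groupby(...).transform and then group by both columns. The following is a minimal sketch under stated assumptions, not a definitive solution: the rows for family 2000 are made up for illustration, `quantile_codes`, `codes`, `bin` and `result` are hypothetical names, and ranking with method="first" before calling qcut is an added step to avoid the "Bin edges must be unique" error that qcut raises when many values are tied (as with the repeated 0.02 and 0.04 values in the sample).

```python
import pandas as pd

# Family 1000 rows come from the sample above; family 2000 rows are
# hypothetical, added only so that there is more than one group.
df = pd.DataFrame({
    "family": [1000] * 10 + [2000] * 10,
    "number": [0.04, 0.20, 0.04, 0.16, 0.08, 0.02, 0.02, 0.02, 0.64, 0.04,
               0.5, 1.5, 2.5, 3.5, 4.5, 5.5, 6.5, 7.5, 8.5, 9.5],
})
df = df[df["number"] > 0].copy()


def quantile_codes(s, q=5):
    """Return the 0-based quantile bin of each value within one group."""
    # rank(method="first") breaks ties by order of appearance, so the
    # ranks are all distinct and qcut always gets unique bin edges.
    return pd.qcut(s.rank(method="first"), q, labels=False)


# Bin numbers are computed separately inside each family.
codes = df.groupby("family")["number"].transform(quantile_codes)
df["bin"] = "Q" + (codes + 1).astype(int).astype(str)

# Mean of 'number' for every (family, bin) pair.
result = df.groupby(["family", "bin"])["number"].mean()
print(result)
```

Passing q=10 to the helper would give deciles instead of quintiles. Note that the labels are plain strings, so with ten bins "Q10" would sort before "Q2" in the output; grouping on the integer codes instead avoids that.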



Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source