Calculating mean grades for deciles within a dataset with Python, grouped by another field
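import pandas as pd

df_orig = pd.read_csv('test_sample.csv')
# Keep only rows with a positive value
df_orig = df_orig[df_orig['number'] > 0]
# q=5 splits 'number' into five equal-sized bins (quintiles); use q=10 for deciles
decile_stats = df_orig.groupby(pd.qcut(df_orig['number'], 5))['number'].mean()
print(decile_stats)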
I'm trying to use Python to calculate statistics for deciles of my dataset. I can calculate the mean of each decile using qcut, but I want to group my numbers by the values in a second column, so that the deciles are calculated and reported separately for each value in the family column.
   family  number
0    1000    0.04
1    1000    0.20
2    1000    0.04
3    1000    0.16
4    1000    0.08
5    1000    0.02
6    1000    0.02
7    1000    0.02
8    1000    0.64
9    1000    0.04
My desired output would be:
Q1 1000 0.028617
Q2 1000 0.105060
Q3 1000 0.452467
Q4 1000 2.644886
Q5 1000 141.749797...
and so on, with each 'family' shown (1000, 2000 and 3000 in this case).
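One possible way to get per-family bins is to compute the qcut bins inside each family group and then take the mean per (family, bin) pair. The sketch below is only an illustration under stated assumptions: the names `quantile_means`, `bin_codes` and the `quantile` column are hypothetical, the bins are numbered 1 to q rather than labelled Q1 to Q5, and each family is assumed to have at least a few rows. Values are ranked before binning, which is a choice made here so that tied values (such as the repeated 0.02 and 0.04 in the sample) cannot produce duplicate bin edges.

```python
import pandas as pd


def quantile_means(df, group_col="family", value_col="number", q=5):
    """Mean of value_col per quantile bin, with bins computed inside each group."""
    # Same filter as in the question: keep only positive values.
    df = df[df[value_col] > 0].copy()

    def bin_codes(values):
        # Rank first so tied values cannot create duplicate bin edges in qcut.
        ranks = values.rank(method="first")
        # labels=False returns integer bin codes 0..q-1.
        return pd.qcut(ranks, q, labels=False)

    # transform keeps the original row index, so the codes line up with df.
    codes = df.groupby(group_col)[value_col].transform(bin_codes)
    df["quantile"] = codes.astype(int) + 1
    return df.groupby([group_col, "quantile"])[value_col].mean()


# The ten sample rows from the question (family 1000 only).
sample = pd.DataFrame({
    "family": [1000] * 10,
    "number": [0.04, 0.20, 0.04, 0.16, 0.08, 0.02, 0.02, 0.02, 0.64, 0.04],
})
result = quantile_means(sample, q=5)
print(result)
```

The result is a Series indexed by (family, quantile), so a dataset containing families 1000, 2000 and 3000 would give one block of rows per family. Passing q=10 would give deciles instead of the five bins used in the question's code.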
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
