Creating summary stats based on deciles of a column
I have a cross-sectional dataset with many variables, including "wealth". I have a decile variable that divides wealth into 10 groups, created with the following:
df['decile'] = df['wealth'].transform(lambda x: pd.qcut(x, 10, labels=False))
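A minimal sketch of the decile step on synthetic data (the values are made up for illustration). Here `pd.qcut` is called directly on the column, which should give the same codes as the `transform` version above. With `labels=False`, `qcut` returns integer bin codes from 0 to 9, so the bottom decile is 0 and the top decile is 9. Rows with a missing `wealth` get a NaN decile.

```python
import numpy as np
import pandas as pd

# Synthetic data for illustration only: wealth = 1..100 plus one missing value
df = pd.DataFrame({"wealth": list(range(1, 101)) + [np.nan]})

# labels=False returns integer bin codes: 0 = lowest decile, 9 = highest
df["decile"] = pd.qcut(df["wealth"], 10, labels=False)

# Each decile holds 10 rows here; the row with missing wealth gets a NaN decile
print(df["decile"].value_counts(dropna=False).sort_index())
```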
My DataFrame looks like this (it is important to note that some variables contain NaNs):
ID  wealth   age  income  (many more variables)  decile
A   10000    30   4000    ...                    5
B   10       19   500     ...                    1
C   1000000  37   6000    ...                    9
D   2842     22   0       ...                    4
E   399932   44   NaN     ...                    8
F   2344     19   0       ...                    4
G   5000     18   0       ...                    4
H   ...
I   ...
..
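Because some variables contain NaNs, note that pandas' `count`, `mean`, `median` and `std` skip missing values by default (`skipna=True`). Each statistic is therefore computed only from the non-missing observations of that variable, and sample sizes can differ between variables. The toy data below is hypothetical and only demonstrates that behaviour.

```python
import numpy as np
import pandas as pd

# Toy data (hypothetical): income has one missing value
df = pd.DataFrame({"age": [30, 19, 44], "income": [4000.0, 500.0, np.nan]})

# count/mean/median skip NaN by default, so income uses only two observations
summary = df.agg(["count", "mean", "median"])
print(summary)
```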
I want to create summary statistics for variables of my choosing, for the bottom decile (decile=0) and the top decile (decile=9), showing the mean, median and standard deviation.
Desired output:

          bottom decile        top decile           difference
          mean  median  std    mean  median  std    in means
wealth    ..    ..      ..     ..    ..      ..     ..  *** (if statistically significant)
age       ..    ..      ..     ..    ..      ..     ..  **
income    ..    ..      ..     ..    ..      ..
..
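One way to get the per-decile statistics in the table above without computing each one separately is to keep only deciles 0 and 9, group by `decile`, and pass a list of statistics to `agg`. This is a sketch under assumptions. The data are randomly generated stand-ins for the real DataFrame, and `cols` is a hypothetical list standing in for "variables of my choosing". Missing values are skipped per variable.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in data; replace with the real DataFrame
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "wealth": rng.lognormal(8, 2, 500),
    "age": rng.integers(18, 70, 500).astype(float),
    "income": rng.normal(3000, 1000, 500),
})
df.loc[::25, "income"] = np.nan  # inject some missing values
df["decile"] = pd.qcut(df["wealth"], 10, labels=False)

cols = ["wealth", "age", "income"]  # hypothetical: variables of your choosing

# Keep only the bottom (0) and top (9) deciles, then aggregate per group
extremes = df[df["decile"].isin([0, 9])]
stats = extremes.groupby("decile")[cols].agg(["mean", "median", "std"])

# Reshape: one row per variable, columns = (decile, statistic)
table = stats.T.unstack(level=1)
table = table.rename(columns={0: "bottom decile", 9: "top decile"}, level=0)
print(table.round(2))
```

Transposing and then unstacking the statistic level puts the variables in rows and the decile/statistic pairs in columns, which matches the desired layout.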
Is there an easy way to do this in Python, instead of calculating each statistic individually?
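For the "difference in means" column and its significance stars, one common choice is Welch's two-sample t-test, available as `scipy.stats.ttest_ind(..., equal_var=False)`. This is an assumption, since the question does not say which test is intended. The star thresholds (0.01, 0.05, 0.10) follow a common convention and are likewise an assumption. `stars` and `diff_in_means` are hypothetical helper names, and missing values are dropped per variable before testing.

```python
import numpy as np
import pandas as pd
from scipy import stats as st


def stars(p):
    """Conventional significance markers (assumed thresholds)."""
    if p < 0.01:
        return "***"
    if p < 0.05:
        return "**"
    if p < 0.10:
        return "*"
    return ""


def diff_in_means(df, cols, group_col="decile", low=0, high=9):
    """Top-minus-bottom difference in means with a Welch t-test per variable."""
    rows = {}
    for col in cols:
        # Drop NaNs separately for each variable and each group
        bottom = df.loc[df[group_col] == low, col].dropna()
        top = df.loc[df[group_col] == high, col].dropna()
        res = st.ttest_ind(top, bottom, equal_var=False)  # Welch's t-test
        rows[col] = {
            "difference in means": top.mean() - bottom.mean(),
            "p-value": res.pvalue,
            "sig": stars(res.pvalue),
        }
    return pd.DataFrame.from_dict(rows, orient="index")


# Usage on synthetic data (hypothetical): age differs strongly between groups
rng = np.random.default_rng(1)
demo = pd.DataFrame({
    "decile": [0] * 50 + [9] * 50,
    "age": np.r_[rng.normal(30, 5, 50), rng.normal(45, 5, 50)],
    "income": np.r_[rng.normal(1000, 300, 50), rng.normal(1000, 300, 50)],
})
result = diff_in_means(demo, ["age", "income"])
print(result)
```

Because the result is indexed by variable name, it can be joined onto a per-decile summary table that is also indexed by variable.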
Solution 1:[1]
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Andrea Ierardi |

