Pandas multiple aggregations over multiple columns
With df.agg, I can apply a set of functions to all columns simultaneously:
df.agg([
    lambda x: (x > 0).mean(),
    lambda x: (x > 20).mean(),
    lambda x: x.isna().sum(),
])
Because the functions are anonymous, each result is simply labeled <lambda>, which isn't helpful.
Of course, providing named functions resolves this:
def gt_0(x):
    return (x > 0).mean()

def gt_20(x):
    return (x > 20).mean()

def n_na(x):
    return x.isna().sum()

df.agg([gt_0, gt_20, n_na], axis=0)
Is there a more concise way? Typically these functions are simple and will not be reused.
Solution 1:[1]
Instead of agg, you could compute each statistic directly on the whole DataFrame, collect the results in a dictionary keyed by the label you want, and construct a DataFrame from it:
out = pd.DataFrame.from_dict({'gt_0': (df > 0).mean(),
                              'gt_20': (df > 20).mean(),
                              'n_na': df.isna().sum()},
                             orient='index')
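As a minimal sketch of how this behaves, the snippet below runs the same idea on a small made-up DataFrame. The column names a and b and their values are assumptions chosen only for illustration; they are not from the question.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data, used only to illustrate the approach.
df = pd.DataFrame({
    "a": [-5.0, 10.0, 25.0, np.nan],
    "b": [30.0, 40.0, -1.0, 0.0],
})

# Each value is a Series indexed by the column names of df.
# The dictionary keys become the labels of the result.
stats = {
    "gt_0": (df > 0).mean(),    # share of values greater than 0 (NaN compares as False)
    "gt_20": (df > 20).mean(),  # share of values greater than 20
    "n_na": df.isna().sum(),    # number of missing values
}

# orient="index" turns each dictionary key into a row label,
# matching the layout that df.agg([...]) would produce.
out = pd.DataFrame.from_dict(stats, orient="index")
print(out)
```

Each expression is evaluated once on the whole DataFrame rather than once per column, and the labels come from the dictionary keys, so no named helper functions are needed.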
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 |
