Pandas multiple aggregations over multiple columns

With df.agg, I can apply a set of functions to all columns simultaneously:

df.agg([
   lambda x: (x>0).mean(),
   lambda x: (x>20).mean(),
   lambda x: x.isna().sum(),
])

Because the functions are anonymous, each row of the result is simply labelled <lambda>, which isn't helpful.

Of course, providing named functions resolves this:

def gt_0(x):
    return (x>0).mean()

def gt_20(x):
    return (x>20).mean()

def n_na(x):
    return x.isna().sum()
    
df.agg([gt_0, gt_20, n_na], axis=0)

Is there a more concise way? Typically these functions are simple and will not be reused.



Solution 1:[1]

Instead of agg, you could compute each aggregation on the whole DataFrame, collect the results in a dictionary keyed by the name you want, and construct a DataFrame from it:

out = pd.DataFrame.from_dict({'gt_0': (df>0).mean(), 
                               'gt_20':(df>20).mean(), 
                               'n_na': df.isna().sum()}, 
                             orient='index')
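
The dictionary approach relies on each aggregation being expressible on the whole DataFrame at once. When an aggregation is only defined per column, one possible variant is a small helper that takes the functions as keyword arguments, so each lambda gets its label from its keyword. This is only a sketch: agg_named is a hypothetical helper, not part of pandas, and the sample data is made up for illustration.

```python
import numpy as np
import pandas as pd


def agg_named(df, **funcs):
    """Hypothetical helper: apply each function to every column of df.

    Each keyword becomes the row label of the corresponding result.
    """
    # df.apply(f) calls f once per column (passed as a Series) and
    # returns a Series with one value per column.
    per_func = {name: df.apply(f) for name, f in funcs.items()}
    # Transpose so that functions are rows and original columns stay columns.
    return pd.DataFrame(per_func).T


# Made-up sample data, for illustration only.
df = pd.DataFrame({
    "a": [1.0, -5.0, 30.0, np.nan],
    "b": [25.0, 40.0, 0.0, 3.0],
})

out = agg_named(
    df,
    gt_0=lambda x: (x > 0).mean(),
    gt_20=lambda x: (x > 20).mean(),
    n_na=lambda x: x.isna().sum(),
)
print(out)
```

The lambdas stay inline and are never reused, but the result rows are labelled gt_0, gt_20 and n_na rather than <lambda>.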

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow
