'Pandas groupby count values above threshold

I have a groupby question that I can't solve. It is probably simple, but I can't get it to work nicely. I am trying to compute some statistics on a variable with pandas groupby chained with the very handy agg function. I would like add to the list below a calculation of the number of values above a given threshold.

df = df.groupby(['scenario','Name','year','month'])["Value"].agg([np.min,np.max,np.mean,np.std])

Usually, I compute the number of values above a given threshold as shown below, but I can't find a way to add this to the aggregation function. Do you know how I could do that?

df =df[df>0].groupby(['scenario','Name','year','month']).count()

python pandas

Solution 1:^[1]

Your answer works. Else you could add it to the one line, not needing to create a separate function by using lambda x: instead.

df = df.groupby(["scenario", "Name", "year", "month"])["Value"].agg([np.min, np.max, np.mean, np.std, lambda x: ((x > 0)*1).sum()])

The logic here: (x > 0) returns True/False bool; *1 turns the bool to an integer (1 = True, 0 = False); .sum() will sum all the 1s and 0s within the group - and as those that are True = 1, the sum will count all values greater than 0.

Running a quick test on the time taken, your solution is faster, but I thought I would give an alternative solution anyway.

Solution 2:^[2]

I found a solution by creating a function and passing it in the agg function.

def counta(x):
    m = np.count_nonzero(x > 10)
    return m 

df = df.groupby(['scenario','Name','year','month'])["Value"].agg([np.min,np.max,np.mean,np.std,counta])

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution	Source
Solution 1	Rawson
Solution 2	Peslier53

'Pandas groupby count values above threshold

Solution 1:[1]

Solution 2:[2]

Sources

Related Questions

Solution 1:^[1]

Solution 2:^[2]