'Pandas groupby count values above threshold
I have a groupby question that I can't solve. It is probably simple, but I can't get it to work nicely. I am trying to compute some statistics on a variable with pandas groupby chained with the very handy agg function. I would like add to the list below a calculation of the number of values above a given threshold.
df = df.groupby(['scenario','Name','year','month'])["Value"].agg([np.min,np.max,np.mean,np.std])
Usually, I compute the number of values above a given threshold as shown below, but I can't find a way to add this to the aggregation function. Do you know how I could do that?
df =df[df>0].groupby(['scenario','Name','year','month']).count()
Solution 1:[1]
Your answer works. Else you could add it to the one line, not needing to create a separate function by using lambda x: instead.
df = df.groupby(["scenario", "Name", "year", "month"])["Value"].agg([np.min, np.max, np.mean, np.std, lambda x: ((x > 0)*1).sum()])
The logic here: (x > 0) returns True/False bool; *1 turns the bool to an integer (1 = True, 0 = False); .sum() will sum all the 1s and 0s within the group - and as those that are True = 1, the sum will count all values greater than 0.
Running a quick test on the time taken, your solution is faster, but I thought I would give an alternative solution anyway.
Solution 2:[2]
I found a solution by creating a function and passing it in the agg function.
def counta(x):
m = np.count_nonzero(x > 10)
return m
df = df.groupby(['scenario','Name','year','month'])["Value"].agg([np.min,np.max,np.mean,np.std,counta])
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Rawson |
| Solution 2 | Peslier53 |
