'pandas dataframe groupby apply multi columns and get count
I have a excel like this:
| year | a | b |
|---|---|---|
| 2021 | 12 | 23 |
| 2021 | 31 | 0 |
| 2021 | 15 | 21 |
| 2021 | 14 | 0 |
| 2022 | 32 | 0 |
| 2022 | 24 | 15 |
| 2022 | 28 | 29 |
| 2022 | 33 | 0 |
I wanna get count of condition: a>=30 and b==0 group by year the final output like this:
2021 1
2022 2
I wanna use pandas dataframe to implement this, can anyone help? I'm quite new to python
Solution 1:[1]
For count matched rows chain both conditions by & for bitwise AND and aggregate sum, Trues are processing like 1 and False like 0:
df1 = ((df.a>=30) & (df.b==0)).astype(int)
.groupby(df['year']).sum().reset_index(name='count')
print (df1)
year count
0 2021 1
1 2022 2
Similar idea with helper column:
df1 = (df.assign(count = ((df.a>=30) & (df.b==0)).astype(int))
.groupby('year', as_index=False)['count']
.sum())
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 |
