'Count the amount of times a boolean goes from True to False in a column grouped by year in another column
I have a pandas dataframe with a year column and a boolean column, and want to count how many times true changes to false by year.
| Year | Boolean |
|---|---|
| 2019 | True |
| 2019 | False |
| 2019 | True |
| 2019 | False |
| 2020 | True |
| 2020 | True |
| 2020 | False |
| 2021 | False |
| 2021 | True |
| 2021 | True |
| 2021 | False |
My expected outcome would be an array or another df that looks like this
| Year | Changes |
|---|---|
| 2019 | 2 |
| 2020 | 1 |
| 2021 | 1 |
I've used something like this to count the total number of True to False changes, but am unsure how to group them by year
(df.Col2 & (df.Col2 != df.Col2.shift(1))).sum()
Solution 1:[1]
You need to count per group, for this you can group by year, convert the booleans to integers and count the -1 values after a diff:
df.groupby('Year')['Boolean'].apply(lambda s: s.astype(int).diff().eq(-1).sum())
variant with your initial approach:
df.groupby('Year')['Boolean'].apply(lambda s: (s&s.ne(s.shift())).sum())
output:
Year
2019 2
2020 1
2021 1
Name: Boolean, dtype: int64
Solution 2:[2]
You can do:
df['Changes'] = df.groupby('Year').apply(lambda b: ~b & b.shift(1).fillna(False))
df at this point:
Year Boolean Changes
0 2019 True False
1 2019 False True
2 2019 True False
3 2019 False True
4 2020 True False
5 2020 True False
6 2020 False True
7 2021 False False
8 2021 True False
9 2021 True False
10 2021 False True
and then group and sum:
df = df.groupby('Year',as_index=False)['Changes'].sum()
Output:
Year Changes
0 2019 2
1 2020 1
2 2021 1
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | mozway |
| Solution 2 |
