How can I group elements in a pandas Series based on how many times they repeat?

I have this example_series:

0     False
1     False
2     False
3     False
4     False
5     False
6     False
7     False
8     False
9     False
10    False
11    False
12    False
13     True
14     True
15     True
16     True
17     True
18     True
19     True
20     True
21    False
22    False
23    False
24    False
25    False
26    False
27     True
28    False
29    False
30    False

I want to put the elements in groups, where a new group is created only if True repeats n (let's say 5) times in a row. I know how to group them based on where the value switches:

grouper = example_series.diff().ne(0).cumsum()

which would give me this:

0     1
1     1
2     1
3     1
4     1
5     1
6     1
7     1
8     1
9     1
10    1
11    1
12    1
13    2
14    2
15    2
16    2
17    2
18    2
19    2
20    2
21    3
22    3
23    3
24    3
25    3
26    3
27    4
28    5
29    5
30    5

But this creates a new group at, e.g., index 27, which I do not want because True has not repeated 5 times there. So 21-30 should all remain in group 3. I have been experimenting with some loops but didn't really get anywhere. Is there a one-liner for something like this in pandas?



Solution 1:[1]

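Not sure if there's a one-liner, but this may work, if I understand correctly. It builds on cumsum to attach the length of its run to each row. If a run is shorter than 5, its rows are blanked out and absorbed into a neighbouring group; in your example they stay with the preceding group number. Both bfill and ffill are needed depending on where the short runs are: bfill takes the value from the following run, and ffill covers short runs at the end of the data, which have no following run. Note: I named the column 'value' in df1.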

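import numpy as np

df = example_series.to_frame('value')  # assumption: df is the series as a one-column frame named 'value'
df1 = df.diff().ne(0).cumsum()  # label each run of identical values
# length of the run each row belongs to (cast to float so NaN can be stored)
df2 = df1.groupby(['value'])['value'].transform('count').astype(float).to_frame()
df2.loc[df2.value < 5, 'value'] = np.nan  # blank out runs shorter than 5
df2 = df2.value.bfill().ffill()  # fill the blanks from a neighbouring run
df_final = df2.diff().ne(0).cumsum()  # renumber wherever the run length changes
df_final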
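
One caveat with the snippet above: the last step compares run lengths, so two adjacent runs that happen to have the same length would end up in the same group. As a hedged alternative (a sketch of a different approach, not the method from this answer), you could flag only the rows that are True and sit in a run of at least n, then start a new group whenever that flag flips. The names run_id, run_len and long_true are made up for illustration, and example_series is rebuilt here from the run lengths shown in the question.

```python
import pandas as pd

# Rebuild the example from the question: runs of length 13, 8, 6, 1 and 3.
example_series = pd.Series(
    [False] * 13 + [True] * 8 + [False] * 6 + [True] + [False] * 3
)

n = 5  # minimum number of consecutive True values that opens a new group

# Label every run of identical values by comparing each row with the previous one.
run_id = example_series.ne(example_series.shift()).cumsum()

# Length of the run that each row belongs to.
run_len = example_series.groupby(run_id).transform("count")

# A row qualifies only if it is True and its run has at least n rows.
long_true = example_series & run_len.ge(n)

# Start a new group every time the qualifying flag flips.
grouper = long_true.ne(long_true.shift()).cumsum()

print(grouper.tolist())
```

With the data from the question this should leave rows 0-12 in group 1, rows 13-20 in group 2 and rows 21-30 in group 3, and the result can be passed straight to example_series.groupby(grouper).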

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Jonathan Leon