Find the number of consecutive occurrences in a pandas dataframe column with groupby
I have a pandas dataframe that looks like this:
import pandas as pd
foo = pd.DataFrame({'id': [1,1,1,1,1,2,2,2,2,2],
                    'col_a': [1,1,0,1,0,1,1,1,0,1],
                    'col_b': [0,1,1,0,0,0,1,1,1,0]})
I would like to create 2 columns:
- col_a_consequent: 1 if col_a has n consecutive occurrences of 1s by id
- col_c: 1 if after 2 consecutive occurrences of 1s at col_a there is a 1 at col_b
The output dataframe looks like this, for n=3:
foo = pd.DataFrame({'id': [1,1,1,1,1,2,2,2,2,2],
                    'col_a': [1,1,0,1,0,1,1,1,0,1],
                    'col_b': [0,1,1,0,0,0,1,1,1,0],
                    'col_a_consequent': [0,0,0,0,0,1,1,1,0,0],
                    'col_c': [1,1,1,0,0,1,1,1,1,0]})
For col_a_consequent, following the approach in this question, I can obtain what I want for a single id:
n = 3
foo_tmp = foo.query('id == 2')
(foo_tmp.col_a.groupby((foo_tmp.col_a != foo_tmp.col_a.shift()).\
cumsum()).transform('size') * foo_tmp.col_a >= n).astype(int)
but I don't know how to do the same operation with groupby for all ids.
Any ideas?
Solution 1:[1]
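For the first column you could use:
N = 3
# Label each run of equal consecutive values in col_a; grouping by id as well
# keeps a run from spanning two ids.
run_id = foo['col_a'].ne(foo['col_a'].shift()).cumsum()
run_size = foo.groupby(['id', run_id])['col_a'].transform('size')
# Flag only runs of 1s that are at least N long (long runs of 0s are excluded).
foo['col_a_consequent'] = (run_size.ge(N) & foo['col_a'].eq(1)).astype(int)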
output (col_a_consequent2 and col_a_consequent3 hold the results for N=2 and N=3):
   id  col_a  col_b  col_a_consequent2  col_a_consequent3
0   1      1      0                  1                  0
1   1      1      1                  1                  0
2   1      0      1                  0                  0
3   1      1      0                  0                  0
4   1      0      0                  0                  0
5   2      1      0                  1                  1
6   2      1      1                  1                  1
7   2      1      1                  1                  1
8   2      0      1                  0                  0
9   2      1      0                  0                  0
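The answer above only covers the first column. The question does not spell out the rule for col_c precisely, so what follows is only a minimal sketch under one assumed reading: within an id, a row is a "hit" when the 2 previous rows have col_a == 1 and the row itself has col_b == 1, and col_c is then set to 1 on the hit row and on the 2 rows before it. The helper name flag_col_c and its parameter k are hypothetical, introduced here for illustration.

```python
import pandas as pd

foo = pd.DataFrame({
    'id':    [1, 1, 1, 1, 1, 2, 2, 2, 2, 2],
    'col_a': [1, 1, 0, 1, 0, 1, 1, 1, 0, 1],
    'col_b': [0, 1, 1, 0, 0, 0, 1, 1, 1, 0],
})


def flag_col_c(df, k=2):
    """Hypothetical helper implementing the assumed reading of col_c."""
    a_by_id = df['col_a'].groupby(df['id'])
    # A row is a hit when the k previous rows of the same id have col_a == 1
    # and the row itself has col_b == 1. Shifting per id yields NaN at the
    # start of each group, which compares as False.
    hit = df['col_b'].eq(1)
    for lag in range(1, k + 1):
        hit = hit & a_by_id.shift(lag).eq(1)
    # Spread every hit back over the k rows that precede it within the same id.
    hit_by_id = hit.astype(int).groupby(df['id'])
    flagged = hit.copy()
    for lead in range(1, k + 1):
        flagged = flagged | hit_by_id.shift(-lead).eq(1)
    return flagged.astype(int)


foo['col_c'] = flag_col_c(foo)
print(foo['col_c'].tolist())
```

On the sample dataframe this reading reproduces the col_c values listed in the question, but other rules could produce the same ten values, so it is worth checking against more data before relying on it.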
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
