Pandas count based on condition in current row from records before current row
I have a dataframe as shown below:
import pandas as pd
df = pd.DataFrame({'col1': ['a', 'a', 'a', 'b', 'b', 'c']})
df.sort_values('col1', inplace=True)
df['Ref'] = 0
Thus the dataframe looks like:
a 0
a 0
a 0
b 0
b 0
c 0
For the Ref column, I want to show how many earlier rows have the same col1 value as the current row. For illustration, this is what I want to achieve:
a 0
a 1
a 2
b 0
b 1
c 0
I can use df.iterrows() and loop row by row. Unfortunately, in my case that takes 15 minutes to run. I am wondering if there is a more efficient way to do this.
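For reference, the row-by-row approach might look roughly like the sketch below. This is only an assumption about what such a loop does, not the exact code from the question; the `seen` dictionary and the `refs` list are hypothetical helper names introduced for illustration.

```python
import pandas as pd

df = pd.DataFrame({'col1': ['a', 'a', 'a', 'b', 'b', 'c']})
df.sort_values('col1', inplace=True)

seen = {}   # hypothetical helper: value -> number of rows already visited with that value
refs = []   # hypothetical helper: collected Ref values, one per row
for _, row in df.iterrows():
    key = row['col1']
    refs.append(seen.get(key, 0))        # count of earlier rows with the same value
    seen[key] = seen.get(key, 0) + 1     # record the current row for later ones
df['Ref'] = refs
```

The Python-level loop over every row is what makes this slow on a large dataframe.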
Solution 1:[1]
Group the data by col1 and use cumcount
import pandas as pd
df = pd.DataFrame({'col1':['a','a','a','b','b','c']})
df['Ref'] = df.groupby('col1').cumcount()
df.sort_values('col1', inplace=True)
Output:
>>> df
col1 Ref
0 a 0
1 a 1
2 a 2
3 b 0
4 b 1
5 c 0
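As a small additional sketch, cumcount numbers rows within each group in their order of appearance, so it should also work when the dataframe is not sorted beforehand. The unsorted sample data and the name `sorted_df` below are assumptions made for illustration, and `kind='mergesort'` is used here only to keep the sort stable so rows of the same group stay in their original relative order.

```python
import pandas as pd

# Same values as in the question, but deliberately unsorted (illustrative data)
df = pd.DataFrame({'col1': ['b', 'a', 'c', 'a', 'b', 'a']})

# Each row gets the number of earlier rows that share its col1 value
df['Ref'] = df.groupby('col1').cumcount()

# A stable sort keeps the within-group order, so Ref stays ascending per group
sorted_df = df.sort_values('col1', kind='mergesort')
print(sorted_df)
```

If the counts should run the other way (rows remaining after the current one), `cumcount(ascending=False)` is available for that.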
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Rodalm |
