Pandas count based on condition in current row from records before current row

I have a dataframe as shown below:

import pandas as pd
df=pd.DataFrame({'col1':['a','a','a','b','b','c']})
df.sort_values('col1', inplace=True)
df['Ref']=0

Thus the dataframe looks like:

a  0
a  0
a  0
b  0
b  0
c  0

For the Ref column, I want to show how many earlier rows have the same col1 value as the current row. For illustration, this is what I want to achieve:

a  0
a  1
a  2
b  0
b  1
c  0

I can use df.iterrows() and loop row by row. Unfortunately, in my case that takes 15 minutes to run. Is there a faster way to do this?



Solution 1:[1]

Group the data by col1 and use cumcount, which numbers the rows within each group starting from 0:

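import pandas as pd

df = pd.DataFrame({'col1': ['a', 'a', 'a', 'b', 'b', 'c']})

# Number each row within its col1 group: 0 for the first occurrence, 1 for the second, ...
df['Ref'] = df.groupby('col1').cumcount()

# Sort by col1 so rows of the same group appear together
df.sort_values('col1', inplace=True)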

Output:

>>> df

  col1  Ref
0    a    0
1    a    1
2    a    2
3    b    0
4    b    1
5    c    0
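
The example above starts from data that is already sorted. As a minimal sketch, assuming a hypothetical shuffled version of the same values (not from the question), the following shows that cumcount counts earlier occurrences in order of appearance, so no sort is needed before calling it:

```python
import pandas as pd

# Hypothetical unsorted input: the same values as above, in shuffled order.
df = pd.DataFrame({'col1': ['b', 'a', 'c', 'a', 'b', 'a']})

# cumcount numbers each row within its group in order of appearance,
# so the original row order is kept and no prior sort is required.
df['Ref'] = df.groupby('col1').cumcount()

print(df)
```

Here each Ref value is the number of earlier rows with the same col1 value, which is the count the question asks for. Sorting afterwards is only needed if the grouped layout is wanted for display.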

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Rodalm