pandas/python: creating a numerical categorical variable that counts the categories
I am trying to build a column in a pandas DataFrame that counts the category changes of a categorical variable in a "rolling" way. What I keep finding on Stack Overflow are rolling counts, which is the opposite of what I am looking for. I need a column that runs through an alphabetically sorted categorical column and increments by 1 every time the category changes, but otherwise carries the previous value forward. In the example below, given the variable 'cat_var', I need to programmatically create the column 'category_counter_var', which I filled in by hand here. Can someone help?
import pandas as pd
df = pd.DataFrame({'cat_var': ['Q1', 'Q1', 'Q1', 'Q2', 'Q2', 'Q3', 'Q4', 'Q4', 'Q4', 'Q4'],
                   'category_counter_var': [1, 1, 1, 2, 2, 3, 4, 4, 4, 4]})
Solution 1:[1]
Compare each row with the previous one and take the cumulative sum of the changes:
df['new'] = df['cat_var'].ne(df['cat_var'].shift()).cumsum()
print(df)
# Output
  cat_var  category_counter_var  new
0      Q1                     1    1
1      Q1                     1    1
2      Q1                     1    1
3      Q2                     2    2
4      Q2                     2    2
5      Q3                     3    3
6      Q4                     4    4
7      Q4                     4    4
8      Q4                     4    4
9      Q4                     4    4
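To see why the one-liner works, it can help to split it into its three steps. The sketch below rebuilds the question's sample data and uses the intermediate names `previous`, `changed` and `counter`, which are illustrative and not part of the original answer. It also shows `pd.factorize` as a different technique that happens to give the same numbering only because the question states the column is sorted.

```python
import pandas as pd

df = pd.DataFrame(
    {"cat_var": ["Q1", "Q1", "Q1", "Q2", "Q2", "Q3", "Q4", "Q4", "Q4", "Q4"]}
)

# Step 1: shift the column down one row, so each row lines up with
# the value of the row before it (the first row gets NaN).
previous = df["cat_var"].shift()

# Step 2: flag rows whose value differs from the previous row.
# The first row is compared against NaN, so it is always flagged True.
changed = df["cat_var"].ne(previous)

# Step 3: a cumulative sum over the boolean flags counts the changes so far.
counter = changed.cumsum()
print(counter.tolist())

# Alternative (assumption: the column is sorted, as stated in the question):
# factorize assigns one integer code per distinct value, starting at 0.
codes = pd.factorize(df["cat_var"])[0] + 1
print(codes.tolist())
```

The two approaches differ once the column is not sorted: if a category reappears after a different one, the shift-based counter starts a new number for it, while `pd.factorize` reuses the code the category received the first time.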
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Corralien |

