dplyr n() equivalent in Pandas?
In R's dplyr I can create an index column like this:
df %>% mutate(id = 1:n())
How can I do this in Pandas? I tried these, without success:
df['id'] = 1:len(df)
df['id'] = 1:df.iloc[-1]
The R approach is particularly useful because it also works within groups: inside a group_by, n() returns the number of rows in the current group.
Solution 1:[1]
It depends on what you want to do.
Assuming this input:
# R
library(dplyr)
df <- data.frame(A = c(1, 1, 2, 2, 2))
# python
import numpy as np
import pandas as pd
df = pd.DataFrame({'A': [1, 1, 2, 2, 2]})
To have a global counter:
# R
df %>% mutate(id = 1:n());
# python
df['id'] = np.arange(len(df))+1
# or, without modifying df (assign returns a new DataFrame)
df.assign(id=np.arange(len(df))+1)
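A minimal sketch of why np.arange(len(df)) + 1 is a safer analogue of 1:n() than reusing the index: after filtering, the surviving rows keep their original index labels, so df.index + 1 does not restart at 1, while np.arange always numbers the rows 1..n. The names sub and from_index are illustrative only.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1, 1, 2, 2, 2]})

# Keep only the rows where A == 2; their index labels are still 2, 3, 4
sub = df[df['A'] == 2].copy()

# np.arange ignores the index labels, so the counter starts again at 1
sub['id'] = np.arange(len(sub)) + 1

# By contrast, index + 1 reuses the old labels and gives 3, 4, 5
sub['from_index'] = sub.index + 1

print(sub)
```

The index-based version only matches 1:n() when df still has its default 0..n-1 RangeIndex.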
To have a counter per group:
# R
df %>% group_by(A) %>% mutate(id2 = 1:n());
# python
df['id2'] = df.groupby('A').cumcount().add(1)
# or, without modifying df (assign returns a new DataFrame)
df.assign(id2=df.groupby('A').cumcount().add(1))
Output (df after adding both id and id2):
A id id2
0 1 1 1
1 1 2 2
2 2 3 1
3 2 4 2
4 2 5 3
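Note that n() on its own is not a counter: inside a grouped mutate it returns the size of the current group, repeated on every row of that group. A minimal sketch of a pandas analogue, assuming the same example df, uses groupby(...).transform('size') to broadcast each group's row count back onto its rows; the column name n is illustrative.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 1, 2, 2, 2]})

# Group size broadcast to every row, like mutate(n = n()) after group_by(A)
df['n'] = df.groupby('A')['A'].transform('size')

# Running counter within each group, like mutate(id2 = 1:n()) after group_by(A)
df['id2'] = df.groupby('A').cumcount() + 1

print(df)
```

transform keeps the result aligned with the original rows, which is what makes it behave like a grouped mutate rather than a grouped summarise.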
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | mozway |
