dplyr n() equivalent in pandas?

In R's dplyr, I can create an index column like this:

df %>% mutate(id = 1:n())

How can I do this in pandas? I tried these:

df['id'] = 1:len(df)

df['id'] = 1:df.iloc[-1]

The R approach is particularly useful because it also works within groups: after group_by, n() returns the number of rows in the current group, so 1:n() numbers the rows of each group separately.



Solution 1:[1]

It depends on what you want to do.

Assuming this input:

# R
df = data.frame(A=c(1,1,2,2,2));
# python
import numpy as np
import pandas as pd
df = pd.DataFrame({'A': [1,1,2,2,2]})

To have a global counter:

# R
df %>% mutate(id = 1:n());
# python
df['id'] = np.arange(len(df))+1
# or, to get a new DataFrame instead of modifying df in place
df.assign(id=np.arange(len(df))+1)
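A minimal alternative sketch, assuming only pandas: Python's built-in `range(1, len(df) + 1)` is the closest literal translation of R's `1:n()`, because `range` excludes its stop value. This also explains why the `1:len(df)` attempt in the question fails: `a:b` is slice syntax, which Python only accepts inside square brackets.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 1, 2, 2, 2]})

# range(start, stop) excludes stop, so stop = len(df) + 1 yields 1..n inclusive,
# matching R's 1:n()
df['id'] = range(1, len(df) + 1)

print(df['id'].tolist())  # [1, 2, 3, 4, 5]
```

Both this and `np.arange(len(df)) + 1` assign by position, so the result does not depend on the DataFrame's index labels.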

To have a counter per group:

# R
df %>% group_by(A) %>% mutate(id2 = 1:n());
# python
df['id2'] = df.groupby('A').cumcount().add(1)
# or, to get a new DataFrame instead of modifying df in place
df.assign(id2=df.groupby('A').cumcount().add(1))
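A short sketch of how `cumcount` behaves, assuming only pandas: it is 0-based (hence the `.add(1)` to match R's 1-based `1:n()`), and it numbers rows within each group in their existing order, even when the groups are not contiguous. The interleaved input below is illustrative only.

```python
import pandas as pd

# Groups deliberately interleaved: rows of group 1 and group 2 alternate
df = pd.DataFrame({'A': [1, 2, 1, 2, 2]})

# cumcount() gives 0, 1, 2, ... within each group; add(1) makes it 1-based
df['id2'] = df.groupby('A').cumcount().add(1)

print(df['id2'].tolist())  # [1, 1, 2, 2, 3]
```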

Output (after both in-place assignments):

   A  id  id2
0  1   1    1
1  1   2    2
2  2   3    1
3  2   4    2
4  2   5    3
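If you need the value of `n()` itself, i.e. the number of rows in each group as in dplyr's `group_by(A) %>% mutate(n = n())`, rather than a running counter, one sketch uses `groupby(...).transform('size')`, which broadcasts each group's row count back onto every row of that group.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 1, 2, 2, 2]})

# 'size' counts all rows in the group (including NaN values),
# and transform aligns the result with the original rows
df['n'] = df.groupby('A')['A'].transform('size')

print(df['n'].tolist())  # [2, 2, 3, 3, 3]
```

`transform('count')` would instead skip missing values, so `'size'` is the closer match to `n()`.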

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 mozway