R: How to extract the factor levels as numeric from a column and assign them to a new column using tidyverse?

Suppose I have a data frame, df:

df = data.frame(name = rep(c("A", "B", "C"), each = 4))

I want to get a new data frame with one additional column named Group, in which each element is the numeric value of the corresponding level of name, as shown in df2.

I know case_when could do it, but my real data frame is more complicated: the name column has many levels, and I would rather not list them case by case.

Is there an easier and smarter way to do it?

Thanks.

df2
   name Group
1     A     1
2     A     1
3     A     1
4     A     1
5     B     2
6     B     2
7     B     2
8     B     2
9     C     3
10    C     3
11    C     3
12    C     3


Solution 1:[1]

A couple of other simple solutions:

library(dplyr)

df %>%
  mutate(Group = match(name, unique(name)))
#>    name Group
#> 1     A     1
#> 2     A     1
#> 3     A     1
#> 4     A     1
#> 5     B     2
#> 6     B     2
#> 7     B     2
#> 8     B     2
#> 9     C     3
#> 10    C     3
#> 11    C     3
#> 12    C     3

df %>%
  mutate(Group = cumsum(name != lag(name, default = "")))
#>    name Group
#> 1     A     1
#> 2     A     1
#> 3     A     1
#> 4     A     1
#> 5     B     2
#> 6     B     2
#> 7     B     2
#> 8     B     2
#> 9     C     3
#> 10    C     3
#> 11    C     3
#> 12    C     3
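
Since the question title asks for factor levels specifically, it may be worth noting that converting the column to a factor and taking its integer codes should give the same result on this data. The three approaches can disagree when the rows are not sorted, though. Below is a minimal base R sketch on a hypothetical unsorted data frame (df_unsorted and the by_* names are made up for illustration, and the run counter is a base R stand-in for the cumsum/lag idea above):

```r
# Hypothetical unsorted input: "B" appears first and "A" recurs at the end.
df_unsorted <- data.frame(name = c("B", "A", "A", "C", "A"))
x <- df_unsorted$name

# Factor codes: levels are sorted by default, so A = 1, B = 2, C = 3.
by_level <- as.integer(factor(x))

# match/unique: numbered by order of first appearance, so B = 1, A = 2, C = 3.
by_first_seen <- match(x, unique(x))

# Run counter: a new id every time the value changes from the previous row,
# so the trailing "A" gets its own id instead of reusing the earlier one.
by_run <- cumsum(c(TRUE, x[-1] != x[-length(x)]))

cbind(df_unsorted, by_level, by_first_seen, by_run)
#>   name by_level by_first_seen by_run
#> 1    B        2             1      1
#> 2    A        1             2      2
#> 3    A        1             2      2
#> 4    C        3             3      3
#> 5    A        1             2      4
```

So if the goal is really "the numeric value of the level", the factor version is probably the closest match, while the run-based version seems safe only when rows with the same name are contiguous.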

Solution 2:[2]

Using data.table:

df = data.frame(name = rep(c("A", "B", "C"), each = 4))

library(data.table)
setDT(df)[, grp := .GRP, by = name][]
#>     name grp
#>  1:    A   1
#>  2:    A   1
#>  3:    A   1
#>  4:    A   1
#>  5:    B   2
#>  6:    B   2
#>  7:    B   2
#>  8:    B   2
#>  9:    C   3
#> 10:    C   3
#> 11:    C   3
#> 12:    C   3

Created on 2022-02-10 by the reprex package (v2.0.1)
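
For anyone who wants to stay in dplyr, a rough equivalent of .GRP is cur_group_id(), which is available in dplyr 1.0.0 and later. This is only a sketch under that version assumption; df_grouped is a name made up here. One difference to keep in mind: group_by() orders groups by the sorted key, whereas data.table's by numbers groups in order of first appearance, so the two can differ on unsorted data.

```r
library(dplyr)

df <- data.frame(name = rep(c("A", "B", "C"), each = 4))

# Number each group, then drop the grouping so later verbs act on all rows.
df_grouped <- df %>%
  group_by(name) %>%
  mutate(Group = cur_group_id()) %>%
  ungroup()

df_grouped
```

On the example data this should give Group values 1, 2 and 3 for A, B and C, matching df2.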

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution     Source
Solution 1   caldwellst
Solution 2   Yuriy Saraykin