R: How to extract the factor levels as numeric from a column and assign them to a new column using the tidyverse?
Suppose I have a data frame, df:
df = data.frame(name = rep(c("A", "B", "C"), each = 4))
I want to get a new data frame with one additional column named Group, in which each Group element is the numeric value of the corresponding level of name, as shown in df2.
I know case_when could do it, but my real data frame is quite complicated: the name column has many levels, and I would rather not list them case by case.
Is there an easier and smarter way to do it?
Thanks.
df2
name Group
1 A 1
2 A 1
3 A 1
4 A 1
5 B 2
6 B 2
7 B 2
8 B 2
9 C 3
10 C 3
11 C 3
12 C 3
Solution 1:[1]
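A couple of other simple solutions. The first numbers each name in order of first appearance; the second counts runs of consecutive identical names, so it gives the same result only when rows with the same name are adjacent, as they are here.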
library(dplyr)
df %>%
  mutate(Group = match(name, unique(name)))
#> name Group
#> 1 A 1
#> 2 A 1
#> 3 A 1
#> 4 A 1
#> 5 B 2
#> 6 B 2
#> 7 B 2
#> 8 B 2
#> 9 C 3
#> 10 C 3
#> 11 C 3
#> 12 C 3
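Since the question asks about factor levels specifically, a base R comparison may be useful. The sketch below is only an illustration: `df_demo` and its column names are hypothetical, and the names are deliberately out of alphabetical order so that the two numberings differ.

```r
# Hypothetical example data, deliberately not in alphabetical order
df_demo <- data.frame(name = c("B", "B", "A", "C", "A"))

# Integer codes of the factor levels; factor() sorts the levels by default,
# so "A" is 1, "B" is 2 and "C" is 3
df_demo$by_level <- as.integer(factor(df_demo$name))

# Levels taken in order of first appearance instead, which gives the same
# numbering as match(name, unique(name))
df_demo$by_appearance <- as.integer(
  factor(df_demo$name, levels = unique(df_demo$name))
)

df_demo$by_level       # 2 2 1 3 1
df_demo$by_appearance  # 1 1 2 3 2
```

For the data in the question both numberings coincide, because the names already appear in alphabetical order.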
df %>%
  mutate(Group = cumsum(name != lag(name, default = "")))
#> name Group
#> 1 A 1
#> 2 A 1
#> 3 A 1
#> 4 A 1
#> 5 B 2
#> 6 B 2
#> 7 B 2
#> 8 B 2
#> 9 C 3
#> 10 C 3
#> 11 C 3
#> 12 C 3
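One caveat with the cumsum approach is that it numbers runs of consecutive values, not distinct values. The sketch below illustrates this with a hypothetical vector in which "A" appears in two separate runs. It uses a base R re-expression of the lag comparison (an assumption made here to avoid loading dplyr, and one that presumes no missing or empty names), not the answer's exact code.

```r
# Hypothetical data in which "A" appears in two separate runs
name <- c("A", "A", "B", "A")

# Base R counterpart of cumsum(name != lag(name, default = "")):
# flag the first row and every row that differs from the previous one
run_id <- cumsum(c(TRUE, name[-1] != name[-length(name)]))

# Numbering by distinct value, for comparison
value_id <- match(name, unique(name))

run_id    # 1 1 2 3
value_id  # 1 1 2 1
```

If the real data is not sorted by name, the match version is probably the safer choice, or the data can be sorted first.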
Solution 2:[2]
Using data.table:
df = data.frame(name = rep(c("A", "B", "C"), each = 4))
library(data.table)
setDT(df)[, grp := .GRP, by = name][]
#> name grp
#> 1: A 1
#> 2: A 1
#> 3: A 1
#> 4: A 1
#> 5: B 2
#> 6: B 2
#> 7: B 2
#> 8: B 2
#> 9: C 3
#> 10: C 3
#> 11: C 3
#> 12: C 3
Created on 2022-02-10 by the reprex package (v2.0.1)
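For readers who want to stay within dplyr, a rough counterpart of the `.GRP` idea is `cur_group_id()`, available in dplyr 1.0.0 and later. This is a sketch under that version assumption; `df_grouped` is a name introduced here for illustration.

```r
library(dplyr)

df <- data.frame(name = rep(c("A", "B", "C"), each = 4))

df_grouped <- df %>%
  group_by(name) %>%
  mutate(Group = cur_group_id()) %>%  # integer id of the current group
  ungroup()

df_grouped$Group  # 1 1 1 1 2 2 2 2 3 3 3 3
```

Note that `group_by()` orders groups by the sorted key values, whereas `.GRP` with `by` follows the order of first appearance, so the two can differ when the names are not already sorted.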
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | caldwellst |
| Solution 2 | Yuriy Saraykin |
