Rename & Reduce multiple similar observations in R
I have a categorical variable with 169 levels. I want to reduce it to a manageable 7-10 levels, such as "Religion", "Culture & Art", "Education", "Animal Protection", "Emergency", "Environment Protection", "Social Service", etc.
I understand that I can use the levels() function to rename all 169 levels, but I am looking for a smarter option. For example, can I use "Religion" or "Culture" as a filter to group all matching levels under one code?
Solution 1:[1]
You could use str_detect() inside case_when() to assign a category based on a keyword found in each label. See the documentation on str_detect for additional options.
It's easier for someone to help you if you supply some usable minimal example data and your attempted code, as in the reproducible example below. Then we can run it and suggest improvements.
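library(tidyverse)

# Minimal example data: one row per original label
data_df <- tribble(
  ~label,
  "Culture and Arts",
  "Education in Japanese",
  "Culture and Recreation",
  "culture & Environment",
  "Environmental Activities",
  "Education & research"
)

# case_when() uses the first condition that matches;
# str_detect() is case-sensitive, so "Cultu" does not match "culture"
data_df2 <- data_df |>
  mutate(category = case_when(
    str_detect(label, "Cultu") ~ "Culture & Arts",
    str_detect(label, "Educ") ~ "Education",
    str_detect(label, "Environ") ~ "Environment",
    TRUE ~ "Other" # fallback for labels that match no keyword
  ) |> factor())

data_df2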
#> # A tibble: 6 × 2
#>   label                    category
#>   <chr>                    <fct>
#> 1 Culture and Arts         Culture & Arts
#> 2 Education in Japanese    Education
#> 3 Culture and Recreation   Culture & Arts
#> 4 culture & Environment    Environment
#> 5 Environmental Activities Environment
#> 6 Education & research     Education
levels(data_df2$category)
#> [1] "Culture & Arts" "Education" "Environment"
Created on 2022-04-23 by the reprex package (v2.0.1)
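Note that row 4 above ("culture & Environment") was assigned "Environment" because str_detect() is case-sensitive and "Cultu" does not match the lower-case "culture". If that is not what you want, one option is to wrap the pattern in regex(..., ignore_case = TRUE). The following is only a sketch: the data frame name labels_df and the extra "Animal shelter" row are made up for illustration and are not from the question.

```r
library(dplyr)
library(stringr)
library(tibble)

# Hypothetical example labels (not the asker's real data)
labels_df <- tibble(label = c(
  "Culture and Arts",
  "culture & Environment",
  "Environmental Activities",
  "Animal shelter"
))

labels_df2 <- labels_df |>
  mutate(category = case_when(
    # regex(..., ignore_case = TRUE) matches "Culture" and "culture" alike
    str_detect(label, regex("cultu", ignore_case = TRUE)) ~ "Culture & Arts",
    str_detect(label, regex("environ", ignore_case = TRUE)) ~ "Environment",
    TRUE ~ "Other" # labels that match no keyword
  ) |> factor())

labels_df2
```

Because case_when() takes the first condition that matches, a label containing several keywords (such as "culture & Environment") now falls under "Culture & Arts". Reorder the conditions if a different keyword should take priority.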
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Carl |

