Adding a column whose values depend on which of four vectors the value in another column matches
I have data as follows:
library(stringi)
datfake <- as.data.frame(runif(100, 0, 3000))
names(datfake)[1] <- "Inc"
datfake$type <- sample(LETTERS, 100, replace = TRUE)
datfake$province <- stri_rand_strings(100, 1, "[A-P]")
region_south <- c("A", "B", "C", "D")
region_north <- c("E", "F", "G", "H", "I")
region_east <- c("J", "K", "L")
region_west <- c("M", "N", "O", "P")
EDIT:
In my actual data the regions are as follows:
region_north <- c("Drenthe", "Friesland", "Groningen")
region_east <- c("Flevoland", "Gelderland", "Overijssel")
region_west <- c("Zeeland", "Noord-Holland", "Utrecht", "Zuid-Holland")
region_south <- c("Limburg", "Noord-Brabant")
I would like to add a column that tells me which region each province is in. All the solutions I come up with are a bit clunky (for example, turning the vector region_south into a two-column data frame whose second column says south, and then merging). What would be the easiest way to do this?
Desired output:
        Inc type province region
1  297.7387    C        J   east
2 2429.0961    E        D  south
Solution 1:[1]
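One idea is to use mget to fetch the region vectors and unlist them into a single named vector. unlist appends an index to each name (region_east1, region_east2, ...), so we can match province against the values, return the corresponding names, and strip the prefix and trailing digits with gsub, i.e.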
v1 <- unlist(mget(ls(.GlobalEnv, pattern = 'region_')))
res <- names(v1)[match(datfake$province, v1)]
gsub('region_(.+)[0-9]+','\\1' ,res)
[1] "north" "east" "north" "north" "south" "south" "south" "west" "west" "east" "south" "south" "west" "north" "north" "south" "east" "north" "south" "east" "north" "west"
[23] "south" "west" "north" "west" "east" "north" "east" "south" "south" "east" "south" "west" "north" "east" "west" "south" "south" "east" "north" "west" "west" "south"
[45] "north" "east" "south" "west" "north" "south" "east" "west" "north" "north" "north" "south" "north" "south" "north" "north" "west" "north" "north" "south" "west" "north"
[67] "east" "south" "north" "west" "south" "west" "north" "north" "north" "south" "north" "east" "west" "south" "west" "north" "west" "east" "north" "west" "south" "east"
[89] "north" "west" "north" "north" "west" "south" "west" "north" "west" "west" "south" "west"
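The same match-based lookup can be sketched without searching the global environment by keeping the regions in one named list. This is only a minimal sketch in base R: the province names come from the question's EDIT, while the `regions`, `lookup`, and `example` objects are hypothetical names introduced here for illustration.

```r
# Region definitions taken from the question's EDIT, stored in one named list
# (the list itself is an assumption; the question uses four separate vectors).
regions <- list(
  north = c("Drenthe", "Friesland", "Groningen"),
  east  = c("Flevoland", "Gelderland", "Overijssel"),
  west  = c("Zeeland", "Noord-Holland", "Utrecht", "Zuid-Holland"),
  south = c("Limburg", "Noord-Brabant")
)

# stack() turns the named list into a two-column data frame:
# `values` holds the provinces, `ind` holds the region name as a factor.
lookup <- stack(regions)

# Hypothetical example data, not from the question.
example <- data.frame(
  province = c("Utrecht", "Limburg", "Groningen", "Gelderland"),
  stringsAsFactors = FALSE
)

# match() finds each province's row in the lookup table;
# a province that is in no region would get NA.
example$region <- as.character(lookup$ind[match(example$province, lookup$values)])
example
```

Because the region label comes from the list names, no gsub clean-up is needed afterwards.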
Solution 2:[2]
We can use case_when along with grepl here:
library(dplyr)
# Each pattern is a character class such as "^[EFGHI]$",
# so this only matches single-character province codes.
datfake$region <- case_when(
  grepl(paste0("^[", paste(region_north, collapse = ""), "]$"), datfake$province) ~ "north",
  grepl(paste0("^[", paste(region_south, collapse = ""), "]$"), datfake$province) ~ "south",
  grepl(paste0("^[", paste(region_east, collapse = ""), "]$"), datfake$province) ~ "east",
  grepl(paste0("^[", paste(region_west, collapse = ""), "]$"), datfake$province) ~ "west"
)
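The character-class patterns above match only single-character province codes, so they would not match the full province names given in the question's EDIT. A possible variant, sketched here under the assumption that exact matching is what is wanted, swaps grepl for `%in%` inside case_when. The `example` data frame is hypothetical and exists only for illustration.

```r
library(dplyr)

# Region vectors from the question's EDIT.
region_north <- c("Drenthe", "Friesland", "Groningen")
region_east  <- c("Flevoland", "Gelderland", "Overijssel")
region_west  <- c("Zeeland", "Noord-Holland", "Utrecht", "Zuid-Holland")
region_south <- c("Limburg", "Noord-Brabant")

# Hypothetical example data, including one value that is in no region.
example <- data.frame(
  province = c("Utrecht", "Limburg", "Groningen", "Gelderland", "Unknown"),
  stringsAsFactors = FALSE
)

# %in% tests exact membership, so multi-character names work
# and no regular expression has to be built.
example$region <- case_when(
  example$province %in% region_north ~ "north",
  example$province %in% region_south ~ "south",
  example$province %in% region_east  ~ "east",
  example$province %in% region_west  ~ "west",
  TRUE ~ NA_character_  # provinces outside every region stay NA
)
example
```

The explicit fallback line makes it visible that unmatched provinces end up as NA rather than being silently assigned to a region.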
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | |
Solution 2 | Tom |