How to subset a data frame using a list that includes partial strings of another variable
I have a dataset with a variable, let's call it a, that shows country pairs. I would like to create a subset based on whether an EU country is one of the countries in variable a. I would like to do this using a list, so that R can just go through variable a and keep those that match.
df <- data.frame(a = c('Albania Canada', 'Croatia USA', 'Mexico Egypt', 'Switzerland Hungary', 'Lithuania Indonesia'),
b = c(1, 2, 3, 4, 5))
EU <- c("Austria", "Belgium", "Bulgaria", "Croatia", "Czech Republic", "Denmark", "Estonia", "Finland", "France", "Germany", "Greece", "Hungary", "Ireland", "Italy", "Latvia", "Lithuania", "Luxembourg", "Malta", "Netherlands", "Poland", "Portugal", "Romania", "Slovakia", "Slovenia", "Spain", "Sweden")
I have seen that subsetting works using:
mySpecies <-c("versicolor","virginica" )
iris[iris$Species %in% mySpecies,]
However, this needs a complete match, whereas in my case I think it needs to match a partial string. Is there a way to do this with grepl, maybe? I am an R novice, so I would appreciate some help!
Solution 1:[1]
You were on the right track: grepl is your friend. To use the country names with it, paste them together into a single regular expression, collapsing on the "or" operator |.
Then, using subset:
EU_p <- paste(EU, collapse='|')
subset(df, grepl(EU_p, a))
#                     a b
# 2         Croatia USA 2
# 4 Switzerland Hungary 4
# 5 Lithuania Indonesia 5
or, as you indicated, using brackets:
df[grepl(EU_p, df$a), ]
#                     a b
# 2         Croatia USA 2
# 4 Switzerland Hungary 4
# 5 Lithuania Indonesia 5
The result is every row of df whose a contains at least one country from the EU vector. The pattern as written does not distinguish position, so it matches whether the EU country is the first or the second one in the pair.
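If position does matter, or if you want to guard against one country name matching inside a longer one, the same pattern can be wrapped in a group and anchored. The following is only a minimal sketch, assuming each value of a is a pair of country names separated by a single space. The names EU_alt, pat_word, pat_first and pat_last are made up for illustration and are not part of the answer above, and perl = TRUE is used so that \\b is read as a word boundary.

```r
# Data from the question (with "Latvia" spelled correctly)
df <- data.frame(
  a = c("Albania Canada", "Croatia USA", "Mexico Egypt",
        "Switzerland Hungary", "Lithuania Indonesia"),
  b = c(1, 2, 3, 4, 5)
)
EU <- c("Austria", "Belgium", "Bulgaria", "Croatia", "Czech Republic",
        "Denmark", "Estonia", "Finland", "France", "Germany", "Greece",
        "Hungary", "Ireland", "Italy", "Latvia", "Lithuania", "Luxembourg",
        "Malta", "Netherlands", "Poland", "Portugal", "Romania", "Slovakia",
        "Slovenia", "Spain", "Sweden")

# Alternation of all country names; it is wrapped in ( ) below so that
# the anchors apply to every alternative, not just the first or last one
EU_alt <- paste(EU, collapse = "|")

pat_word  <- paste0("\\b(", EU_alt, ")\\b")  # whole-word match, any position
pat_first <- paste0("^(", EU_alt, ")\\b")    # EU country starts the string
pat_last  <- paste0("\\b(", EU_alt, ")$")    # EU country ends the string

any_eu   <- df[grepl(pat_word,  df$a, perl = TRUE), ]  # rows 2, 4, 5
first_eu <- df[grepl(pat_first, df$a, perl = TRUE), ]  # rows 2, 5
last_eu  <- df[grepl(pat_last,  df$a, perl = TRUE), ]  # row 4
```

Because the whole name is part of the pattern, multi-word names such as "Czech Republic" still match, which would not be the case if the pairs were split on spaces and compared with %in%.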
Data:
df <- structure(list(a = c("Albania Canada", "Croatia USA", "Mexico Egypt",
"Switzerland Hungary", "Lithuania Indonesia"), b = c(1, 2, 3,
4, 5)), class = "data.frame", row.names = c(NA, -5L))
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
