How to subset dataframe using list that includes partial strings of another variable

I have a dataset with a variable, let's call it a, that shows country pairs. I would like to create a subset based on whether an EU country is one of the countries in variable a. I would like to do this using a list, so that R can just go through variable a and keep those that match.

df <- data.frame(a = c('Albania Canada', 'Croatia USA', 'Mexico Egypt', 'Switzerland Hungary', 'Lithuania Indonesia'), 
                 b = c(1, 2, 3, 4, 5))
EU <- c("Austria", "Belgium", "Bulgaria", "Croatia", "Czech Republic", "Denmark", "Estonia", "Finland", "France", "Germany", "Greece", "Hungary", "Ireland", "Italy", "Latvia", "Lithuania", "Luxembourg", "Malta", "Netherlands", "Poland", "Portugal", "Romania", "Slovakia", "Slovenia", "Spain", "Sweden")

I have seen that subsetting works using:

mySpecies <-c("versicolor","virginica" )
iris[iris$Species %in% mySpecies,]

However, %in% needs a complete match, whereas in my case I need to match a partial string. Is there a way to do this with grepl, maybe? I am an R novice, so I would appreciate some help!



Solution 1:[1]

You were on the right track, grepl is your friend. To use the country names with it, paste them together into a single pattern, collapsing on the regex "or" operator |.
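To see what the collapsed pattern looks like, here is a minimal sketch. The shortened vector eu_demo and the name demo_pattern are hypothetical and used only for illustration; they are not part of the original data.

```r
# Shortened, hypothetical vector used only to illustrate the pattern
eu_demo <- c("Croatia", "Hungary", "Lithuania")

# collapse = "|" joins the names into one regular expression
demo_pattern <- paste(eu_demo, collapse = "|")
demo_pattern
# [1] "Croatia|Hungary|Lithuania"

# grepl() returns one TRUE/FALSE per element, TRUE where any name matches
grepl(demo_pattern, c("Croatia USA", "Mexico Egypt"))
# [1]  TRUE FALSE
```

The logical vector returned by grepl is what does the row selection in the calls that follow.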

Then use the pattern with subset:

EU_p <- paste(EU, collapse='|')

subset(df, grepl(EU_p, a))
#                     a b
# 2         Croatia USA 2
# 4 Switzerland Hungary 4
# 5 Lithuania Indonesia 5

Or, as you indicated, using brackets:

df[grepl(EU_p, df$a), ]
#                     a b
# 2         Croatia USA 2
# 4 Switzerland Hungary 4
# 5 Lithuania Indonesia 5

The result is every row of df in which a contains at least one country from the EU vector. The pattern as written does not distinguish whether that country comes first or second in the pair.
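If the position does matter, one possible approach is to anchor the pattern with ^ (start of string) or $ (end of string). The following is a sketch under assumptions, not part of the original answer: eu_demo is a shortened, hypothetical vector, and the names first_pat, second_pat, eu_first and eu_second are made up for illustration.

```r
df <- data.frame(a = c("Albania Canada", "Croatia USA", "Mexico Egypt",
                       "Switzerland Hungary", "Lithuania Indonesia"),
                 b = c(1, 2, 3, 4, 5))

# Shortened, hypothetical EU vector, for illustration only
eu_demo <- c("Croatia", "Hungary", "Lithuania")
eu_alt <- paste(eu_demo, collapse = "|")

# ^ anchors the alternation to the start of the string (EU country first);
# \\b requires a word boundary right after the country name
first_pat <- paste0("^(", eu_alt, ")\\b")

# $ anchors the alternation to the end of the string (EU country second)
second_pat <- paste0("\\b(", eu_alt, ")$")

eu_first <- df[grepl(first_pat, df$a), ]
eu_first
#                     a b
# 2         Croatia USA 2
# 5 Lithuania Indonesia 5

eu_second <- df[grepl(second_pat, df$a), ]
eu_second
#                     a b
# 4 Switzerland Hungary 4
```

Anchoring the regex, rather than splitting a on spaces, keeps multi-word names such as "Czech Republic" intact.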


Data:

df <- structure(list(a = c("Albania Canada", "Croatia USA", "Mexico Egypt", 
"Switzerland Hungary", "Lithuania Indonesia"), b = c(1, 2, 3, 
4, 5)), class = "data.frame", row.names = c(NA, -5L))

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1