Advanced Lookup in R: How can I look through strings and add a value from a dataframe?

Neither Excel's VLOOKUP function nor R's join functions help here. I am trying to look up a specific string from one dataframe and add new columns based on the match in a different dataframe. As far as I can see, the _join functions don't solve my particular problem. I present the two dataframes and my code below:

**id**        **address**
3811          bb
4803          dd
4820          dd
852           aa
4031          dd

I want to look through this address variable and match it against the local variable in the other dataframe below. Then I want to add the matching values from the district column.

**local**             **district**
aa                    AA
bb                    BB
cc                    CC
dd                    DD

I ran this code to complete the task. It seems to work when I run the body without the for loop, I guess. However, with the for loop it produces an error.

distr <- data.frame(1:7000)

for (word in df2$local) {
    ind = stringi::stri_detect_fixed(train$address, word) %>% which(.==T)
    ind2 = stringi::stri_detect_fixed(df2$local, word) %>% which(.==T)
    distr[ind, 2] <- df2[ind2, 3]
  }

The code is designed this way so that I can add the column from the distr dataframe to the train dataframe later on. What specific errors am I making that keep this code from running properly? Anyone with string expertise?

P.S. By the way, I chose the stri_detect_fixed function because regular expressions did not work for every value here.



Solution 1:[1]

As I_O suggests, this seems to work fine with fuzzyjoin::fuzzy_join():

library(fuzzyjoin)
fuzzy_join(d1, d2, match_fun = stringi::stri_detect_fixed,
           by = c("address" = "local"))

gives

    id              address      local     district
1 3811 Yntymak,???????/????    Yntymak    leninskyi
2 4803           JD station JD station pervomayskyi
3 4820 JD station, Panfilov JD station pervomayskyi
4  852              Ak-Bata    Ak-Bata sverdlovskyi
5 4031           JD station JD station pervomayskyi

The example data used above:

d1 <- read.table(header = TRUE,
                 sep = ";",
                 text = "
id;address
3811;Yntymak,???????/????
4803;JD station
4820;JD station, Panfilov
852;Ak-Bata
4031;JD station
")

d2 <- read.table(header = TRUE,
                 sep = ";",
                 text = "
local;district
Ak-Bata;sverdlovskyi
Yntymak;leninskyi
Zhilgorodok Sovmina;oktyabrskyi
JD station;pervomayskyi
")
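For comparison, the loop from the question can be sketched in base R without extra packages. This is only a minimal sketch under some assumptions: the data frames are rebuilt by hand to mirror d1 and d2 above (the garbled part of the first address is left out), the lookup table has just the two columns shown (so indexing a third column, as in df2[ind2, 3], would fail), and when several patterns match one address the first match is kept. It uses grepl() with fixed = TRUE as a stand-in for stringi::stri_detect_fixed().

```r
# Example data mirroring d1 and d2 above (hand-built for this sketch).
d1 <- data.frame(
  id = c(3811, 4803, 4820, 852, 4031),
  address = c("Yntymak", "JD station", "JD station, Panfilov",
              "Ak-Bata", "JD station"),
  stringsAsFactors = FALSE
)
d2 <- data.frame(
  local = c("Ak-Bata", "Yntymak", "Zhilgorodok Sovmina", "JD station"),
  district = c("sverdlovskyi", "leninskyi", "oktyabrskyi", "pervomayskyi"),
  stringsAsFactors = FALSE
)

# Start with NA so addresses that match no pattern stay visibly unmatched.
d1$district <- NA_character_

for (i in seq_len(nrow(d2))) {
  # fixed = TRUE treats the pattern as a literal string, not a regex.
  hit <- grepl(d2$local[i], d1$address, fixed = TRUE)
  # Fill only rows that have no district yet, so the first match wins.
  d1$district[hit & is.na(d1$district)] <- d2$district[i]
}

print(d1)
```

Writing the result straight into a new column of d1 avoids the separate pre-sized distr data frame, and the logical vector from grepl() can be used for indexing directly, so no which() step is needed.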

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Ben Bolker