R - Replace items in a list based on another vector
I am doing a fuzzy name matching exercise and am trying to reduce the number of spelling variations of the same name using tidystringdist. I end up with a data frame of matches containing two columns: one has the original value and the other has the value it needs to be changed into. So I need to go back to the original vector of names and change them based on the data frame of match values. Normally this would be easy: left_join() on the original names and done. But my original names can have anywhere from 1 to 4 values in them (multiple owners on properties), so the values to be changed are actually a list of character vectors. Here is a reprex of what I have done so far:
library(dplyr)

data_to_change <- data.frame(house_number = c(1, 2, 3),
                             animal = c("dog|cat|monkey",
                                        "goldfish",
                                        "mouse|dog|rabbit|squirrel")) %>%
  mutate(animal_split = strsplit(animal, "[|]"))

new_names <- data.frame(V1 = c("dog", "rabbit"),
                        V2 = c("doggy", "bunny"))
The split column (data_to_change$animal_split) looks like this:
[[1]]
[1] "dog" "cat" "monkey"
[[2]]
[1] "goldfish"
[[3]]
[1] "mouse" "dog" "rabbit" "squirrel"
And I would like to change the animal names so the result looks like this:
[[1]]
[1] "doggy" "cat" "monkey"
[[2]]
[1] "goldfish"
[[3]]
[1] "mouse" "doggy" "bunny" "squirrel"
I don't believe I can simply use replace(), because the target list and the match data frame are of different lengths. And I don't think I can unlist it and change it, because I need to preserve the association with the house number and the other animals in the house.
Solution 1:[1]
You can use lapply() to loop over your list and stringi::stri_replace_all_fixed() to replace the text. Setting vectorize_all = FALSE applies every pattern/replacement pair to each element in turn.
library(stringi)
data_to_change$animal_split <- lapply(data_to_change$animal_split,
                                      stri_replace_all_fixed,
                                      new_names$V1, new_names$V2, vectorize_all = FALSE)
data_to_change$animal_split
[[1]]
[1] "doggy" "cat" "monkey"
[[2]]
[1] "goldfish"
[[3]]
[1] "mouse" "doggy" "bunny" "squirrel"
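One caveat worth checking against your real data: stri_replace_all_fixed() replaces substrings, so with the patterns above a name such as "dogfish" would become "doggyfish". If only whole names should be swapped, a base R match() lookup is one alternative. The following is a minimal sketch, not part of the original answer; the helper name replace_exact() and the extra "dogfish" entry are made up for illustration.

```r
# Hypothetical data: the question's reprex plus one tricky name ("dogfish").
animal_split <- list(c("dog", "cat", "monkey"),
                     c("goldfish"),
                     c("mouse", "dog", "rabbit", "dogfish"))
new_names <- data.frame(V1 = c("dog", "rabbit"),
                        V2 = c("doggy", "bunny"),
                        stringsAsFactors = FALSE)

# replace_exact() is an illustrative helper, not a function from any package.
replace_exact <- function(x, from, to) {
  idx <- match(x, from)     # position of each element in `from`, NA if absent
  hit <- !is.na(idx)
  x[hit] <- to[idx[hit]]    # overwrite only the elements that match exactly
  x
}

result <- lapply(animal_split, replace_exact,
                 from = new_names$V1, to = new_names$V2)
result[[3]]
# "mouse" "doggy" "bunny" "dogfish"  ("dogfish" is left untouched)
```

Because the list structure is never flattened, each vector stays attached to its house number.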
Solution 2:[2]
As these are fixed (exact) matches, we can use deframe() to convert the data.frame into a named vector and use it as a lookup table. We loop over the list with map(), index the named vector by the values of each element, and finally coalesce() with the original vector so that the NAs produced for unmatched names are replaced by the original values.
library(dplyr)
library(tibble)
library(purrr)
data_to_change %>%
  mutate(animal_split = map(animal_split,
                            ~ coalesce(deframe(new_names)[.x], .x)))
Output:
  house_number                    animal                  animal_split
1            1            dog|cat|monkey            doggy, cat, monkey
2            2                  goldfish                      goldfish
3            3 mouse|dog|rabbit|squirrel mouse, doggy, bunny, squirrel
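To see what the lookup-then-coalesce step is doing, it may help to run it on a single vector in base R. This is only a sketch of the same idea under the assumption that new_names has character columns V1 and V2; setNames() stands in for deframe() and ifelse() for coalesce(), and the variable names are illustrative.

```r
new_names <- data.frame(V1 = c("dog", "rabbit"),
                        V2 = c("doggy", "bunny"),
                        stringsAsFactors = FALSE)

# Named vector: values come from V2, names from V1 (what deframe() builds).
lookup <- setNames(new_names$V2, new_names$V1)

x <- c("mouse", "dog", "rabbit", "squirrel")

# Indexing by name returns NA for names that are not in the lookup.
looked_up <- unname(lookup[x])
looked_up
# NA, "doggy", "bunny", NA

# Fall back to the original value wherever the lookup gave NA
# (the role coalesce() plays in the answer).
replaced <- ifelse(is.na(looked_up), x, looked_up)
replaced
# "mouse", "doggy", "bunny", "squirrel"
```

Since the lookup is by whole name, only elements that are exactly "dog" or "rabbit" are changed.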
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | |
| Solution 2 | akrun |
