How do I create a loop to change the text encoding of the labels in labelled variables in R?
I have imported a Stata file that has encoding problems in its value labels. After import, calling labelled::lookfor with any keyword returns this error:
Error in structure(as.character(x), names = names(x)) :
invalid multibyte string at '<e9>bec Solidaire'
Knowing the data set, that string is almost certainly one of the value labels.
How do I loop through the data set, fix the encoding of the names of the value labels, and then write them back to the variables? I think I have found a way to fix the problematic characters, but I don't know how to replace the original names.
library(labelled)
library(dplyr)  # glimpse() and %>%
library(purrr)  # map()
v <- labelled(c(1, 2, 2, 2, 3, 9, 1, 3, 2, NA), c(yes = 1, "Bloc Qu\xe9b\xe9cois" = 3, "don't know" = 9))
x <- labelled(c(1, 2, 2, 2, 3, 9, 1, 3, 2, NA), c("Bloc Qu\xe9b\xe9cois" = 1, no = 3, "don't know" = 9))
mydat <- data.frame(v = v, x = x)
glimpse(mydat)
mydat %>%
  map(., val_labels)
# This works for a single variable
iconv(names(val_labels(x)), from = "latin1", to = "UTF-8")
# And this seems to work when looping over each variable, but how do I store the result?
mydat %>%
  map(., function(x) iconv(names(val_labels(x)), from = "latin1", to = "UTF-8"))
Solution 1:[1]
This is a bit awkward to do in one simple step, so here I use two helper functions:
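# Re-encode the names of a named vector from latin1 to UTF-8
conv_names <- function(x) {
  setNames(x, iconv(names(x), from = "latin1", to = "UTF-8"))
}
# Replace a variable's value labels with the re-encoded version
conv_val_labels <- function(x) {
  val_labels(x) <- conv_names(val_labels(x))
  x
}
# Apply the fix to every column and bind the columns back together
mydat <- map_dfc(mydat, conv_val_labels)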
We map the function over each column and assign the result back to mydat, which replaces the original columns. Note that we use map_dfc rather than map, so the converted columns are combined back into a data frame instead of being returned as a list.
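To check that the conversion actually took effect, something like the following may help. It is only a sketch, not part of the original answer: the helper name fix_label_encoding is hypothetical, latin1 is assumed as the source encoding (as in the question), and columns without value labels are skipped as a precaution. It uses base R's lapply and validUTF8 instead of map_dfc, so only the labelled package is required.

```r
library(labelled)

# Example data modelled on the question: label names contain latin1 bytes (\xe9).
bad_v <- setNames(c(1, 3, 9), c("yes", "Bloc Qu\xe9b\xe9cois", "don't know"))
bad_x <- setNames(c(1, 3, 9), c("Bloc Qu\xe9b\xe9cois", "no", "don't know"))
v <- labelled(c(1, 2, 2, 2, 3, 9, 1, 3, 2, NA), bad_v)
x <- labelled(c(1, 2, 2, 2, 3, 9, 1, 3, 2, NA), bad_x)
mydat <- data.frame(v = v, x = x)

# Hypothetical helper: re-encode the value-label names of one column.
# Assumes the labels are latin1; columns without value labels pass through.
fix_label_encoding <- function(col, from = "latin1") {
  labs <- val_labels(col)
  if (is.null(labs)) {
    return(col)
  }
  names(labs) <- iconv(names(labs), from = from, to = "UTF-8")
  val_labels(col) <- labs
  col
}

# TRUE for a column when all of its value-label names are valid UTF-8.
labels_are_utf8 <- function(df) {
  vapply(
    df,
    function(col) all(validUTF8(as.character(names(val_labels(col))))),
    logical(1)
  )
}

before <- labels_are_utf8(mydat)

# Assigning into mydat[] keeps the data frame and replaces every column.
mydat[] <- lapply(mydat, fix_label_encoding)

after <- labels_are_utf8(mydat)
print(before)
print(after)
```

If every entry of after is TRUE, the label names are valid UTF-8 and labelled::lookfor should no longer hit the invalid multibyte string. If the file was written in a different encoding, the from argument would need to change accordingly.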
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | MrFlick |