Remove double quotes bounding rows in R data.table
I have several improperly formatted CSVs that are tab separated but have a double quote bounding each row. I can read them in and ignore the " characters with:
library(data.table)
# read each tab-separated file, disabling quote handling so the stray " survive
files = list.files(pattern = "\\.csv$")
dt = lapply(files, fread, sep = "\t", quote = "")
setattr(dt, 'names', gsub("\\.csv$", "", files))
but is there an R data.table way of handling the quotes, rather than separate commands to strip them from the first and last columns?
# sample table
DT = data.table(V1 = paste0("\"", 1:5), V2 = c(1, 2, 5, 6, 8),
                V3 = c("a\"", "b\"", "c\"", "d\"", "e\""))
dt = list(DT, DT, DT)
# these work but aren't using data.table
dt = lapply(dt, function(i) {
  i[[1]] = gsub('"', '', i[[1]])
  i[[ncol(i)]] = gsub('"', '', i[[ncol(i)]])
  i
})
# magical mystery operation that doesn't work???
dt = lapply(dt, function(i) {
  i[, .SD := gsub('"', '', rep(.SD)), .SDcols = names(i)[c(1, ncol(i))]]
})
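For reference, the usual data.table idiom for this kind of column-wise cleanup assigns to the target columns by name with (cols) := lapply(.SD, ...) rather than to .SD itself; the following is a minimal sketch against the sample dt list above, not verified against the real files:
# strip the bounding quotes from the first and last columns by reference
dt = lapply(dt, function(i) {
  cols = names(i)[c(1, ncol(i))]
  i[, (cols) := lapply(.SD, gsub, pattern = '"', replacement = ""), .SDcols = cols]
  i
})
The same assignment could also be folded into the fread() lapply so each file is cleaned as it is read, or written with data.table::set() in a small loop over the two column indices.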
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
