Reading multiple RDS files
I have a directory with more than 300 RDS files that I would like to read and combine. The files share the same basic format but differ in the number of rows, and a few columns differ from file to file. I have simple code to read one RDS file (all files follow the same naming pattern, "Events-3digitnumber-4digitnumber-6digitnumber.RDS"):
mydata <- readRDS("Events-104-2014-752043.RDS")
Being new to data science, I'm sure I'm missing a simple answer. Would I have to use something like list.files() with either lapply or a for loop?
Solution 1:[1]
Just to add a tidyverse answer:
library(tidyverse)
# pattern is a regular expression: escape the dot and anchor the extension
df <- list.files(pattern = "\\.RDS$") %>%
  map(readRDS) %>%
  bind_rows()
Update:
It is advised to use map_dfr for binding rows and map_dfc for binding columns, since each maps and binds in a single call:
df <- list.files(pattern = "\\.RDS$") %>%
  map_dfr(readRDS)
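In purrr 1.0.0 and later, map_dfr() and map_dfc() are marked as superseded in favour of map() followed by list_rbind() or list_cbind(). A minimal sketch of that variant is below; the temporary directory and the two tiny data frames are made up for illustration and are not part of the original question.

```r
library(purrr)  # list_rbind() requires purrr >= 1.0.0

# Write two small example files into a fresh temporary directory (illustrative data)
dir <- tempfile("rds_demo")
dir.create(dir)
saveRDS(data.frame(id = 1:2, a = c("x", "y")),
        file.path(dir, "Events-104-2014-752043.RDS"))
saveRDS(data.frame(id = 3:5, b = c(1.5, 2.5, 3.5)),
        file.path(dir, "Events-104-2015-782043.RDS"))

# Collect the full paths of all .RDS files in that directory
files <- list.files(dir, pattern = "\\.RDS$", full.names = TRUE)

# Read every file into a list, then row-bind the list;
# columns missing from a file are filled with NA
df <- list_rbind(map(files, readRDS))
```

Here df has five rows and the columns id, a and b, with NA wherever a file did not contain the column.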
Solution 2:[2]
Because the solution from FMM did not work for me with huge data sets, I replaced bind_rows() with data.table::rbindlist():
library(tidyverse)
library(data.table)
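# ignore.case = TRUE matches both ".rds" and ".RDS"
df <- list.files(pattern = "\\.rds$", ignore.case = TRUE) %>%
  map(readRDS) %>%
  data.table::rbindlist(fill = TRUE)  # fill = TRUE pads columns missing from some files with NA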
Solution 3:[3]
First a reproducible example:
data(iris)
# make sure that the two data sets (iris, iris2) have different columns
iris2 = iris  # plain assignment is enough here; copy() belongs to data.table, which is not loaded yet
iris2$Species2 = iris2$Species
iris2$Species = NULL
saveRDS(iris, "Events-104-2014-752043.RDS")
saveRDS(iris2, "Events-104-2015-782043.RDS")
Now you need to
- find all file names
- read the data
- combine the data to one table (if you want that)
I would use data.table::rbindlist because it handles differing columns for you when you set fill = TRUE:
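library(data.table)
# match only files named like Events-104-2014-752043.RDS
files = list.files(path = '.', pattern = '^Events-[0-9]{3}-[0-9]{4}-[0-9]{6}\\.RDS$')
# read each file and convert it to a data.table
dat_list = lapply(files, function (x) as.data.table(readRDS(x)))
# stack all tables; columns missing from a file are filled with NA
dat = rbindlist(dat_list, fill = TRUE)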
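To make the effect of fill = TRUE concrete, here is a small sketch on two hand-made tables rather than on files. The tables a and b are hypothetical and only mimic the Species/Species2 mismatch from the example above.

```r
library(data.table)

# Two small tables with partly different columns (illustrative data)
a <- data.table(id = 1:2, Species = c("setosa", "virginica"))
b <- data.table(id = 3L, Species2 = "versicolor")

# fill = TRUE matches columns by name and pads the gaps with NA
combined <- rbindlist(list(a, b), fill = TRUE)
print(combined)
```

The result has three rows and the columns id, Species and Species2. Species is NA in the row that came from b, and Species2 is NA in the rows that came from a.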
Solution 4:[4]
To complement FMM's answer above: depending on where your files are, you may need to pass full.names = TRUE to list.files() so that map_dfr receives paths it can actually read, for example when the files are not in the working directory.
df <- list.files(pattern = "\\.RDS$", full.names = TRUE) %>%
  map_dfr(readRDS)
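The difference shows up once list.files() is pointed at a directory other than the working directory. The sketch below uses only base R; the temporary directory and the example file are assumptions made for illustration.

```r
# Create a fresh temporary directory with one example file (illustrative data)
dir <- tempfile("rds_demo")
dir.create(dir)
saveRDS(data.frame(x = 1:3), file.path(dir, "Events-104-2014-752043.RDS"))

# Bare file names: readable only if `dir` is the working directory
short <- list.files(path = dir, pattern = "\\.RDS$")

# Names prefixed with the directory: readable from anywhere
full <- list.files(path = dir, pattern = "\\.RDS$", full.names = TRUE)

print(short)
print(full)

# Reading via the full path works regardless of the working directory
dat <- readRDS(full[1])
```

Passing short to readRDS() would fail unless the working directory happened to be dir, which is the situation full.names = TRUE avoids.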
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | |
| Solution 2 | jwarz |
| Solution 3 | |
| Solution 4 | TomTom |
