'How can I read multiple csv files into R at once and know which file the data is from?
I want to read multiple csv files into R and combine them into one large table. I however need to a column that identifies which file each row came from.
Basically, every row has a unique identifying number within a file but those numbers are repeated across files. So if I bind all files into a table without knowing which file every row is from I won't have a unique identifier anymore which makes my planned analysis impossible.
What I have so far is this but this doesn't give me what file the data came from.
list_file <- list.files(pattern="*.csv") %>% lapply(read.csv,stringsAsFactors=F)
combo_data <- list.rbind(list_file)
I have about 100 files to read in so I'd really appreciate any help so I don't have to do them all individually.
Solution 1:[1]
In case you want to use base R
you can use
file.names <- list.files(pattern = "*.csv")
df.list <- lapply(file.names, function(file.name)
{
df <- read.csv(file.name)
df$file.name <- file.name
return(df)
})
df <- list.rbind(df.list)
Solution 2:[2]
One way would be to use map_df
from purrr
to bind all the csv's into one with a unique column identifier.
filenames <- list.files(pattern="*.csv")
purrr::map_df(filenames, read.csv,stringsAsFactors = FALSE, .id = 'filename') %>%
dplyr::mutate(filename = filenames[filename]) -> combo_data
Also :
combo_data <- purrr::map_df(filenames,
~read.csv(.x, stringsAsFactors = FALSE) %>% mutate(filename = .x))
In base R :
combo_data <- do.call(rbind, lapply(filenames, function(x)
cbind(read.csv(x, stringsAsFactors = FALSE), filename = x)))
Solution 3:[3]
As other answer suggested, now tidyverse made things easier:
library(readr)
library(purrr)
library(dplyr)
library(stringr)
df <- fs::dir_ls(regexp = "\\.csv$") %>%
map_dfr(read_csv, id='path') %>%
mutate(thename = str_replace(path, ".tsv","")) %>%
select(-path)
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | MacOS |
Solution 2 | Ronak Shah |
Solution 3 | Andrés Parada |