'How to write multiple excel files with multiple sheets based on a variable of a split data frame in R (tidyverse)
I would like to split a data frame in order to produce multiple excel files containing multiple sheets on the basis of another variable of the original split data frame.
The data frame I used is the toy data set mtcars. I split it by cyl in order to create multiple files with sheets based on the variable gear.
So, I would expect to get three excel files with the following names:
- Cars_by_cyl_4_date×.xlsx
- Cars_by_cyl_6_date×.xlsx
- Cars_by_cyl_8_date×.xlsx
Each of them containing:
- three sheets, named: "3", "4", "5" (Cars_by_cyl_4_date×.xlsx)
- three sheets, named: "3", "4", "5" (Cars_by_cyl_6_date×.xlsx)
- two sheets, named: "3", "5" (Cars_by_cyl_8_date×.xlsx).
What I did it seems to overwrite the files.
This is what I did:
library(tidyverse)
library(writexl)
# I split the data frame
list_of_cars_by_cyl <- mtcars %>%
dplyr::group_split(cyl)
# I gave names to split data frame elements
names(list_of_cars_by_cyl) <- list_of_cars_by_cyl %>%
purrr::map(~pull(.,cyl)) %>% # individua le modalit? della variabile area
purrr::map(~as.character(.)) %>% # converte il fattore in character
purrr::map(~unique(.))
nomi <- names(list_of_cars_by_cyl)
# I create a function in order to save split data frames in .xlsx with several sheets based on a second variable
save_to_excel <- function(x) {
# list by new variable
list_of_cars_by_gear <- x %>%
dplyr::group_split(gear)
# name list's elements
names(list_of_cars_by_gear) <- list_of_cars_by_gear %>%
purrr::map(~pull(., gear)) %>% # individua le modalit? della variabile area
purrr::map(~as.character(.)) %>% # converte il fattore in character
purrr::map(~unique(.))
# save to .xlsx
list_of_cars_by_gear %>%
writexl::write_xlsx(path = paste(cartelle[5], paste0("Cars_by_cyl_", nomi, "_", format(Sys.time(), format = "%d%m%Y_%H%M%S"), ".xlsx"), sep = "/"))
}
# run the function iteratively
list_of_cars_by_cyl %>%
purrr::map(save_to_excel)
Solution 1:[1]
In your example, you are using the same file name for each list element - nomi is used identically in every call to save_to_excel. There are (at least) two ways to resolve this: construct the correct file name completely inside the save_to_excel function, or simulateneously iterate over both list_of_cars_by_cyl and nomi with purrr::map2.
Both options produce the expected output (3 Excel files containing the cyl number in the name, each with the respective sheets for the gear split).
Option 1
Get the cyl part of the file name from the dataframe passed to save_to_excel:
save_to_excel <- function(x) {
# list by new variable
list_of_cars_by_gear <- x %>%
dplyr::group_split(gear)
# name list's elements
names(list_of_cars_by_gear) <- list_of_cars_by_gear %>%
purrr::map(~pull(., gear)) %>% # individua le modalit? della variabile area
purrr::map(~as.character(.)) %>% # converte il fattore in character
purrr::map(~unique(.))
# cyl part for current file name
this_cyl <- unique(x$cyl)
# save to .xlsx
list_of_cars_by_gear %>%
writexl::write_xlsx(path = paste(
#cartelle[5], #not defined example code,
paste0("Cars_by_cyl_", this_cyl, "_", format(Sys.time(), format = "%d%m%Y_%H%M%S"), ".xlsx"),
sep = "/"))
}
list_of_cars_by_cyl %>%
purrr::map(save_to_excel)
#$`4`
#[1] "...\\Cars_by_cyl_4_25032022_220454.xlsx"
#$`6`
#[1] "...\\Cars_by_cyl_6_25032022_220454.xlsx"
#$`8`
#[1] "...\\Cars_by_cyl_8_25032022_220454.xlsx"
Option 2
Add second argument to save_to_excel and iterate over dataframes and names:
save_to_excel <- function(x, name) {
# list by new variable
list_of_cars_by_gear <- x %>%
dplyr::group_split(gear)
# name list's elements
names(list_of_cars_by_gear) <- list_of_cars_by_gear %>%
purrr::map(~pull(., gear)) %>% # individua le modalit? della variabile area
purrr::map(~as.character(.)) %>% # converte il fattore in character
purrr::map(~unique(.))
# save to .xlsx
list_of_cars_by_gear %>%
writexl::write_xlsx(path = paste(
#cartelle[5], #not defined example code,
paste0("Cars_by_cyl_", name, "_", format(Sys.time(), format = "%d%m%Y_%H%M%S"), ".xlsx"),
sep = "/"))
}
# run the function iteratively
# over dataframe list and nomi
list_of_cars_by_cyl %>%
purrr::map2(., nomi, save_to_excel)
#$`4`
#[1] "...\\Cars_by_cyl_4_25032022_221107.xlsx"
#$`6`
#[1] "...\\Cars_by_cyl_6_25032022_221107.xlsx"
#$`8`
#[1] "...\\Cars_by_cyl_8_25032022_221107.xlsx"
purrr options
As the question also asks about understanding the purrr logic, here are a few different ways how the last step could have been formulated instead. All produce the same result, and which is most easy to use/understand is imho a matter of personal preference.
# refer to first, second argument with .x, .y
purrr::map2(list_of_cars_by_cyl, nomi, ~save_to_excel(x = .x, name = .y))
# refer to first, second argument with ..1, ..2
purrr::map2(list_of_cars_by_cyl, nomi, ~save_to_excel(x = ..1, name = ..2))
# define anonymous function with appropriate number of arguments
purrr::map2(list_of_cars_by_cyl, nomi, function(x, y) {save_to_excel(x, y)})
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 |
