How to use map_dfr to clean different datasets
I have written some code to clean a particular database, and now I want to apply it to other databases. I am running into the following problems:
- The function receives each database as an object of class "character".
- I am using an external database (UR_data) to filter the data. Is this the correct way to include it? UR_data contains only country names, which are used to subset the databases to be cleaned.
- I need to save each cleaned database as [name_of_database]_cleaned. Is this the correct way to do that?
#Databases to clean
data_list <- c(Pre_primary_data, Primary_data, Lower_secondary_data, Higher_secondary_data)
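A likely cause of the "character" problem is the use of c() here. A data frame is a list of columns, so c() on several data frames concatenates their columns into one flat list, and the first element handed to the cleaning function is then a Country vector rather than a data frame. The sketch below illustrates this with two small made-up data frames (df1 and df2 are hypothetical stand-ins, not the real data).

```r
# Hypothetical stand-ins for two of the real databases
df1 <- data.frame(Country = c("Benin", "Bhutan"), `2010` = c(1, 2),
                  check.names = FALSE, stringsAsFactors = FALSE)
df2 <- data.frame(Country = c("Chad", "Cuba"), `2010` = c(3, 4),
                  check.names = FALSE, stringsAsFactors = FALSE)

# c() flattens the data frames into a single list of columns
flattened <- c(df1, df2)
length(flattened)      # 4 (two columns from each data frame)
class(flattened[[1]])  # "character" (the first Country column)

# list() keeps each data frame intact; naming the elements preserves the database names
kept <- list(df1 = df1, df2 = df2)
length(kept)           # 2
class(kept[[1]])       # "data.frame"
```

Building data_list with list() (ideally a named list) means each element that map_dfr passes to the function is a whole database.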
dput(head(data_mod))
structure(list(Country = c("Afghanistan", "Bangladesh", "Benin",
"Bhutan", "Burkina Faso", "Cabo Verde"), `2010` = c(295.32, NA,
NA, 3747.64, 615.93, NA), `2011` = c(228.51, 438.31, NA, 4670.31,
322.81, NA), `2012` = c(165.72, 430.68, NA, NA, 317.36, NA),
`2013` = c(201.43, NA, 311.84, NA, 290.36, NA), `2014` = c(217.26,
NA, NA, NA, 315.27, 974.33), `2015` = c(185.45, NA, NA, 3216.2,
365.42, 1142.58), `2016` = c(254.33, 540.51, NA, NA, 808.66,
1255.95), `2017` = c(242.6, NA, NA, NA, NA, 1278.25), `2018` = c(NA_real_,
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_), `2019` = c(NA,
473.69, NA, NA, NA, 679.57), `2020` = c(NA, 420.54, NA, NA,
NA, NA), `2021` = c(NA_integer_, NA_integer_, NA_integer_,
NA_integer_, NA_integer_, NA_integer_)), class = c("rowwise_df",
"tbl_df", "tbl", "data.frame"), row.names = c(NA, -6L), groups = structure(list(
.rows = structure(list(1L, 2L, 3L, 4L, 5L, 6L), ptype = integer(0), class = c("vctrs_list_of",
"vctrs_vctr", "list"))), row.names = c(NA, -6L), class = c("tbl_df",
"tbl", "data.frame")))
Each database consists of a Country column and a set of year columns that indicate expenditure.
#Cleaning function
data_clean <- function(data) {
  data %>%
    filter(data$Country %in% UR_data$Country) %>%
    arrange(data$Country) %>%
    na_if(., 0) %>%
    rowwise(.) %>%
    filter(!all(is.na(c_across(where(is.numeric))))) %>%
    data.frame() -> paste0(data, "_cleaned", sep = "")
}
#Applying map function
map_dfr(data_list, data_clean)
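One way the three problems might be addressed is sketched below, under stated assumptions: the small tibbles are made-up stand-ins for the real data, and the countries argument is a hypothetical addition so that the filter list is passed in explicitly rather than read from the global environment. The sketch refers to columns by bare name inside filter() and arrange(), applies na_if() per column, replaces the rowwise() step with if_any(), and names the results after mapping instead of assigning inside the function.

```r
library(dplyr)
library(purrr)

# Made-up stand-ins for the real objects (assumptions for illustration only)
UR_data <- tibble(Country = c("Afghanistan", "Benin", "Bhutan"))
Pre_primary_data <- tibble(
  Country = c("Benin", "Afghanistan"),
  `2010`  = c(311.84, NA),
  `2011`  = c(NA, 228.51)
)
Primary_data <- tibble(
  Country = c("Bhutan", "Afghanistan", "Benin", "Chad"),
  `2010`  = c(3747.64, 295.32, NA, 10),
  `2011`  = c(4670.31, 0, NA, 20)
)

# `countries` is a hypothetical extra argument holding the names to keep
data_clean <- function(data, countries) {
  data %>%
    filter(Country %in% countries) %>%                    # bare column name, not data$Country
    arrange(Country) %>%
    mutate(across(where(is.numeric), ~ na_if(.x, 0))) %>% # zeros to NA, column by column
    filter(if_any(where(is.numeric), ~ !is.na(.x))) %>%   # drop rows whose numeric values are all NA
    as.data.frame()
}

# A named list keeps each database intact and carries its name along
data_list <- list(
  Pre_primary_data = Pre_primary_data,
  Primary_data     = Primary_data
)

# Clean each database, then rename the results to [name_of_database]_cleaned
cleaned <- map(data_list, data_clean, countries = UR_data$Country)
names(cleaned) <- paste0(names(cleaned), "_cleaned")

# Optional: one combined data frame with a column recording the source database
combined <- bind_rows(cleaned, .id = "source")
```

Each result is then available as, for example, cleaned$Primary_data_cleaned. If separate objects in the workspace are really needed, list2env(cleaned, envir = globalenv()) would create them, although keeping the results in a named list is usually easier to work with. Note that map_dfr() is marked as superseded in recent purrr releases, which is why this sketch uses map() followed by bind_rows().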
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
