In R - Is there a way of adding a name column ('ID') to a tibble with multiple imported .csv files from URLs?
I am trying to write a function that downloads multiple .csv files from a GitHub repository and first stores them in one long-format tibble, like so:
# write different endings of urls "by hand" with 'ctrl-c' & 'ctrl-v' to get a list.
hobo_id <- c("10088310_Th.csv", "10234637_Th.csv", "10347313_Th.csv", "10347320_Th.csv", "10347321_th.csv", "10347327_Th.csv", "10347328_Th.csv", "10347356_Th.csv", "10347362_Th.csv", "10347366_Th.csv", "10347384_Th.csv", "10347394_Th.csv", "10350002_Th.csv ", "10350005_Th.csv", "10350049_Th.csv", "10610854_Th.csv", "10760709_Th.csv", "10760710_Th.csv", "10760811_Th.csv", "10760820_Th.csv", "10760822_Th.csv", "10801139_th.csv", "10801141_Th.csv")
# import function (needs the readr package):
library(readr)
import_csv <- function(hobo_id){
  # create urls
  HOBO_urls <- paste0('https://raw.githubusercontent.com/data-hydenv/data/master/hobo/2022/hourly/', hobo_id)
  # HOBO_urls is a character vector with one link per file, which read_csv will download in the next step
  # read in files
  hobo_coll <- read_csv(as.character(HOBO_urls))
  return(hobo_coll)
}
hobo_coll <- import_csv(hobo_id)
This works so far. But I want to add a column called 'ID'.
One of my approaches looks like this:
library(readr)      # read_csv()
library(dplyr)      # mutate(), %>%
library(lubridate)  # parse_date_time()
import_csv <- function(hobo_id){
  # create urls
  HOBO_urls <- paste0('https://raw.githubusercontent.com/data-hydenv/data/master/hobo/2022/hourly/', hobo_id)
  # read in files
  hobo_coll <- read_csv(as.character(HOBO_urls))
  # Add column ID
  hobo_coll1 <- hobo_coll %>%
    mutate(dttm = parse_date_time(dttm, "%Y-%m-%d %H:%M:%S")) %>%
    mutate(ID = ifelse(dttm >= "2021-12-13 00:00:00" & dttm <= "2022-01-09 23:00:00", hobo_id, NA))
  return(hobo_coll1)
}
This runs, but the ID from 'hobo_id' should stay the same for 4032 rows (in each case from "2021-12-13 00:00:00" to "2022-01-09 23:00:00"), then change to the next ID (hobo_id[2]), then after the next period of 4032 rows to the one after that (hobo_id[3]), and so on.
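For illustration only: a likely reason the IDs do not stay constant is that `ifelse()` recycles a shorter `yes` vector element by element, so the IDs cycle row by row instead of block by block. The sketch below uses made-up IDs and a block length of 2 rather than the real data. It contrasts that behaviour with `rep(..., each = n)`, which repeats each ID for a whole block but assumes every file contributes the same number of rows.

```r
# Hypothetical IDs and a tiny "table" of 6 rows (3 files x 2 rows each)
ids <- c("A_Th.csv", "B_Th.csv", "C_Th.csv")
in_range <- rep(TRUE, 6)

# ifelse() recycles `ids` along the rows: A, B, C, A, B, C
recycled <- ifelse(in_range, ids, NA)

# rep(each = ) keeps each ID for a whole block: A, A, B, B, C, C
rows_per_file <- 2
blocked <- rep(ids, each = rows_per_file)

print(recycled)
print(blocked)
```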
I thought there might be a way to do it with the tidyr::extract() function, but I can't figure out how.
I also considered a for loop, but kind of want to stick to the import_csv() function solution.
Thank you for your help in advance, gladly appreciate it!
Solution 1:[1]
Use the function argument directly, without any indexing. Change the line
mutate(ID = ifelse(dttm >= "2021-12-13 00:00:00" & dttm <= "2022-01-09 23:00:00", .[[hobo_id]], NA))
to
mutate(ID = ifelse(dttm >= "2021-12-13 00:00:00" & dttm <= "2022-01-09 23:00:00", hobo_id, NA))
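If the goal is one constant ID per source file, one possible alternative (a sketch, not part of the answer above) is to read each file separately, tag its rows with its own ID, and then stack the pieces. The function name `import_csv_with_id` and the `base_url` parameter are hypothetical. The demo reads two small temporary files in place of the GitHub downloads, and `show_col_types` assumes readr 2.0 or later.

```r
library(readr)
library(dplyr)

# Hypothetical variant of import_csv(): one read per file, ID added per file
import_csv_with_id <- function(hobo_id, base_url) {
  per_file <- lapply(hobo_id, function(id) {
    read_csv(paste0(base_url, id), show_col_types = FALSE) %>%
      mutate(ID = id)  # a single string, so it is constant for every row of this file
  })
  bind_rows(per_file)  # stack into one long tibble
}

# Demo data: two temporary files standing in for the remote .csv files
tmp_dir <- tempfile("hobo_")
dir.create(tmp_dir)
demo_ids <- c("111_Th.csv", "222_Th.csv")
for (id in demo_ids) {
  write_csv(
    data.frame(dttm = c("2021-12-13 00:00:00", "2021-12-13 01:00:00"),
               th = c(1.5, 1.7)),
    file.path(tmp_dir, id)
  )
}

demo <- import_csv_with_id(demo_ids, paste0(tmp_dir, "/"))
print(demo)
```

With the real data, the call would presumably be `import_csv_with_id(hobo_id, 'https://raw.githubusercontent.com/data-hydenv/data/master/hobo/2022/hourly/')`. readr 2.0 and later also provide an `id` argument to `read_csv()` that records each row's source path. What it records for remote files is worth checking before relying on it.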
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
