In R: is there a way of adding a name column ('ID') to a tibble built from multiple .csv files imported from URLs?

I am trying to write a function that downloads multiple .csv files from a GitHub repository and first stores them in one long-format tibble, like so:

# File names (the varying ends of the URLs), copied in by hand and collected in a character vector.

hobo_id <- c("10088310_Th.csv", "10234637_Th.csv", "10347313_Th.csv", "10347320_Th.csv", "10347321_th.csv", "10347327_Th.csv", "10347328_Th.csv", "10347356_Th.csv", "10347362_Th.csv", "10347366_Th.csv", "10347384_Th.csv", "10347394_Th.csv", "10350002_Th.csv", "10350005_Th.csv", "10350049_Th.csv", "10610854_Th.csv", "10760709_Th.csv", "10760710_Th.csv", "10760811_Th.csv", "10760820_Th.csv", "10760822_Th.csv", "10801139_th.csv", "10801141_Th.csv")

# import function:

library(readr)

import_csv <- function(hobo_id){
  # create the URLs (paste0() is vectorised: one URL per file name)
  HOBO_urls <- paste0('https://raw.githubusercontent.com/data-hydenv/data/master/hobo/2022/hourly/', hobo_id)

  # read_csv() (readr >= 2.0) downloads every URL in HOBO_urls
  # and stacks the files into one tibble
  hobo_coll <- read_csv(HOBO_urls)

  return(hobo_coll)
}

hobo_coll <- import_csv(hobo_id)
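The function works because read_csv() in readr 2.0 and later accepts a whole vector of file paths or URLs and row-binds them. The catch is that the stacked tibble no longer records which file each row came from. Below is a minimal sketch of that behaviour. Two local temp files stand in for the GitHub URLs, and their contents (including the 'th' column) are made up for illustration.

```r
library(readr)

# Hypothetical stand-ins for the GitHub files: two small local CSVs
dir   <- tempdir()
ids   <- c("10088310_Th.csv", "10234637_Th.csv")
paths <- file.path(dir, ids)  # vectorised, like paste0(base_url, hobo_id)

# 'th' is an invented value column; the real files' columns may differ
write_csv(data.frame(dttm = c("2021-12-13 00:00:00", "2021-12-13 01:00:00"),
                     th   = c(1.5, 1.2)), paths[1])
write_csv(data.frame(dttm = c("2021-12-13 00:00:00", "2021-12-13 01:00:00"),
                     th   = c(2.0, 2.3)), paths[2])

# A vector of files is read and stacked into one tibble (readr >= 2.0)
hobo_coll <- read_csv(paths, show_col_types = FALSE)
print(hobo_coll)  # 4 rows, but no column says which file each row is from
```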

This works so far. But I also want to add a column called 'ID' that tells me which file (hobo_id) each row came from.

One of my approaches looks like this:

library(dplyr)
library(lubridate)

import_csv <- function(hobo_id){
  # create the URLs
  HOBO_urls <- paste0('https://raw.githubusercontent.com/data-hydenv/data/master/hobo/2022/hourly/', hobo_id)

  # read in all files as one tibble
  hobo_coll <- read_csv(HOBO_urls)

  # parse the timestamps, then add the ID column for the study period
  hobo_coll1 <- hobo_coll %>%
    mutate(dttm = parse_date_time(dttm, "%Y-%m-%d %H:%M:%S")) %>%
    mutate(ID = ifelse(dttm >= "2021-12-13 00:00:00" & dttm <= "2022-01-09 23:00:00", hobo_id, NA))

  return(hobo_coll1)
}

This runs, but the ID taken from 'hobo_id' should stay the same for 4032 rows (each file's period from "2021-12-13 00:00:00" to "2022-01-09 23:00:00"). It should then change to the next ID (hobo_id[2]), after the next 4032 rows to hobo_id[3], and so on.
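Why the IDs do not line up with the files: ifelse() returns one value per row and recycles its 'yes' argument element by element. With all 23 names in hobo_id, the ID column therefore cycles through them row by row instead of staying constant per file. Here is a tiny base-R illustration with two made-up file names and three rows per file.

```r
# Hypothetical stacked data: 2 files x 3 hourly rows each
ids  <- c("A.csv", "B.csv")
dttm <- rep(as.POSIXct("2021-12-13 00:00:00", tz = "UTC") + 3600 * 0:2, times = 2)
in_window <- dttm >= as.POSIXct("2021-12-13 00:00:00", tz = "UTC")

# ifelse() recycles `ids` to the number of rows, so IDs alternate per row
ID <- ifelse(in_window, ids, NA)
print(ID)

# What is actually wanted: each ID repeated for all rows of its own file
wanted <- rep(ids, each = 3)
print(wanted)
```

rep(hobo_id, each = 4032) would only be right if every file has exactly 4032 rows. It is usually safer to attach the ID while each file is being read.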

I thought there might be a way to do it with tidyr::extract(), but I can't figure out how.
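tidyr::extract() can do this once the file name is available as a column. In readr 2.0 and later, read_csv() has an 'id' argument that stores each row's source path. This is a sketch under these assumptions: local temp files stand in for the URLs, their contents are invented, and all file names follow the "<number>_Th.csv" / "<number>_th.csv" pattern. With real URLs the path column holds the full URL. The regex is anchored at the end, so it should still match.

```r
library(readr)
library(tidyr)

# Hypothetical local stand-ins for the GitHub files
dir   <- tempdir()
paths <- file.path(dir, c("10088310_Th.csv", "10801139_th.csv"))
for (p in paths) {
  write_csv(data.frame(th = c(1.1, 1.4)), p)
}

# id = "path" adds a column with the file each row was read from
hobo_coll <- read_csv(paths, id = "path", show_col_types = FALSE)

# Pull the logger number out of the path; [Tt]h also matches "_th.csv"
hobo_coll <- extract(hobo_coll, path, into = "ID",
                     regex = "(\\d+)_[Tt]h\\.csv$", remove = TRUE)
print(hobo_coll)
```

The same two steps fit inside import_csv(): pass id = "path" to its read_csv() call and pipe the result through extract().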

I also considered a for loop, but I would prefer to stick with the import_csv() function approach.

Thanks in advance for your help, I really appreciate it!



Solution 1:[1]

Use the function argument directly, without any indexing, i.e. change the line

mutate(ID = ifelse(dttm >= "2021-12-13 00:00:00" & dttm <= "2022-01-09 23:00:00", .[[hobo_id]], NA)) 

to

mutate(ID = ifelse(dttm >= "2021-12-13 00:00:00" & dttm <= "2022-01-09 23:00:00", hobo_id, NA)) 
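Using the argument directly gives one ID per file only when the function receives a single file name per call. In that case mutate() recycles the length-1 value to every row of that file. When all 23 names are passed at once, they are recycled row by row instead. Below is a minimal sketch of the per-file variant. The helper import_one() is hypothetical, and temp files with invented contents replace the GitHub URLs.

```r
library(readr)
library(dplyr)

# Hypothetical base location; in the question this is the raw GitHub URL
base    <- paste0(tempdir(), "/")
hobo_id <- c("10088310_Th.csv", "10234637_Th.csv")
for (f in hobo_id) write_csv(data.frame(th = c(0.5, 0.7, 0.9)), paste0(base, f))

# Reads ONE file; one_id is a single string, so mutate() fills every row with it
import_one <- function(one_id) {
  read_csv(paste0(base, one_id), show_col_types = FALSE) %>%
    mutate(ID = one_id)
}

# Call it once per file name and stack the per-file tibbles
hobo_coll <- bind_rows(lapply(hobo_id, import_one))
print(hobo_coll)
```

The date-window condition from the question can still be applied afterwards, for example with filter() on the parsed dttm column.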

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

[1] Solution 1 (source: Stack Overflow)