'efficient data collection from API using R
I am trying to get data from the UN Stats API for a list of indicators (https://unstats.un.org/SDGAPI/swagger/).
I have constructed a loop that can be used to get the data for a single indicator (code is below). The loop can be applied to multiple indicators as needed. However, this is likely to cause problems relating to large numbers of requests, potentially being perceived as a DDoS attack and taking far too long.
Is there an alternative way to get data for an indicator for all years and countries without making a ridiculous number of requests or in a more efficient manner than below? I suppose this question likely applies more generally to other similar APIs as well. Any help would be most welcome.
Please note: I have seen the post here (Faster download for paginated nested JSON data from API in R?) but it is not quite what I am looking for.
Minimal working example
# libraries
library(jsonlite)
library(dplyr)
library(purrr)
# get the meta data
page = ("https://unstats.un.org/SDGAPI//v1/sdg/Series/List")
sdg_meta = fromJSON(page) %>% as.data.frame()
# parameters
PAGE_SIZE =100000
N_PAGES = 5
FULL_DF = NULL
my_code = "SI_COV_SOCINS"
# loop to go over pages
for(i in seq(1,N_PAGES,1)){
ind = which(sdg_meta$code == my_code)
cat(paste0("Processing : ", my_code, " ", i, " of ",N_PAGES, " \n"))
my_data_page <- c(paste0("https://unstats.un.org/SDGAPI/v1/sdg/Series/Data?seriesCode=",my_code,"&page=",i,"pageSize=",PAGE_SIZE))
df <- fromJSON(my_data_page) #depending on the data you are calling, you will get a list
df= df$data %>% as.data.frame() %>% distinct()
# break the loop when no more to add
if(is_empty(df)){
break
}
FULL_DF = rbind(FULL_DF,df)
Sys.sleep(5) # sleep to avoid any issues
}
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|
