Running a loop on a data pull with Amazon S3
I'm trying to pull data from Eviction Lab, which uses Amazon S3. To pull the data from Amazon's servers, I am using the aws.s3 package from the cloudyr project. I want to pull the census tract data for all 50 states. Each is a large .csv file. If I run this code, I can successfully pull each individual state's data:
NY.tract<-get_object("NY/tracts.csv", bucket = "eviction-lab-data-downloads")
But I want to run a loop that automates the process, in case I want to change what I pull later on.
I'm running into two main problems with my loop:
(1) I have to figure out how to specify the "NY/tracts.csv" within the get_object function so that it changes with each pull. I'm not sure my loop is doing that.
(2) I need to name each data pull by the state abbreviation. I could use another list or data.frame to specify the state abbreviation, but I have no idea where to begin with that.
My attempt at a loop is still missing quite a bit. The "file.paths" that I reference in the sequence of the loop is a data.frame I read into R. It has a single string variable holding the file paths for the 50 states I want to pull, one per row. For example, the first row is "AL/tracts.csv", the second is "AK/tracts.csv", etc. Here is the loop that I've written:
for(i in 1:nrow(file.paths)){
my.data<-get_object("i", bucket = "eviction-lab-data-downloads")
}
View(my.data)
When I run this loop, it returns 272 observations for 1 variable. I want to get 50 different .csv datasets, named according to the state abbreviation, which I can bind together into 1 nationwide dataset.
Maybe it's not possible to do with the get_object function? I can certainly write 50 lines of code to get the individual objects I want, but I'd prefer a loop so I can edit it in the future.
Any help here would be awesome.
Thanks. Best, Kasey
Solution 1:[1]
Something like this should work...
Create an empty data frame, then pull the files in one at a time and append each to it.
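library(aws.s3)  # get_object()
library(readr)   # read_csv()
library(dplyr)   # bind_rows()

# list_of_paths: character vector of object keys, e.g. "NY/tracts.csv"
# Start from an empty data frame; bind_rows() takes the columns from the files.
df <- data.frame()
for (i in list_of_paths) {
  object <- get_object(i, bucket = 'bucket_name')  # raw vector with the file contents
  df_i <- read_csv(object)                         # parse the raw bytes as CSV
  df <- bind_rows(df, df_i)                        # assign the result; bind_rows() does not modify df in place
}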
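The loop above stacks everything into one data frame but does not address naming each pull by state abbreviation. A minimal sketch of one way to do that follows, using base R for parsing and binding. The helper names `read_tract_csv` and `pull_tracts` and the `state_abb` column are hypothetical, introduced only for illustration. The sketch assumes every state's file has the same columns and that the keys follow the "NY/tracts.csv" pattern from the question.

```r
# Hypothetical helper: download one key and parse it as CSV.
# `fetch` defaults to aws.s3::get_object, which returns the object as a raw vector.
# It is a parameter so the parsing logic can be tried without network access.
read_tract_csv <- function(key,
                           bucket = "eviction-lab-data-downloads",
                           fetch = aws.s3::get_object) {
  raw_bytes <- fetch(key, bucket = bucket)
  read.csv(text = rawToChar(raw_bytes), stringsAsFactors = FALSE)
}

# Hypothetical helper: pull several states into a named list, then stack them.
pull_tracts <- function(states, ...) {
  # Build one key per state, e.g. "NY" -> "NY/tracts.csv".
  keys <- paste0(states, "/tracts.csv")
  tracts <- lapply(keys, read_tract_csv, ...)
  names(tracts) <- states
  # Tag each row with its state abbreviation so the source survives the bind.
  tagged <- Map(
    function(d, s) cbind(state_abb = s, d, stringsAsFactors = FALSE),
    tracts, states
  )
  list(by_state = tracts, nationwide = do.call(rbind, unname(tagged)))
}

# Usage (needs network access and the aws.s3 package installed):
# result <- pull_tracts(state.abb)   # state.abb is R's built-in vector of 50 abbreviations
# result$by_state$NY                 # one state's data
# result$nationwide                  # all states stacked into one data frame
```

Keeping the per-state data frames in a named list, rather than creating 50 separate variables, means the set of states can be changed later by editing only the vector passed to `pull_tracts`.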
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | pyll |
