Efficiently move many small files to Amazon S3
I have around 60,000 small image files (about 200 MB in total) that I would like to move out of my project repository to Amazon S3.
I have tried s3fs (http://code.google.com/p/s3fs/), mounting S3 via Transmit on Mac OS X, and the Amazon AWS S3 web uploader. Unfortunately, all of these look like they would take a very long time (more than a day or two) to accomplish the task.
Is there any better way?
Solution 1:[1]
There are a few things that could be limiting the flow of data, and each has a different way to alleviate it:

1. Your transfer application might be adding overhead. If s3fs is too slow, you might try other options like the S3 tab of the AWS Management Console or a tool like s3cmd.
2. The network latency between your computer and S3, and the latency of API call responses, can seriously limit how much you can do in a single thread. The key to solving this is to upload multiple files (dozens) in parallel.
3. You could just have a slow network connection between you and S3, placing a limit on the total data transfer speed possible. If you can compress the files, you could upload them in compressed form to a temporary EC2 instance and then uncompress and upload from the instance to S3.

My bet is on number 2, which is not always the easiest to solve unless you have upload tools that will parallelize for you; a minimal sketch of that approach follows.
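As a rough illustration of option 2, here is a minimal sketch that fans uploads out across several processes using xargs and the AWS CLI. Neither tool is named in this answer, so treat the tool choice and the `images`/`my-bucket` names as assumptions:

```
# Hypothetical sketch: upload files in parallel with xargs and the AWS CLI.
# Assumes the AWS CLI is installed and configured, the local files live in
# ./images, and the destination bucket is s3://my-bucket (placeholder names).
find images -type f -print0 \
  | xargs -0 -P 16 -I {} aws s3 cp "{}" "s3://my-bucket/{}"
```

Spawning one `aws` process per file adds its own overhead, so for very large trees the CLI's recursive copy (shown in Solution 4), which already runs transfers concurrently, is usually the simpler route.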
Solution 2:[2]
Jeff Atwood made a blog post a few years ago titled Using Amazon S3 as an Image Hosting Service. His solution for a similar problem (image hosting usually consists of hosting many small files) was to use S3Fox Organizer for Firefox.
To address a previous answer, Amazon S3 does not allow you to unzip your files (to do this you would need to download, unzip, and re-upload).
Solution 3:[3]
We created a tool for our project with similar requirements. You can download it here:
Solution 4:[4]
I ran across this thread because I had the same question. In my case, I was uploading about 26,000 small files (~50 KB each) via the S3 Management Console web interface, and the throughput stayed stuck at around 84 KB/s. I downloaded the AWS CLI, used the S3 copy command, and got about 4 MB/s of upload throughput.
Here are some references:
- AWS S3 Manual
- Configuring the AWS CLI (pay attention to the section on configuring credentials)
- AWS CLI Installation
After you install the AWS CLI, configure it with your access and secret keys:
aws configure
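Running `aws configure` prompts for four values; the key values below are placeholders, and the region and output format shown are only examples:

```
$ aws configure
AWS Access Key ID [None]: <your-access-key-id>
AWS Secret Access Key [None]: <your-secret-access-key>
Default region name [None]: us-east-1
Default output format [None]: json
```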
Then, S3 copy is pretty straightforward. See the reference for more examples, but for me it was something like this:
aws s3 cp images s3://my-bucket/images --recursive
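If the transfer gets interrupted partway through, `aws s3 sync` can be re-run and will only copy files that are missing or changed at the destination. You can also raise the CLI's concurrency, which defaults to 10 parallel requests. The bucket and directory names here mirror the example above and are placeholders:

```
# Resume-friendly alternative: only copies files missing or changed at the destination.
aws s3 sync images s3://my-bucket/images

# Optional: raise the CLI's built-in parallelism (default is 10 concurrent requests).
aws configure set default.s3.max_concurrent_requests 20
```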
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Eric Hammond |
| Solution 2 | Cameron S |
| Solution 3 | Jake Wilson |
| Solution 4 | cecomp64 |
