Efficiently move many small files to Amazon S3

I have around 60,000 small image files (about 200 MB in total) that I would like to move out of my project repository and into Amazon S3.

I have tried s3fs (http://code.google.com/p/s3fs/), mounting S3 via Transmit on Mac OS X, and the Amazon AWS S3 web uploader. Unfortunately, all of these look like they would take a very long time, more than a day or two, to finish.

Is there any better way?



Solution 1:[1]

There are a few things that could be limiting the flow of data, and each has a different way to alleviate it:

  1. Your transfer application might be adding overhead. If s3fs is too slow, try other options such as the S3 tab of the AWS Management Console or a tool like s3cmd.

  2. The network latency between your computer and S3, plus the latency of each API call response, can seriously limit how much a single thread can do. The key to solving this is to upload multiple files (dozens) in parallel; see the sketch below.

  3. You could simply have a slow network connection between you and S3, capping the total transfer speed. If you can compress the files, you could upload them in compressed form to a temporary EC2 instance, then uncompress and upload from the instance to S3, as sketched just below.
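
As a rough sketch of number 3, assuming a reachable EC2 instance with the AWS CLI installed; my-ec2-host, images, and my-bucket are placeholder names:

# On your machine: one big compressed transfer instead of 60,000 small ones
tar czf images.tar.gz images
scp images.tar.gz ec2-user@my-ec2-host:~

# On the EC2 instance: uncompress, then upload over AWS's internal network
tar xzf images.tar.gz
aws s3 cp images s3://my-bucket/images --recursive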

My bet is on number 2, which is not always the easiest to solve unless your upload tool will parallelize for you.
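
As an illustration of number 2, here is a minimal parallel-upload sketch, assuming find and xargs (with -print0/-0 and -P support) and the AWS CLI; the images directory and my-bucket bucket are hypothetical names:

# Keep ten uploads in flight at once; xargs starts one "aws s3 cp" per file.
find images -type f -print0 \
  | xargs -0 -P 10 -I{} aws s3 cp {} s3://my-bucket/{}

Spawning a CLI process per file adds overhead of its own, so a tool whose transfers are natively concurrent (such as the AWS CLI's recursive copy in Solution 4) is usually the simpler route.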

Solution 2:[2]

Jeff Atwood wrote a blog post a few years ago titled Using Amazon S3 as an Image Hosting Service. His solution to a similar problem (image hosting usually means serving many small files) was to use S3Fox Organizer for Firefox.

To address a previous answer: Amazon S3 cannot unzip your files for you, so expanding an archive means downloading, unzipping, and re-uploading.
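
That round trip looks roughly like this, with archive.zip and my-bucket as placeholder names:

aws s3 cp s3://my-bucket/archive.zip .                    # download the archive
unzip -q archive.zip -d extracted                         # unzip locally
aws s3 cp extracted s3://my-bucket/extracted --recursive  # re-upload the contents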

Solution 3:[3]

We created a tool for our project with similar requirements. You can download it here:

https://github.com/mshytikov/s3files

Solution 4:[4]

I ran across this thread because I had the same question. In my case, I was uploading about 26,000 small files (~50 KB each) through the S3 Management Console web interface, and throughput stayed stuck at around 84 KB/s. I installed the AWS CLI, used its S3 copy command instead, and got about 4 MB/s upload throughput.

After you install the AWS CLI, configure it with your access key and secret key:

aws configure

Then an S3 copy is pretty straightforward. The AWS CLI documentation has more examples, but for me it was something like this:

aws s3 cp images s3://my-bucket/images --recursive
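
If that is still slow with many small files, the CLI's S3 transfer concurrency can be raised. max_concurrent_requests is part of the AWS CLI's S3 configuration (it defaults to 10; 20 below is just an example value):

aws configure set default.s3.max_concurrent_requests 20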

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Eric Hammond
Solution 2 Cameron S
Solution 3 Jake Wilson
Solution 4 cecomp64