Efficiently uploading image files in parallel to AWS S3 using TensorFlow
For a `tf.data.Dataset` object `temp` whose elements pair the string encoding of a PNG file with the key of the target S3 object (also a string), I was able to use the following loop to write the files into my S3 bucket:
```python
for image_string, name in temp:
    s3_client.put_object(
        Bucket=bucket_name,
        Key=name.numpy().decode("utf-8"),  # key tensor -> Python string
        Body=image_string.numpy(),         # image tensor -> raw PNG bytes
    )
```
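For context, the loop assumes an existing boto3 client and that `temp` yields (PNG bytes, key) pairs. A minimal sketch of that setup might look like the following; the bucket name, the glob pattern, and deriving the key from the filename are illustrative assumptions, not part of the original question:

```python
import boto3
import tensorflow as tf

s3_client = boto3.client("s3")  # assumes credentials come from the environment
bucket_name = "my-bucket"       # placeholder bucket name

# Hypothetical construction of `temp`: pair each PNG's raw bytes
# with its filename, used below as the S3 object key.
paths = tf.data.Dataset.list_files("images/*.png", shuffle=False)
temp = paths.map(lambda p: (tf.io.read_file(p), tf.strings.split(p, "/")[-1]))
```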
That loop is slow, so I'm wondering if there's a more efficient way to write all the image files that leverages TensorFlow optimizations. I attempted the `.map`-based approach below, which is only slightly faster:
```python
def upload_to_s3(image_string, name):
    s3_client.put_object(
        Bucket=bucket_name,
        Key=name.numpy().decode("utf-8"),
        Body=image_string.numpy(),
    )
    return image_string, name

# tf.py_function lets the eager Python upload run inside the pipeline;
# iterating the dataset is what actually triggers the uploads.
temp3 = temp.map(lambda image_string, name: tf.py_function(
    upload_to_s3, [image_string, name], (tf.string, tf.string)))
_ = list(temp3.as_numpy_iterator())
```
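One variation that seems worth trying is letting `Dataset.map` run the `py_function` calls concurrently. This is a sketch, not something I've benchmarked: `num_parallel_calls` and `tf.data.AUTOTUNE` are standard tf.data options in TF 2.x, and since `put_object` spends most of its time in network I/O, which releases the GIL, the parallel calls should overlap even though `tf.py_function` itself runs Python code under the GIL.

```python
# Sketch: same upload_to_s3 wrapper as above, but mapped in parallel.
parallel_uploads = temp.map(
    lambda image_string, name: tf.py_function(
        upload_to_s3, [image_string, name], (tf.string, tf.string)),
    num_parallel_calls=tf.data.AUTOTUNE,
).prefetch(tf.data.AUTOTUNE)
for _ in parallel_uploads:  # iterating the dataset drives the uploads
    pass
```

An alternative that sidesteps tf.data for the upload step entirely is a plain thread pool; boto3 clients are documented as thread-safe, and the worker count below is a placeholder to tune:

```python
from concurrent.futures import ThreadPoolExecutor

with ThreadPoolExecutor(max_workers=16) as pool:  # placeholder worker count
    futures = [
        pool.submit(s3_client.put_object, Bucket=bucket_name,
                    Key=name.numpy().decode("utf-8"),
                    Body=image_string.numpy())
        for image_string, name in temp
    ]
    for f in futures:
        f.result()  # re-raise any failed upload
```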
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
