Efficiently uploading image files in parallel to AWS S3 using TensorFlow

For a tf.data.Dataset object temp that yields pairs of a PNG-encoded image string and the key of the target S3 object (also as a string), I was able to use the following for loop to write the files into my S3 bucket:

import boto3

s3_client = boto3.client('s3')

# Upload the images one at a time, blocking on each request
for image_string, name in temp:
    s3_client.put_object(Bucket=bucket_name,
                         Key=name.numpy().decode('utf-8'),
                         Body=image_string.numpy())

The above is slow, so I'm wondering if there's a more efficient way to write all the image files that leverages TensorFlow's optimizations. I attempted the .map approach below, which is only slightly faster:

def upload_to_s3(image_string, name):
    # tf.py_function runs this eagerly, so .numpy() is available here
    s3_client.put_object(Bucket=bucket_name,
                         Key=name.numpy().decode('utf-8'),
                         Body=image_string.numpy())
    return image_string, name

temp3 = temp.map(
    lambda image_string, name: tf.py_function(
        upload_to_s3, [image_string, name], (tf.string, tf.string)))
_ = list(temp3.as_numpy_iterator())  # drain the dataset to trigger the uploads
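
One likely reason the .map version barely helps is that Dataset.map applies its function sequentially unless num_parallel_calls is set, so the uploads still happen one at a time. Below is a minimal sketch of the same pipeline with parallel calls enabled; it assumes temp, bucket_name, and s3_client are defined as above, and it needs TF 2.4+ for tf.data.AUTOTUNE (older 2.x releases use tf.data.experimental.AUTOTUNE). Since put_object spends most of its time waiting on the network, and boto3's low-level clients are documented as thread-safe, overlapping the calls is where any speedup would come from.

import tensorflow as tf

def upload_to_s3(image_string, name):
    # tf.py_function runs this eagerly, so .numpy() is available here
    s3_client.put_object(Bucket=bucket_name,
                         Key=name.numpy().decode('utf-8'),
                         Body=image_string.numpy())
    return image_string, name

uploaded = temp.map(
    lambda image_string, name: tf.py_function(
        upload_to_s3, [image_string, name], (tf.string, tf.string)),
    num_parallel_calls=tf.data.AUTOTUNE,  # let tf.data run several uploads concurrently
    deterministic=False)                  # uploads are independent, so ordering doesn't matter

# Drain the dataset to actually trigger the uploads
for _ in uploaded.as_numpy_iterator():
    pass

Note that tf.py_function still holds the Python GIL while executing Python bytecode; the concurrency here comes from boto3 releasing the GIL during blocking network I/O, so the actual throughput gain should be measured rather than assumed.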


Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow
