How to delete only objects from Amazon S3 and not the subfolders which contain them, using the boto library for Python

I have a folder structure like /Download/test_queue1/ on Amazon S3 under the bucket events_logs. I want to delete only objects and retain the folder structure. Is it possible to do that?

So, I want to delete only aa.txt, bb.txt, and cc.txt, and keep the /Download/test_queue1/ subfolder structure. How do I do that?

/Download/test_queue1/aa.txt
/Download/test_queue1/bb.txt
/Download/test_queue1/cc.txt

Here is my code which is currently wiping out everything under the bucket.

def _deleteFileInBucket(self, s3_file1, aws_bucket_to_download, aws_bucket_path_to_download):
    bucket_path = os.path.join(aws_bucket_path_to_download, s3_file1.strip())
    if not re.match(r'.*\.tar\.gz', bucket_path):
        print "No batch available to delete from {}".format(aws_bucket_path_to_download)
    else:
        bucket = self._aws_connection.get_bucket(aws_bucket_to_download)
        bucket_list = bucket.list(prefix='Download/test_queue1')
        bucket.delete_keys([key.name for key in bucket_list])

I'm able to achieve this using AWS CLI:

os.system('aws s3 rm s3://{}{}'.format(aws_bucket_path_to_download[1:], s3_file1.strip()))

But how can I achieve the same results using boto library?
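For reference, the question itself can be sketched directly against the boto (version 2) API the asker is using: list the keys under the prefix and delete only those that do not end in '/', so any zero-sized folder placeholder key survives. The helper names `keys_to_delete` and `delete_objects_keep_folder` are hypothetical, and the AWS calls assume configured credentials:

```python
def keys_to_delete(key_names):
    """Keep real object keys; skip zero-sized 'folder' placeholder
    keys ending in '/', so the folder structure survives."""
    return [name for name in key_names if not name.endswith('/')]

def delete_objects_keep_folder(bucket_name, prefix):
    # boto (version 2) calls; the import is local so the pure helper
    # above can be used without boto installed.
    import boto
    conn = boto.connect_s3()
    bucket = conn.get_bucket(bucket_name)
    names = [key.name for key in bucket.list(prefix=prefix)]
    bucket.delete_keys(keys_to_delete(names))

# Example (hypothetical, matching the question's layout):
# delete_objects_keep_folder('events_logs', 'Download/test_queue1/')
```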



Solution 1:[1]

I solved it using boto3, though it runs faster with the aws-cli.

The boto3 solution (Python):

import os
import boto3

BUCKET_NAME = 'YOUR_BUCKET_NAME'  # replace with your bucket name

def delete_files_from_s3():
    s3 = boto3.resource('s3')
    my_bucket = s3.Bucket(BUCKET_NAME)

    objects_to_delete = []

    for s3_object in my_bucket.objects.all():
        # Split the key into path and file name; an empty path means the
        # object sits at the top level of the bucket, outside any "folder".
        path, filename = os.path.split(s3_object.key)
        if path == '':
            objects_to_delete.append({'Key': filename})

    # delete_objects rejects an empty Objects list (MalformedXML error)
    if objects_to_delete:
        my_bucket.delete_objects(Delete={'Objects': objects_to_delete})

The aws-cli solution: you can also do it with the AWS CLI (https://aws.amazon.com/cli/) combined with some Unix commands.

This AWS CLI command should work:

aws s3 rm s3://<your_bucket_name> --exclude "*" --include "<your_regex>"

If you want to include sub-folders, add the --recursive flag.

Or with Unix commands:

aws s3 ls s3://<your_bucket_name>/ | awk '{print $4}' | xargs -I% <your_os_shell> -c 'aws s3 rm s3://<your_bucket_name>/%'

Explanation:

  1. List all files in the bucket --pipe-->
  2. Take the 4th column (the file name) --pipe--> // you can replace this with a Linux command matching your pattern
  3. Run the delete with the AWS CLI
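The three steps above can also be sketched in boto3 (the helper names and the pattern argument are illustrative, not from the answer; the delete call assumes configured credentials and at most 1000 matching keys per request):

```python
import fnmatch

def matching_keys(key_names, pattern):
    # Step 2 of the pipeline: keep only names matching the pattern.
    return [name for name in key_names if fnmatch.fnmatch(name, pattern)]

def delete_matching(bucket_name, prefix, pattern):
    # Local import so the pure helper above works without boto3 installed.
    import boto3
    bucket = boto3.resource('s3').Bucket(bucket_name)
    # Step 1: list the objects under the prefix.
    names = [obj.key for obj in bucket.objects.filter(Prefix=prefix)]
    targets = matching_keys(names, pattern)
    # Step 3: one batched delete call.
    if targets:
        bucket.delete_objects(Delete={'Objects': [{'Key': k} for k in targets]})
```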

Solution 2:[2]

S3 has buckets and objects; it does not have folders. Having said that, you can create a zero-sized object called myfolder/ and it will give the appearance of a folder named 'myfolder' but it's not really a folder. This is what the AWS console does when you ask it create a folder.

So, you should simply delete the objects one by one from Download/test_queue1/. After you have done that you may or may not have a remaining object named Download/test_queue1/. It will be present if you have previously created a zero-sized object named Download/test_queue1/, and it will be absent otherwise.

If you really need a 'folder', then after deleting the objects you should test for the presence of Download/test_queue1/ and, if it's absent, simply create it as a zero-sized object. In boto3, that looks something like this:

import boto3
s3 = boto3.resource('s3')
folder = s3.Object('events_logs', 'Download/test_queue1/')
folder.put()  # zero-sized object acting as the folder placeholder
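The presence test mentioned above can be sketched with a HEAD request via `Object.load()`, which raises a `ClientError` with code 404 when the key is absent; `ensure_folder` and `is_missing` are hypothetical helper names:

```python
def is_missing(error_response):
    """True when a ClientError response reports the key as absent (404)."""
    return error_response.get('Error', {}).get('Code') == '404'

def ensure_folder(bucket_name, folder_key):
    """Create the zero-sized folder placeholder only if it is absent."""
    # Local imports so the pure helper above works without boto3 installed.
    import boto3
    import botocore.exceptions
    obj = boto3.resource('s3').Object(bucket_name, folder_key)
    try:
        obj.load()  # HEAD request; raises ClientError if the key is absent
    except botocore.exceptions.ClientError as err:
        if is_missing(err.response):
            obj.put()  # recreate the zero-sized placeholder
        else:
            raise

# Example (matching the answer's bucket and folder):
# ensure_folder('events_logs', 'Download/test_queue1/')
```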

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution 1: ggcarmi
Solution 2: