How to delete only objects from Amazon S3 and not the subfolders which contain them, using the boto library for Python
I have a folder structure like /Download/test_queue1/ on Amazon S3 under the bucket events_logs. I want to delete only the objects and retain the folder structure. Is it possible to do that?
So, given the following objects, I want to delete aa.txt, bb.txt, and cc.txt but not the /Download/test_queue1/ subfolder structure. How do I do that?

```
/Download/test_queue1/aa.txt
/Download/test_queue1/bb.txt
/Download/test_queue1/cc.txt
```
Here is my current code, which wipes out everything under the bucket:

```python
def _deleteFileInBucket(self, s3_file1, aws_bucket_to_download, aws_bucket_path_to_download):
    bucket_path = os.path.join(aws_bucket_path_to_download, s3_file1.strip())
    if not re.match(r'.*\.tar\.gz', bucket_path):
        print("No batch available to delete from {}".format(aws_bucket_path_to_download))
    else:
        bucket = self._aws_connection.get_bucket(aws_bucket_to_download)
        bucket_list = bucket.list(prefix='Download/test_queue1')
        # delete_keys() removes every listed key, including the zero-sized
        # 'folder' placeholder, which is why the folder disappears too
        bucket.delete_keys([key.name for key in bucket_list])
```
I'm able to achieve this using the AWS CLI:

```python
os.system('aws s3 rm s3://{}{}'.format(aws_bucket_path_to_download[1:], s3_file1.strip()))
```
But how can I achieve the same result using the boto library?
Solution 1:[1]
I solved it using boto3, though the aws-cli approach runs faster.

The boto3 solution (Python):
```python
import os
import boto3

BUCKET_NAME = 'YOUR_BUCKET_NAME'  # replace with your bucket name

def delete_files_from_s3():
    s3 = boto3.resource('s3')
    my_bucket = s3.Bucket(BUCKET_NAME)
    files_list = my_bucket.objects.all()
    objects_to_delete = []
    for s3_object in files_list:
        # Split s3_object.key into path and file name, otherwise it
        # gives a file-not-found error.
        path, filename = os.path.split(s3_object.key)
        # my_bucket.download_file(s3_object.key, filename)
        if path == '':  # the key is a file in the bucket root
            objects_to_delete.append({'Key': filename})
    if objects_to_delete:  # delete_objects rejects an empty list
        response = my_bucket.delete_objects(
            Delete={
                'Objects': objects_to_delete
            }
        )
```
The aws-cli solution: you can also do it using the AWS CLI (https://aws.amazon.com/cli/) and some Unix commands.

This aws-cli command should work:

```
aws s3 rm s3://<your_bucket_name> --exclude "*" --include "<your_regex>"
```

If you want to include sub-folders, add the --recursive flag.
Or with Unix commands:

```
aws s3 ls s3://<your_bucket_name>/ | awk '{print $4}' | xargs -I% <your_os_shell> -c 'aws s3 rm s3://<your_bucket_name>/%'
```
Explanation:
- list all the files in the bucket --pipe-->
- get the 4th column (it's the file name) --pipe--> // you can replace this with a Linux command that matches your pattern
- run the delete command with the aws cli
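The same list-filter-delete pipeline can be sketched directly in boto3, which is what the question asked for. This is my own sketch, not the original answer's code: the function name, the regex, and the bucket/prefix arguments are placeholders, and it assumes boto3 is installed and AWS credentials are configured. The pure helper `keys_to_delete` does the filtering, skipping the zero-sized `folder/` placeholder keys so the folder structure survives:

```python
import re

def keys_to_delete(keys, pattern):
    """Pick object keys matching `pattern`, skipping zero-sized
    'folder' placeholder keys that end with '/'."""
    return [k for k in keys if not k.endswith('/') and re.search(pattern, k)]

def delete_matching_objects(bucket_name, prefix, pattern):
    """List keys under `prefix`, filter them, and batch-delete the rest.
    Assumes boto3 is installed and AWS credentials are configured."""
    import boto3
    s3 = boto3.resource('s3')
    bucket = s3.Bucket(bucket_name)
    keys = [obj.key for obj in bucket.objects.filter(Prefix=prefix)]
    doomed = keys_to_delete(keys, pattern)
    # delete_objects accepts at most 1000 keys per request
    for i in range(0, len(doomed), 1000):
        bucket.delete_objects(
            Delete={'Objects': [{'Key': k} for k in doomed[i:i + 1000]]})
    return doomed
```

For the example in the question, `delete_matching_objects('events_logs', 'Download/test_queue1/', r'\.txt$')` would remove the three .txt objects while leaving a `Download/test_queue1/` placeholder, if one exists, untouched.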
Solution 2:[2]
S3 has buckets and objects; it does not have folders. Having said that, you can create a zero-sized object called myfolder/ and it will give the appearance of a folder named 'myfolder', but it's not really a folder. This is what the AWS console does when you ask it to create a folder.
So, you should simply delete the objects one by one from Download/test_queue1/. After you have done that you may or may not have a remaining object named Download/test_queue1/. It will be present if you have previously created a zero-sized object named Download/test_queue1/, and it will be absent otherwise.
If you really need a 'folder', then after deleting the objects you should test for the presence of Download/test_queue1/ and, if it's absent, simply create it as a zero-sized object. You can do that in boto3 with something like this:
```python
import boto3

s3 = boto3.resource('s3')
obj = s3.Object('events_logs', 'Download/test_queue1/')
obj.put()  # zero-sized object whose key ends in '/' appears as a folder
```
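The "test for presence" step mentioned above can be sketched like this. It is my own sketch under the same assumptions (boto3 installed, credentials configured); `Object.load()` issues a HEAD request and raises a `ClientError` with code '404' when the key is absent:

```python
def as_folder_key(name):
    """Normalise a folder name to the trailing-slash key form S3 uses
    for zero-sized 'folder' placeholder objects."""
    return name if name.endswith('/') else name + '/'

def ensure_folder_placeholder(bucket_name, folder_name):
    """Create the zero-sized placeholder only if it is absent.
    Returns True if it was created, False if it already existed."""
    import boto3
    import botocore.exceptions
    s3 = boto3.resource('s3')
    obj = s3.Object(bucket_name, as_folder_key(folder_name))
    try:
        obj.load()    # HEAD request; raises ClientError if the key is absent
        return False  # placeholder already exists
    except botocore.exceptions.ClientError as e:
        if e.response['Error']['Code'] == '404':
            obj.put()  # create the zero-sized 'folder' object
            return True
        raise          # some other failure (permissions, etc.)
```

Calling `ensure_folder_placeholder('events_logs', 'Download/test_queue1')` after the deletion pass would restore the empty-folder appearance without clobbering an existing placeholder.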
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | ggcarmi |
| Solution 2 | |
