Reading XML files from an S3 bucket in Python - only the content of the last file is getting stored
I have 4 XML files inside an S3 bucket directory. When I try to read the content of all the files, I find that only the content of the last file (XML4) ends up stored.
```python
import boto3
import xml.etree.ElementTree as ET

s3 = boto3.resource('s3')
s3_bucket_name = 'test'
bucket = s3.Bucket(s3_bucket_name)

bucket_list = []
for file in bucket.objects.filter(Prefix='auto'):
    file_name = file.key
    if file_name.find(".xml") != -1:
        bucket_list.append(file.key)
```
In `bucket_list`, I can see that there are 4 files.
```python
for file in bucket_list:
    obj = s3.Object(s3_bucket_name, file)
    data = obj.get()['Body'].read()
    tree = ET.ElementTree(ET.fromstring(data))
```
What changes should be made in the code to read the content of all the XML files?
Solution 1:[1]
As mentioned, `tree` is reassigned on every pass through the loop, so only the last file's tree survives. Since you have a list of files, you need a corresponding list of trees:
```python
tree_list = []
for file in bucket_list:
    obj = s3.Object(s3_bucket_name, file)
    data = obj.get()['Body'].read()
    tree_list.append(ET.ElementTree(ET.fromstring(data)))
```
Then you can start using `tree_list` for whatever purpose.
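For example, each tree in the list can be processed independently. A minimal sketch of that, using in-memory XML byte strings as stand-ins for the `obj.get()['Body'].read()` payloads (the bucket and keys above are specific to the question and not reproducible here):

```python
import xml.etree.ElementTree as ET

# Stand-ins for the bytes returned by obj.get()['Body'].read()
payloads = [
    b"<root><item>1</item></root>",
    b"<root><item>2</item></root>",
]

# Same pattern as the fixed loop: one tree appended per file
tree_list = [ET.ElementTree(ET.fromstring(data)) for data in payloads]

# Each tree keeps its own root; nothing is overwritten between iterations
values = [tree.getroot().find("item").text for tree in tree_list]
print(values)  # ['1', '2']
```

Because every parsed tree is appended rather than bound to the same name, all four files remain accessible after the loop finishes.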
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | ewong |
