How to extract specific attribute values from multiple tags in XML using Python
XML:
<?xml version="1.0" encoding="UTF-8"?>
<Page xmlns="http://gigabyte.com/documoto/Statuslist/1.6" xmlns:xs="http://www.w3.org/2001/XMLSchema" hashKey="MDAwNTgxMzQtQS0xLjEuc3Zn" pageFile="status-1.1.svg" tenantKey="Staus">
  <Stage description="SPREADER,GB/DD" locale="en" name="SPREADER,GB/DD"/>
  <File Price="0.0" Id="1" item="1" stage_status="true" ForPage="true" Number="05051401">
    <Stage description="" locale="n" name="DANGER"/>
  </File>
  <File Price="0.0" Id="2" item="2" stage_status="true" ForPage="true" Number="05051402">
    <Stage description="" locale="n" name="SPINNERS"/>
  </File>
  <File Price="0.0" Id="3" item="3" stage_status="true" ForPage="true" Number="05051404">
    <Stage description="" locale="n" name="CAUTION"/>
  </File>
</Page>
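Note the default namespace declared on Page (xmlns="http://gigabyte.com/documoto/Statuslist/1.6"). ElementTree folds that URI into every tag name, so plain paths such as "File" match nothing. A quick check on a trimmed copy of the XML (one File kept; the `s` prefix is an arbitrary name chosen for this sketch):

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the XML above: one File element is enough to show the effect.
XML = """<Page xmlns="http://gigabyte.com/documoto/Statuslist/1.6">
  <File Id="1" item="1" stage_status="true" Number="05051401">
    <Stage description="" locale="n" name="DANGER"/>
  </File>
</Page>"""

root = ET.fromstring(XML)

# The default namespace becomes part of every tag name.
print(root.tag)  # -> {http://gigabyte.com/documoto/Statuslist/1.6}Page

# An unqualified path matches nothing...
print(root.findall("File"))  # -> []

# ...while a namespace-qualified path (prefix mapped to the URI) finds the element.
ns = {"s": "http://gigabyte.com/documoto/Statuslist/1.6"}
print(len(root.findall("s:File", ns)))  # -> 1
```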
Expected output (CSV):
Id,item,stage_status,Number,description,name
1,1,true,05051401,,DANGER
2,2,true,05051402,,SPINNERS
3,3,true,05051404,,CAUTION
I tried this code:
import csv
import xml.etree.ElementTree as ET
tree = ET.parse("status-1.1.xml")
root = tree.getroot()
with open('Data.csv', 'w', newline='') as f:
    w = csv.DictWriter(f, fieldnames=('Id', 'item', 'stage_status', 'Number', 'description', 'name'))
    w.writeheader()
    w.writerows(e.attrib for e in root.findall('.//Page/File/Stage'))
I'm trying to get the attribute values from both the File and the Stage tags.
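A minimal stdlib sketch of a working version of the attempt above, assuming the structure shown (each File holds at most one Stage). It changes three things: the path is namespace-qualified through a prefix map (the `s` prefix is an arbitrary choice), the search starts from `root`, which already is Page, so there is no `.//Page` step, and each row merges the File attributes with those of its Stage child so all six columns get filled. The top-level Stage directly under Page is skipped because only File children are visited. `NS`, `FIELDS`, `SAMPLE`, `files_to_rows` and `write_csv` are illustrative names, not from the question.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Prefix -> URI map for the default namespace declared on <Page>.
NS = {"s": "http://gigabyte.com/documoto/Statuslist/1.6"}
FIELDS = ("Id", "item", "stage_status", "Number", "description", "name")

# Trimmed sample of the question's XML (hypothetical demo data).
SAMPLE = """<Page xmlns="http://gigabyte.com/documoto/Statuslist/1.6">
  <Stage description="SPREADER,GB/DD" locale="en" name="SPREADER,GB/DD"/>
  <File Price="0.0" Id="1" item="1" stage_status="true" ForPage="true" Number="05051401">
    <Stage description="" locale="n" name="DANGER"/>
  </File>
  <File Price="0.0" Id="2" item="2" stage_status="true" ForPage="true" Number="05051402">
    <Stage description="" locale="n" name="SPINNERS"/>
  </File>
</Page>"""


def files_to_rows(root):
    """Yield one dict per <File>, merged with the attributes of its <Stage> child."""
    # root *is* <Page>, so look at its File children directly.
    for file_el in root.findall("s:File", NS):
        row = dict(file_el.attrib)
        stage = file_el.find("s:Stage", NS)
        if stage is not None:
            row.update(stage.attrib)
        yield row


def write_csv(root, out):
    # extrasaction="ignore" drops attributes not in FIELDS (Price, ForPage, locale);
    # the default ("raise") would raise ValueError on them.
    writer = csv.DictWriter(out, fieldnames=FIELDS, extrasaction="ignore")
    writer.writeheader()
    writer.writerows(files_to_rows(root))


# Demo: write to an in-memory buffer and show the result.
buf = io.StringIO()
write_csv(ET.fromstring(SAMPLE), buf)
print(buf.getvalue())
```

For the real file, parse it with `ET.parse("status-1.1.xml").getroot()` and pass a file opened with `open("Data.csv", "w", newline="")`; the `newline=""` argument keeps the csv module's `\r\n` line endings from being doubled on Windows.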
Solution 1:[1]
from bs4 import BeautifulSoup as Soup
import pandas as pd
xml = '''<?xml version="1.0" encoding="UTF-8"?>
<Page xmlns="http://gigabyte.com/documoto/Statuslist/1.6" xmlns:xs="http://www.w3.org/2001/XMLSchema" hashKey="MDAwNTgxMzQtQS0xLjEuc3Zn" pageFile="status-1.1.svg" tenantKey="Staus">
<Stage description="SPREADER,GB/DD" locale="en" name="SPREADER,GB/DD"/>
<File Price="0.0" Id="1" item="1" stage_status="true" ForPage="true" Number="05051401">
<Stage description="" locale="n" name="DANGER"/>
</File>
<File Price="0.0" Id="2" item="2" stage_status="true" ForPage="true" Number="05051402">
<Stage description="" locale="n" name="SPINNERS"/>
</File>
<File Price="0.0" Id="3" item="3" stage_status="true" ForPage="true" Number="05051404">
<Stage description="" locale="n" name="CAUTION"/>
</File>
</Page>
'''
xml_data = Soup(xml, features="lxml")
params = ['id', 'item', 'stage_status', 'number', 'name']
all_data = []
for i in xml_data.find_all("file"):
    # File attributes plus the name of the nested Stage element
    tmp_dict = dict(zip(params, [i['id'], i['item'], i['stage_status'], i['number'], i.find('stage')['name']]))
    all_data.append(tmp_dict)
df = pd.DataFrame(all_data)
print(df)
Output:
id item stage_status number name
0 1 1 true 05051401 DANGER
1 2 2 true 05051402 SPINNERS
2 3 3 true 05051404 CAUTION
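The keys in this answer are lowercase ('file', 'id', 'number') because features="lxml" selects lxml's HTML parser, which lowercases tag and attribute names. If you would rather keep the original names and drop the bs4 dependency, here is a hedged stdlib sketch using ElementTree's `{*}` namespace wildcard (supported in find/findall/iterfind since Python 3.8). `extract_records` is a hypothetical helper name and the sample is trimmed to one File:

```python
import xml.etree.ElementTree as ET

# Trimmed sample of the question's XML (hypothetical demo data).
SAMPLE = """<Page xmlns="http://gigabyte.com/documoto/Statuslist/1.6">
  <File Price="0.0" Id="3" item="3" stage_status="true" ForPage="true" Number="05051404">
    <Stage description="" locale="n" name="CAUTION"/>
  </File>
</Page>"""

FIELDS = ("Id", "item", "stage_status", "Number", "description", "name")


def extract_records(root):
    """Return one dict per <File>, keeping only FIELDS from the File and its Stage."""
    records = []
    # "{*}File" matches a File tag in any namespace, so no prefix map is needed.
    for file_el in root.iterfind("{*}File"):
        stage = file_el.find("{*}Stage")
        merged = {**file_el.attrib, **(stage.attrib if stage is not None else {})}
        # Missing attributes become empty strings; key order follows FIELDS.
        records.append({key: merged.get(key, "") for key in FIELDS})
    return records


records = extract_records(ET.fromstring(SAMPLE))
print(records)
# -> [{'Id': '3', 'item': '3', 'stage_status': 'true', 'Number': '05051404', 'description': '', 'name': 'CAUTION'}]
```

If you still want a DataFrame, the list of dicts can be passed straight to `pd.DataFrame(records)`.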
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Mazhar |
