How to extract specific attribute values from multiple tags in XML using Python

xml:

<?xml version="1.0" encoding="UTF-8"?>
<Page xmlns="http://gigabyte.com/documoto/Statuslist/1.6" xmlns:xs="http://www.w3.org/2001/XMLSchema" hashKey="MDAwNTgxMzQtQS0xLjEuc3Zn" pageFile="status-1.1.svg" tenantKey="Staus">
  <Stage description="SPREADER,GB/DD" locale="en" name="SPREADER,GB/DD"/>
  <File Price="0.0" Id="1" item="1" stage_status="true" ForPage="true" Number="05051401">
    <Stage description="" locale="n" name="DANGER"/>
  </File>
  <File Price="0.0" Id="2" item="2" stage_status="true" ForPage="true" Number="05051402">
    <Stage description="" locale="n" name="SPINNERS"/>
  </File>
  <File Price="0.0" Id="3" item="3" stage_status="true" ForPage="true" Number="05051404">
    <Stage description="" locale="n" name="CAUTION"/>
  </File>
</Page>

Expected Output in table format is:

Id,item,stage_status,Number,description,name

1,1,True,05051401, ,DANGER

2,2,True,05051402, ,SPINNERS

3,3,True,05051404, ,CAUTION

I tried this code:

import csv
import xml.etree.ElementTree as ET

tree = ET.parse("status-1.1.xml")
root = tree.getroot()

with open('Data.csv', 'w') as f:
    w = csv.DictWriter(f, fieldnames=('Id', 'item', 'stage_status', 'Number','description','name'))
    w.writeheader()
    w.writerows(e.attrib for e in root.findall('.//Page/File/Stage'))

I'm trying to get attribute values from both the File and Stage tags.

Solution 1:^[1]

from bs4 import BeautifulSoup as Soup
import pandas as pd

xml = '''<?xml version="1.0" encoding="UTF-8"?>
<Page xmlns="http://gigabyte.com/documoto/Statuslist/1.6" xmlns:xs="http://www.w3.org/2001/XMLSchema" hashKey="MDAwNTgxMzQtQS0xLjEuc3Zn" pageFile="status-1.1.svg" tenantKey="Staus">
  <Stage description="SPREADER,GB/DD" locale="en" name="SPREADER,GB/DD"/>
  <File Price="0.0" Id="1" item="1" stage_status="true" ForPage="true" Number="05051401">
    <Stage description="" locale="n" name="DANGER"/>
  </File>
  <File Price="0.0" Id="2" item="2" stage_status="true" ForPage="true" Number="05051402">
    <Stage description="" locale="n" name="SPINNERS"/>
  </File>
  <File Price="0.0" Id="3" item="3" stage_status="true" ForPage="true" Number="05051404">
    <Stage description="" locale="n" name="CAUTION"/>
  </File>
</Page>
'''
xml_data = Soup(xml, features="lxml")


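params = ['id', 'item', 'stage_status', 'number', 'description', 'name']
all_data = []
# The lxml HTML parser lowercases tag and attribute names, hence "file", "id", "number".
for file_tag in xml_data.find_all("file"):
    stage_tag = file_tag.find('stage')  # the <Stage> nested inside this <File>
    values = [file_tag['id'], file_tag['item'], file_tag['stage_status'], file_tag['number'],
              stage_tag['description'], stage_tag['name']]
    all_data.append(dict(zip(params, values)))
df = pd.DataFrame(all_data)
df

Output:

    id  item    stage_status    number      description name
0   1   1       true            05051401                DANGER
1   2   2       true            05051402                SPINNERS
2   3   3       true            05051404                CAUTION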
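
For comparison, the same rows can probably be produced with the standard library alone, staying close to the code in the question. This is a minimal sketch and not part of the original answer. It assumes every File has at most one nested Stage, and the names NS, FILE_FIELDS, STAGE_FIELDS and extract_rows, as well as the "d" prefix, are made up for illustration. The likely reason the findall call in the question matches nothing is that Page declares a default namespace, so ElementTree only finds File and Stage when the path is namespace-qualified. The path also has to be relative to the root, which is Page itself. Attributes without a prefix are not namespaced, so they can be read by their plain names.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Copy of the XML from the question, inlined so the sketch is self-contained.
XML = '''<?xml version="1.0" encoding="UTF-8"?>
<Page xmlns="http://gigabyte.com/documoto/Statuslist/1.6" xmlns:xs="http://www.w3.org/2001/XMLSchema" hashKey="MDAwNTgxMzQtQS0xLjEuc3Zn" pageFile="status-1.1.svg" tenantKey="Staus">
  <Stage description="SPREADER,GB/DD" locale="en" name="SPREADER,GB/DD"/>
  <File Price="0.0" Id="1" item="1" stage_status="true" ForPage="true" Number="05051401">
    <Stage description="" locale="n" name="DANGER"/>
  </File>
  <File Price="0.0" Id="2" item="2" stage_status="true" ForPage="true" Number="05051402">
    <Stage description="" locale="n" name="SPINNERS"/>
  </File>
  <File Price="0.0" Id="3" item="3" stage_status="true" ForPage="true" Number="05051404">
    <Stage description="" locale="n" name="CAUTION"/>
  </File>
</Page>
'''

# Hypothetical prefix "d" mapped to the default namespace declared on <Page>.
NS = {"d": "http://gigabyte.com/documoto/Statuslist/1.6"}
FILE_FIELDS = ("Id", "item", "stage_status", "Number")
STAGE_FIELDS = ("description", "name")


def extract_rows(root):
    """Return one dict per <File>, merged with the attributes of its nested <Stage>."""
    rows = []
    # "d:File" only matches direct children of the root, so the top-level
    # <Stage> that sits next to the <File> elements is skipped.
    for file_el in root.findall("d:File", NS):
        row = {key: file_el.get(key, "") for key in FILE_FIELDS}
        stage_el = file_el.find("d:Stage", NS)
        for key in STAGE_FIELDS:
            row[key] = stage_el.get(key, "") if stage_el is not None else ""
        rows.append(row)
    return rows


root = ET.fromstring(XML.encode("utf-8"))
rows = extract_rows(root)

# Write to an in-memory buffer here; a real file works the same way.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=FILE_FIELDS + STAGE_FIELDS, lineterminator="\n")
writer.writeheader()
writer.writerows(rows)
csv_text = buffer.getvalue()
print(csv_text)
```

To read from disk instead, ET.parse("status-1.1.xml").getroot() can replace the ET.fromstring call, and open('Data.csv', 'w', newline='') can replace the StringIO buffer.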

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution	Source
Solution 1	Mazhar
