'Converting XML to pandas DataFrame using read_xml

I am trying to convert the following XML to pandas data frame using pd.read_xml api.

<?xml version='1.0' encoding='ISO-8859-1'?>
<Request>
    <Employee Name="James">
        <Address>Virginia</Address>
        <Project Name="project1">
            <Description>Description of Project 1</Description>
        </Project>
        <Project Name="project2">
            <Description>Description of Project 2</Description>
        </Project>
    </Employee>
</Request>

I tried the following code

df1 = pd.read_xml(filename, xpath="./*",parser="lxml")
print("\n"+str(df1.to_markdown))

and got the result something like this

<bound method DataFrame.to_markdown of     Name   Address  Project
0  James  Virginia      NaN>

Similarly when i tried to change the Xpath to read all the elements like below

df2 = pd.read_xml(filename, xpath="./*/*",parser="lxml")
print("\n"+str(df2.to_markdown))

I got result something like this

<bound method DataFrame.to_markdown of     Address      Name               Description
0  Virginia      None                      None
1      None  project1  Description of Project 1
2      None  project2  Description of Project 2>

What I am expecting is to get the results in the following format

<bound method DataFrame.to_markdown of  
  EmployeName Address      ProjectName              Description
0 James       Virginia      project1           Description of Project 1
1 James       Virginia      project2           Description of Project 2>

Is there a way to do this using read_xml api or any other library?



Solution 1:[1]

See below

import xml.etree.ElementTree as ET
import pandas as pd

xml = '''<?xml version='1.0' encoding='ISO-8859-1'?>
<Request>
    <Employee Name="James">
        <Address>Virginia</Address>
        <Project Name="project1">
            <Description>Description of Project 1</Description>
        </Project>
        <Project Name="project2">
            <Description>Description of Project 2</Description>
        </Project>
    </Employee>
</Request>'''

root = ET.fromstring(xml)
data = []
emp = root.find('.//Employee')
name = emp.attrib['Name']
addr = emp.find('Address').text
for proj in emp.findall('.//Project'):
  proj_name = proj.attrib['Name']
  desc = proj.find('Description').text
  data.append({'EmployeName':name,'Address':addr,'ProjectName':proj_name,'Description':desc})
df = pd.DataFrame(data)
print(df)

output

  EmployeName   Address ProjectName               Description
0       James  Virginia    project1  Description of Project 1
1       James  Virginia    project2  Description of Project 2

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 balderman