Convert XML File into Pandas DataFrame
I want to convert an XML file into a pandas DataFrame.
XML file:
```xml
<image id="0" name="img1.jpg" width="1024" height="488">
  <polyline label="Heizung" points="1.0,1.0;1.0,2.0;1.5,1.5">
  </polyline>
  <polyline label="Fenster" points="2.0,1.0;3.0,1.0;0.5,1.5">
  </polyline>
</image>
<image id="1" name="img2.jpg" width="355" height="355">
  <polyline label="Heizung" points="3.0,2.0;1.3,1.0;2.5,1.5">
  </polyline>
  <polyline label="Fenster" points="1.5,2.0;3.0,1.0;0.5,1.5">
  </polyline>
</image>
<image id="2" name="img3.jpg" width="620" height="502">
  <polyline label="Heizung" points="3.0,1.0;2.0,0.5;1.0,1.0">
  </polyline>
</image>
```
I want my DataFrame to look like this, with one row per polyline and the attributes of the parent image repeated on each row:
| id | name | width | height | label | points |
| -- | -------- | ----- | ------ | ------- | ------ |
| 0 | img1.jpg | 1024 | 488 | Heizung | [...] |
| 0 | img1.jpg | 1024 | 488 | Fenster | [...] |
| 1 | img2.jpg | 355 | 355 | Heizung | [...] |
| 1 | img2.jpg | 355 | 355 | Fenster | [...] |
| 2 | img3.jpg | 620 | 502 | Heizung | [...] |
When I use the following code, I only get the columns id, name, width, height, and polyline, and the polyline values are all NaN. df1 is a (3, 5) DataFrame.
df1 = pd.read_xml("input.xml", xpath=".//image")
When I use the following code, I get the label and points but not the image id, so I can't join the two DataFrames. df2 is a (5, 2) DataFrame.
df2 = pd.read_xml("input.xml", xpath=".//polyline")
How can I merge or join both DataFrames to get the desired format?
Thank you very much in advance
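One possible approach, sketched here rather than taken from the original post, is to skip the join entirely and walk the tree with `xml.etree.ElementTree`, combining each polyline's attributes with those of its parent image. The sketch assumes the `<image>` elements sit inside a single root element (called `<annotations>` here purely for illustration, since the excerpt above shows no root) and it parses `points` into a list of `(x, y)` float tuples, which is only a guess at what `[...]` should contain.

```python
import xml.etree.ElementTree as ET

import pandas as pd

# Assumption: the real file has one root element around the <image> nodes.
# The name "annotations" is hypothetical; any root name works with iter().
xml_text = """<annotations>
  <image id="0" name="img1.jpg" width="1024" height="488">
    <polyline label="Heizung" points="1.0,1.0;1.0,2.0;1.5,1.5"></polyline>
    <polyline label="Fenster" points="2.0,1.0;3.0,1.0;0.5,1.5"></polyline>
  </image>
  <image id="1" name="img2.jpg" width="355" height="355">
    <polyline label="Heizung" points="3.0,2.0;1.3,1.0;2.5,1.5"></polyline>
    <polyline label="Fenster" points="1.5,2.0;3.0,1.0;0.5,1.5"></polyline>
  </image>
  <image id="2" name="img3.jpg" width="620" height="502">
    <polyline label="Heizung" points="3.0,1.0;2.0,0.5;1.0,1.0"></polyline>
  </image>
</annotations>"""

root = ET.fromstring(xml_text)  # for a file: ET.parse("input.xml").getroot()

rows = []
for image in root.iter("image"):
    # Only direct <polyline> children of this image are considered.
    for polyline in image.findall("polyline"):
        # Image attributes first, then polyline attributes, one row per polyline.
        rows.append({**image.attrib, **polyline.attrib})

df = pd.DataFrame(rows)

# XML attributes are strings, so convert the numeric columns explicitly.
df[["id", "width", "height"]] = df[["id", "width", "height"]].astype(int)


def parse_points(text):
    """Turn "x1,y1;x2,y2" into [(x1, y1), (x2, y2)] with float coordinates."""
    return [tuple(float(v) for v in pair.split(",")) for pair in text.split(";")]


df["points"] = df["points"].apply(parse_points)
print(df)
```

Note that an image without any polyline children produces no row in this sketch; if such images should be kept, a row with empty label and points would have to be appended for them explicitly.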
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
