'Creating multiple Excel rows from XML

import xml.etree.cElementTree as et
import pandas as pd

tree=et.parse('test.xml')
root=tree.getroot()

Title= []
Date= []

for title in root.iter('title'):
    Title.append(title.text)

for date in root.iter('date'):
    Date.append(date.text)

print(Title)
print(Date)

Jobs_df = pd.DataFrame(
                       list(zip(Title,Date)), 
                       columns=['Title', 'Date'])
                       
                                       
Jobs_df.to_csv("result.csv")

XML

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
    <Ver>55</Ver>
    <data>
        <title>abcd'</title>
        <date>2011-11-12</date>
        <ID>Asdf123</ID>
    </data>
    <data>
        <title>efgg'</title>
        <ID>Asdf123</ID>
    </data>

</Report>

So I already managed to parse the date and title, although the 'title' appears twice. When I print the Title array it prints me two values "abcd" and "efgg". But when I convert it to the csv file, it only shows one row with the first value "abcd" and the "efgg" is skipped. How do I create a second row with exact the same information, but with different Titles?

What I got now:

Title  Date
abcd   2011-11-12

What I want:

Title  Date
abcd   2011-11-12
efgg   2011-11-12

How do I achieve that? Whats worth mentioning is, the second value efgg has to reuse the already existing information (date), I need all the previous information in new row, but with only the Title value changed



Solution 1:[1]

Iterating over <title> and <date> tags separately will generate a mess. Instead, iterate over <data> and use tha last recorded value of <date> if it's not present in the current <data> tag.

This should do the trick:

Title = []
Date = []
last_date = None

for data in root.iter('data'):
  Title.append(data.find('title').text)
  date_el = data.find('date')
  if date_el != None:
    Date.append(date_el.text)
    last_date = date_el.text
  else:
    Date.append(last_date)
   
print(Title)
print(Date)

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Jacek Jaskólski