'Creating multiple Excel rows from XML
import xml.etree.cElementTree as et
import pandas as pd
tree=et.parse('test.xml')
root=tree.getroot()
Title= []
Date= []
for title in root.iter('title'):
Title.append(title.text)
for date in root.iter('date'):
Date.append(date.text)
print(Title)
print(Date)
Jobs_df = pd.DataFrame(
list(zip(Title,Date)),
columns=['Title', 'Date'])
Jobs_df.to_csv("result.csv")
XML
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<Ver>55</Ver>
<data>
<title>abcd'</title>
<date>2011-11-12</date>
<ID>Asdf123</ID>
</data>
<data>
<title>efgg'</title>
<ID>Asdf123</ID>
</data>
</Report>
So I already managed to parse the date and title, although the 'title' appears twice. When I print the Title array it prints me two values "abcd" and "efgg". But when I convert it to the csv file, it only shows one row with the first value "abcd" and the "efgg" is skipped. How do I create a second row with exact the same information, but with different Titles?
What I got now:
Title Date
abcd 2011-11-12
What I want:
Title Date
abcd 2011-11-12
efgg 2011-11-12
How do I achieve that? Whats worth mentioning is, the second value efgg has to reuse the already existing information (date), I need all the previous information in new row, but with only the Title value changed
Solution 1:[1]
Iterating over <title> and <date> tags separately will generate a mess. Instead, iterate over <data> and use tha last recorded value of <date> if it's not present in the current <data> tag.
This should do the trick:
Title = []
Date = []
last_date = None
for data in root.iter('data'):
Title.append(data.find('title').text)
date_el = data.find('date')
if date_el != None:
Date.append(date_el.text)
last_date = date_el.text
else:
Date.append(last_date)
print(Title)
print(Date)
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Jacek Jaskólski |
