Open multiple XML files and parse them
I need your help. I'm trying to read many XML files from a single folder and extract some information from each one. All of the files have the same structure.
At this point I can read each XML file, but I only capture the information from the last one opened. How can I capture the information from every XML file and save it into a pandas DataFrame?
This is my code:
from os import listdir, path
import xml.etree.ElementTree as ET

mypath = '/Users/nicolasdiaz/Desktop/dtes copy'
files = [path.join(mypath, f) for f in listdir(mypath) if f.endswith('.xml')]

for file in files:
    print(file)
    tree = ET.parse(file)
    root = tree.getroot()

for docID in root.iter('Folio'):
    Invoice = 'Factura:' + docID.text
    print(Invoice)
for client_rut in root.iter('RUTRecep'):
    Rut = 'Rut:' + client_rut.text
    print(Rut)
This is my result, but I need the information from all three XML files:
/Users/nicolasdiaz/venv/bin/python "/Users/nicolasdiaz/PycharmProjects/Marfil/lib/python3.10/Open files.py"
/Users/nicolasdiaz/Desktop/dtes copy/77116757-T33-F1877.xml
/Users/nicolasdiaz/Desktop/dtes copy/77116757-T33-F1960.xml
/Users/nicolasdiaz/Desktop/dtes copy/77116757-T33-F1961.xml
Factura:1961
Rut:93770000-8
Process finished with exit code 0
Solution 1:[1]
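After the first loop finishes, root refers only to the last file parsed, so the two loops that follow see just that file. Move the two bottom for loops inside the loop over files, like this: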
from os import listdir, path
import xml.etree.ElementTree as ET

mypath = '/Users/nicolasdiaz/Desktop/dtes copy'
files = [path.join(mypath, f) for f in listdir(mypath) if f.endswith('.xml')]

for file in files:
    print(file)
    tree = ET.parse(file)
    root = tree.getroot()
    # These loops now run once per file instead of once after the last file
    for docID in root.iter('Folio'):
        Invoice = 'Factura:' + docID.text
        print(Invoice)
    for client_rut in root.iter('RUTRecep'):
        Rut = 'Rut:' + client_rut.text
        print(Rut)
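To get the results into a DataFrame, create an empty list before the for statement, append one row per file inside the loop, and build the DataFrame once after the loop. (DataFrame.append returned a new DataFrame rather than modifying df in place, and it was removed in pandas 2.0, so calling df.append(...) in the loop will not work.) With rows = [] defined before the loop and import pandas as pd at the top, add this inside the loop after Invoice and Rut are set:
rows.append([file, Invoice, Rut])
Then, after the loop:
df = pd.DataFrame(rows, columns=['file', 'invoice', 'rut'])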
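Putting it together, here is a minimal sketch of the whole flow. It assumes each file contains at most one Folio and one RUTRecep element (only the first match is kept). The function names extract_rows and to_dataframe and the column names are made up for illustration, not part of the original code.

```python
from pathlib import Path
import xml.etree.ElementTree as ET


def extract_rows(folder):
    """Return one dict per .xml file in `folder` (hypothetical helper)."""
    rows = []
    # sorted() makes the row order predictable across platforms
    for xml_path in sorted(Path(folder).glob("*.xml")):
        root = ET.parse(xml_path).getroot()
        rows.append({
            "file": xml_path.name,
            # findtext returns the text of the first matching descendant,
            # or None when the element is missing
            "folio": root.findtext(".//Folio"),
            "rut_recep": root.findtext(".//RUTRecep"),
        })
    return rows


def to_dataframe(rows):
    """Build the DataFrame once, after all files have been read."""
    # Imported here so the parsing step also works without pandas installed
    import pandas as pd
    return pd.DataFrame(rows, columns=["file", "folio", "rut_recep"])
```

You would call it as df = to_dataframe(extract_rows(mypath)). Building the frame from a list of rows in one step also avoids growing a DataFrame row by row, which copies the data on every iteration.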
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | ino |