Open multiple XML files and parse them

I need your help. I'm trying to read many XML files from a single folder and extract some information from each one. All of the XML files have the same structure.

At this point I can read each XML file, but I only capture the information from the last one opened. How can I capture the information from each XML file and save it into a pandas DataFrame?

This is my code:

from os import listdir, path
import xml.etree.ElementTree as ET

mypath = '/Users/nicolasdiaz/Desktop/dtes copy'
files = [path.join(mypath, f) for f in listdir(mypath) if f.endswith('.xml')]

for file in files:
    print(file)
    tree = ET.parse(file)
    root = tree.getroot()

for docID in root.iter('Folio'):
    Invoice = 'Factura:' + docID.text
    print(Invoice)
for client_rut in root.iter('RUTRecep'):
    Rut = 'Rut:' + client_rut.text
    print(Rut)

This is my output, but I need the information from all three XML files:

/Users/nicolasdiaz/venv/bin/python "/Users/nicolasdiaz/PycharmProjects/Marfil/lib/python3.10/Open files.py"
/Users/nicolasdiaz/Desktop/dtes copy/77116757-T33-F1877.xml
/Users/nicolasdiaz/Desktop/dtes copy/77116757-T33-F1960.xml
/Users/nicolasdiaz/Desktop/dtes copy/77116757-T33-F1961.xml
Factura:1961
Rut:93770000-8

Process finished with exit code 0


Solution 1:[1]

1. Move the two bottom for loops inside the first one, so that they run once for every file instead of once after the last file:

    from os import listdir, path
    import xml.etree.ElementTree as ET

    mypath = '/Users/nicolasdiaz/Desktop/dtes copy'
    files = [path.join(mypath, f) for f in listdir(mypath) if f.endswith('.xml')]

    for file in files:
        print(file)
        tree = ET.parse(file)
        root = tree.getroot()

        # These loops are now inside the file loop, so they see every root
        for docID in root.iter('Folio'):
            Invoice = 'Factura:' + docID.text
            print(Invoice)
        for client_rut in root.iter('RUTRecep'):
            Rut = 'Rut:' + client_rut.text
            print(Rut)
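2. Create an empty list (rows = []) before the for statement. Append each file's values to it inside the loop, then build the DataFrame once after the loop. Calling df.append(...) inside the loop does not work as written: DataFrame.append returns a new frame instead of modifying df in place, and it was removed in pandas 2.0.

        rows.append([file, Invoice, Rut])

    After the loop (with import pandas as pd at the top):

        df = pd.DataFrame(rows, columns=['file', 'Invoice', 'Rut'])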
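Putting both steps together, a minimal sketch might look like the following. It assumes every file contains at most one Folio and one RUTRecep element and that they are not in an XML namespace (the question's output suggests this, but it is an assumption). The function name read_invoices and the column names are hypothetical. The sketch collects one dict per file and builds the DataFrame once at the end rather than appending to a DataFrame inside the loop.

```python
from os import listdir, path
import xml.etree.ElementTree as ET

import pandas as pd


def read_invoices(folder):
    """Parse every .xml file in `folder` into one DataFrame row per file.

    Hypothetical helper; assumes each file has at most one <Folio> and
    one <RUTRecep> element and that the tags carry no XML namespace.
    """
    rows = []
    for name in sorted(listdir(folder)):
        if not name.endswith('.xml'):
            continue  # skip anything that is not an XML file
        root = ET.parse(path.join(folder, name)).getroot()
        # './/Tag' searches all descendants of root; findtext returns
        # the text of the first match, or None if the tag is missing.
        rows.append({
            'file': name,
            'Folio': root.findtext('.//Folio'),
            'RUTRecep': root.findtext('.//RUTRecep'),
        })
    # Build the DataFrame once, after every file has been read.
    return pd.DataFrame(rows, columns=['file', 'Folio', 'RUTRecep'])
```

Calling read_invoices('/Users/nicolasdiaz/Desktop/dtes copy') should return one row per XML file in that folder. Because findtext returns None for a missing tag, a file without a Folio appears as None in its row. The loop version above would instead reuse the previous file's Invoice value.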

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 ino