How to handle unicode directly with pandas.read_xml?
I have an .xml file from an online source and want to read it directly into Python. I use the pandas command
pd.read_xml(url)
However, I get the error:
File "<string>", line 3300
lxml.etree.XMLSyntaxError: PCDATA invalid Char value 26, line 3300, column 15
Inspecting the dataset, I see that this line contains a special character (PyCharm shows a [SUB] between the spaces after XETRA):
<column>XETRA Regulierter Markt</column>
Can I handle these special characters directly in pandas? Or do I need to download the dataset first and clean it up? How could I remove such characters from the XML?
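Char value 26 is 0x1A (SUB), a control character that XML 1.0 does not allow, so lxml rejects the document before pandas ever sees it. One possible workaround, sketched here rather than offered as a definitive answer, is to download the raw bytes yourself, strip the forbidden control characters, and pass the cleaned bytes to `pd.read_xml` as a file-like object. The helper names `clean_xml_bytes` and `read_xml_clean` are hypothetical, and the byte-level regex assumes the feed uses UTF-8 or another ASCII-compatible encoding (it would not be safe for UTF-16).

```python
import io
import re
import urllib.request

import pandas as pd

# Control characters forbidden by XML 1.0: everything below 0x20 except
# tab (0x09), line feed (0x0A) and carriage return (0x0D).
# The SUB character from the error message (value 26 = 0x1A) falls in this range.
_INVALID_XML_CTRL = re.compile(rb"[\x00-\x08\x0b\x0c\x0e-\x1f]")


def clean_xml_bytes(raw: bytes, replacement: bytes = b"") -> bytes:
    """Hypothetical helper: strip forbidden control characters from XML bytes.

    Assumes an ASCII-compatible encoding such as UTF-8, where these byte
    values can only stand for the control characters themselves.
    """
    return _INVALID_XML_CTRL.sub(replacement, raw)


def read_xml_clean(url: str, **kwargs) -> pd.DataFrame:
    """Hypothetical helper: download the XML, clean it in memory, parse it."""
    with urllib.request.urlopen(url) as resp:
        raw = resp.read()
    # pd.read_xml accepts file-like objects, so no temporary file is needed
    return pd.read_xml(io.BytesIO(clean_xml_bytes(raw)), **kwargs)
```

With this in place, `df = read_xml_clean(url)` would replace the original `pd.read_xml(url)` call, and any extra keyword arguments (such as `xpath`) are forwarded to pandas unchanged. Passing `replacement=b" "` keeps a space where the control character was, in case the position matters.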
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
