_VALUE column when reading XML
Given this rather funky XML structure:
<Report>
<Table>
<List>
<DTL a="abc"
b="xyz"
.../>
<DTL a="bcd"
b="foo"
...
If I read that into a data frame, I end up with this schema:
root
|-- Table: struct (nullable = true)
| |-- List: struct (nullable = true)
| | |-- DTL: array (nullable = true)
| | | |-- element: struct (containsNull = true)
| | | | |-- _a: string (nullable = true)
| | | | |-- _b: string (nullable = true)
| | | | |-- _VALUE: string (nullable = true)
I can't quite work out where this _VALUE column is coming from. I understand that a, b, etc. are attributes. The documentation says: valueTag: The tag used for the value when there are attributes in the element having no child. Default is _VALUE.
What does an attribute having no child actually mean here? Other than excluding it from a downstream dataframe, is there any way to avoid this column?
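To make the documentation sentence concrete, here is a minimal sketch in plain Python (standard library only, not the Spark XML reader itself). It assumes the sentence means: an element that has attributes and no child elements keeps its text content under the value tag. The sample XML, the `element_to_row` helper and its parameter names are hypothetical and only illustrate that reading.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: the first DTL is self-closing, the second carries text.
SAMPLE = """<Report><Table><List>
<DTL a="abc" b="xyz"/>
<DTL a="bcd" b="foo">some text</DTL>
</List></Table></Report>"""


def element_to_row(elem, attribute_prefix="_", value_tag="_VALUE"):
    """Flatten a childless element: prefixed attributes plus optional text."""
    row = {attribute_prefix + name: value for name, value in elem.attrib.items()}
    text = (elem.text or "").strip()
    if text:
        # Text inside an attribute-only element needs a column of its own.
        row[value_tag] = text
    return row


rows = [element_to_row(dtl) for dtl in ET.fromstring(SAMPLE).iter("DTL")]
for row in rows:
    print(row)
```

Under that assumption, only the second element produces a `_VALUE` entry. If schema inference works across all rows, a single DTL element with text content (or one that is not self-closing) would be enough for the column to show up in the struct, with nulls everywhere else.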
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
