Using lxml changes &#xA; to &#10; in a line for some reason

When I run the following code, one line in the file is changed for some reason:

from lxml import etree

dpa_tree = etree.parse(dpaFile)
dpa_root = dpa_tree.getroot()
dpa_tree.write(dpaFile, encoding='UTF-8', xml_declaration=True, method='xml', standalone=True)

In the original line, the &#xA; towards the end of the line is changed to &#10;. How do I prevent this change from occurring?

The original line

<Setting Value="rO0ABXNyAGpjb20udmVjdG9yLmNmZy5nZW4uY29yZS5nZW5jb3JlLmludGVybmFsLmFvdi5BdXRv&#xA;bWF0a....

changes to

<Setting Value="rO0ABXNyAGpjb20udmVjdG9yLmNmZy5nZW4uY29yZS5nZW5jb3JlLmludGVybmFsLmFvdi5BdXRv&#10;bWF0a....

(the ... at the end of the lines is just to indicate I have not posted the entire line.)



Solution 1:[1]

Both sequences are equivalent. They are both numeric character references for the Line Feed character (U+000A). Your original file uses the hexadecimal form (&#xA;), while the lxml output uses the decimal form (&#10;).

So while there seems to be a difference, both are actually representations of the same character (see Why HTML decimal and HTML hex? for some info on why there are different representations to begin with).
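To check the equivalence yourself, here is a minimal sketch. It uses the standard library parser rather than lxml so that it runs without extra packages, and the attribute values are shortened, made-up stand-ins for the real line.

```python
import xml.etree.ElementTree as ET

# Shortened, made-up attribute values; only the character reference differs.
hex_form = '<Setting Value="AAAA&#xA;BBBB"/>'
dec_form = '<Setting Value="AAAA&#10;BBBB"/>'

hex_value = ET.fromstring(hex_form).get("Value")
dec_value = ET.fromstring(dec_form).get("Value")

# Both references decode to the same line feed character (U+000A).
same_value = hex_value == dec_value
has_line_feed = "\n" in hex_value

print(same_value, has_line_feed)
```

Any application that reads the file through an XML parser should therefore see the same Value either way. The difference only matters to tools that compare the raw bytes of the file.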

If you want to force the hexadecimal representation for some reason, you can use one of the options method='c14n' or method='c14n2' to serialize the element tree to canonical XML.

dpa_tree.write(dpaFile, method='c14n')

Please note: the canonical methods cannot be combined with the options that add an XML declaration (xml_declaration=True) or specify an encoding (encoding='UTF-8').

However, as the W3C notes:

The XML declaration, including version number and character encoding is omitted from the canonical form. The encoding is not needed since the canonical form is encoded in UTF-8. The version is not needed since the absence of a version number unambiguously indicates XML 1.0.
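The two serializations can be compared side by side with a sketch like the one below. It assumes lxml is installed, writes to in-memory buffers instead of dpaFile, and uses a shortened, made-up document in place of the real file.

```python
from io import BytesIO

from lxml import etree

# Shortened, made-up document standing in for the real file.
source = b'<Settings><Setting Value="AAAA&#xA;BBBB"/></Settings>'
tree = etree.parse(BytesIO(source))

# Default serialization: the line feed is written as a decimal reference.
default_out = BytesIO()
tree.write(default_out, encoding="UTF-8", xml_declaration=True)

# Canonical serialization: no encoding or xml_declaration arguments are passed.
canonical_out = BytesIO()
tree.write(canonical_out, method="c14n")

print(default_out.getvalue())
print(canonical_out.getvalue())
```

Keep in mind that canonical XML changes more than this one reference. For example, empty elements are written as start and end tag pairs and attributes are sorted, so other lines in the file may differ from the original as well.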

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1