Using lxml changes &#xA; to &#10; in a line for some reason
When I run the following code, one line in the file is changed for some reason:
from lxml import etree

dpa_tree = etree.parse(dpaFile)
dpa_root = dpa_tree.getroot()
dpa_tree.write(dpaFile, encoding='UTF-8', xml_declaration=True, method='xml', standalone=True)
In the original line, the &#xA; towards the end of the line is being changed to &#10; for some reason. How do I prevent this change from occurring?
The original line
<Setting Value="rO0ABXNyAGpjb20udmVjdG9yLmNmZy5nZW4uY29yZS5nZW5jb3JlLmludGVybmFsLmFvdi5BdXRv&#xA;bWF0a....
changes to
<Setting Value="rO0ABXNyAGpjb20udmVjdG9yLmNmZy5nZW4uY29yZS5nZW5jb3JlLmludGVybmFsLmFvdi5BdXRv&#10;bWF0a....
(the ... at the end of the lines is just to indicate I have not posted the entire line.)
Solution 1:[1]
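Both sequences are equivalent. They are both numeric character references for the Line Feed character (U+000A). In your original file, the hexadecimal representation (&#xA;) is used, while the lxml output uses the decimal representation (&#10;). Any XML parser decodes both to the same attribute value, so the data itself is unchanged.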
So while there seems to be a difference, both are actually representations of the same character (see Why HTML decimal and HTML hex? for some info on why there are different representations to begin with).
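As a quick check of that equivalence, here is a minimal sketch that parses the same attribute written both ways and compares the decoded values. The short `Value` strings are made up for illustration and are not taken from the original file.

```python
from lxml import etree

# Hypothetical short attribute values; only the character reference differs.
hex_form = etree.fromstring('<Setting Value="abc&#xA;def"/>')
dec_form = etree.fromstring('<Setting Value="abc&#10;def"/>')

hex_value = hex_form.get("Value")
dec_value = dec_form.get("Value")

# Both references decode to a single line feed character (U+000A).
print(repr(hex_value))
print(hex_value == dec_value)
```

If the comparison holds for your real file as well, any XML-aware consumer should see identical data before and after the rewrite.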
If you want to force the hexadecimal representation for some reason, you can use one of the options method='c14n' or method='c14n2' to serialize the element tree to canonical XML.
dpa_tree.write(dpaFile, method='c14n')
Please note: using the canonical methods is not compatible with adding the options to output an XML declaration (xml_declaration=True) or specifying an encoding (encoding='UTF-8').
However, as the W3C notes:
The XML declaration, including version number and character encoding is omitted from the canonical form. The encoding is not needed since the canonical form is encoded in UTF-8. The version is not needed since the absence of a version number unambiguously indicates XML 1.0.
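The difference between the two serializers can be seen without touching a file. The following is a small sketch, assuming a made-up one-element document, that serializes the same tree with the default method and with `method='c14n'` via `etree.tostring`:

```python
from lxml import etree

# Hypothetical stand-in for the real document.
root = etree.fromstring('<Setting Value="abc&#xA;def"/>')

# Default XML serialization: the line feed is written as a decimal reference.
default_bytes = etree.tostring(root)

# Canonical XML serialization: the line feed is written as a hex reference.
# No encoding or xml_declaration argument is passed here, since the
# canonical methods do not accept them.
canonical_bytes = etree.tostring(root, method="c14n")

print(default_bytes)
print(canonical_bytes)
```

Note that canonical XML also changes other details of the output, for example empty elements are written with explicit start and end tags, so it is worth diffing the result against the original file before relying on it.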
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 |
