Editing an XML file without losing whitespace in attribute values

I want to update one XML file with values from another XML file. The update itself works, but I have a problem with certain attribute values. After the XML file is parsed, line breaks inside attribute values are lost. For example, if

value='something

something'

it will change to value='something something' and my file can't be like that.

Here is a picture that roughly illustrates my concern:

[picture]

I want to keep these values spread over more than one line. As I understand it, parsing an XML file destroys the structure of the original file, but is there a simple way to fix my program so that it avoids ignoring the whitespace?

Here is my code:

import xml.etree.ElementTree as ET

Mainfile = 'Mainfile_1.xml'
tree = ET.parse(Mainfile)
root = tree.getroot()
fixfile = 'fixfile_1.xml'
tree2 = ET.parse(fixfile)
root2 = tree2.getroot()
# For every item marked FAIL in the main file, copy the value of the item
# with the same object id and item name from the fix file.
for objects in root.iter('object'):
    objid = objects.attrib.get('id')
    # Iterate the element directly; getchildren() was removed in Python 3.9.
    for attributes in objects:
        name = attributes.attrib.get('name')
        value = attributes.attrib.get('value')
        if value == 'FAIL':
            for objects2 in root2.iter('object'):
                objid2 = objects2.attrib.get('id')
                for attributes2 in objects2:
                    name2 = attributes2.attrib.get('name')
                    value2 = attributes2.attrib.get('value')
                    if objid2 == objid and name == name2:
                        attributes.set('value', value2)

tree.write('Mainfile_1updated.xml', xml_declaration=True, encoding='UTF-8')

Here is MainXML:

<?xml version='1.0' encoding='UTF-8'?>
<Module bs='Mainfile_1'>
<object name='namex' number='1' id='1000'>
    <item name='item0' value='100'/>
    <item name='item00' value='100'/>
</object>
<object name='namey' number='2' id='1001'>
    <item name='item1' value='100'/>
    <item name='item00' value='100'/>
</object>
<object name='name1' number='3' id='1234'>
    <item name='item1' value='FAIL'/>
    <item name='item2' value='233
    
    233'/>
    <item name='item3' value='233'/>
    <item name='item4' value='FAIL'/>
</object>
<object name='name2' number='4' id='1238'>
    <item name='item8' value='FAIL'/>
    <item name='item9' value='233'/>
</object>
<object name='name32' number='5' id='2345'>
    <item name='item1' value='111'/>
    <item name='item2' value='FAIL'/>
</object>
<object name='name4' number='6' id='2347'>
    <item name='item1' value='FAIL'/>
    <item name='item2' value='FAIL'/>
    <item name='item3' value='233'/>
    <item name='item4' value='FAIL'/>
</object>
</Module>

And here is the fix file:

<?xml version='1.0' encoding='UTF-8'?>
<Module bs='Mainfile_1'>
<object id='1234'>
    <item name='item1' value='something
something111'/>
    <item name='item4' value='something
1something'/>
</object>
<object id='1238'>
    <item name='item8' value='something12
1something'/>
</object>
<object id='2345'>
    <item name='item2' value='something
12something'/>
</object>
<object id='2347'>
    <item name='item1' value='something14
13of something'/>
    <item name='item2' value='something
11something'/>
    <item name='item4' value='something14
something14
something12
13something'/>
</object>
</Module>


Solution 1:[1]

"it will change to value='something something' and my file can't be like that."

Then you must stop using attributes like that. Line break characters inside attribute values are normalized into spaces when the XML file is parsed. You can open a text editor and produce XML like this:

<element value="something
something" />

but upon parsing, this will turn into the equivalent of

<element value="something something" />

That's just how it works.
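
The normalization is easy to observe. Here is a minimal sketch that uses the standard library's xml.etree.ElementTree for parsing only; the element and attribute names are just illustrative:

```python
import xml.etree.ElementTree as ET

# A literal (unescaped) line break inside the attribute value.
raw = '<element value="something\nsomething" />'

value = ET.fromstring(raw).attrib['value']

# The parser has replaced the line break with a single space.
print(repr(value))
# => 'something something'
```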

If you want to store things like tabs or newlines in attribute values, you must explicitly escape them. Then they will be retained when the document is parsed:

<element value="something&#xA;
something" />

<element value="something&#xA; something" />

Both of these will give an attribute value of "something\n something" in the resulting DOM.
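
If the files already contain literal line breaks in attribute values, one possible way to apply this is to escape them in the raw text before handing it to a parser. The following is only a rough sketch: the helper name escape_attr_newlines is made up for illustration, and the regular expression assumes that "=" followed by a quoted string only ever appears inside tags (not in text content, comments, or CDATA sections).

```python
import re

# Matches ="..." or ='...' (assumption: this pattern only occurs inside tags).
ATTR_VALUE = re.compile(r"""(=\s*)(["'])(.*?)\2""", re.DOTALL)


def escape_attr_newlines(xml_text):
    """Replace literal line breaks inside attribute values with &#10;."""
    def repl(match):
        prefix, quote, value = match.groups()
        # Treat Windows and old Mac line endings as a single line break.
        value = value.replace("\r\n", "\n").replace("\r", "\n")
        return prefix + quote + value.replace("\n", "&#10;") + quote
    return ATTR_VALUE.sub(repl, xml_text)


sample = "<item name='item1' value='something\nsomething111'/>"
escaped = escape_attr_newlines(sample)
print(escaped)
# => <item name='item1' value='something&#10;something111'/>
```

The escaped text can then be passed to the parser as a string instead of parsing the file directly. For input that may contain comments, CDATA, or quoted text in element content, a real tokenizer would be more robust than a regular expression.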


That being said, ElementTree's implementation is broken; there is literally nothing you can do about that.

Use lxml; its implementation is correct.

from lxml import etree as ET

value = ET.fromstring('<element value="something&#xA; something" />').attrib['value']
print(repr(value))  # repr() shows the line break as \n instead of printing it
# => 'something\n something'

value = ET.fromstring('<element value="something&#xA;\nsomething" />').attrib['value']
print(repr(value))
# => 'something\n something'

elem = ET.fromstring('<element />')
elem.attrib['value'] = 'something\n something'
xml = ET.tostring(elem)
print(xml)
# => b'<element value="something&#10; something"/>'

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1