Modify XML Custom Part Word Document Server Properties using XML ElementTree and/or XML Minidom

I am reformatting and restructuring our document management to use SharePoint. Our SOPs, Forms and Records were previously stored in SharePoint, were migrated to a major Document Management System (DMS), and now need to be migrated back into SharePoint. The other DMS used Document Variables to store key document information, whereas in SharePoint this information was previously stored in the custom XML Part "documentManagement" properties. I have already developed Python scripts to modify the core_properties, extended_properties and custom_properties that exist. However, my attempts with the docx, aspose and xml.dom.minidom libraries have yet to produce a script that can read or edit the XML Part "documentManagement" properties.

I have unzipped the Word document and located the XML Part "documentManagement" properties in the \customXml\item1.xml, \customXml\item2.xml, \customXml\item3.xml and sometimes \customXml\item4.xml files. The schema, elements, and restrictions for these properties are usually in \customXml\item1.xml, and the property values are usually stored in \customXml\item2.xml. I have included the item2.xml file here for reference.
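
For reference, locating the part that holds the values can also be done without unzipping to disk. The following is only a minimal sketch: the function name `find_properties_part` is hypothetical, and the check for the bytes `<documentManagement` is a heuristic that assumes the part is UTF-8 encoded.

```python
import re
import zipfile

# Matches customXml/item1.xml, customXml/item2.xml, ... but not itemProps1.xml.
ITEM_PART = re.compile(r"customXml/item\d+\.xml", re.IGNORECASE)


def find_properties_part(docx_path):
    """Return (part_name, raw_bytes) of the first custom XML item part that
    contains a <documentManagement> element, or None if there is none.

    Heuristic (assumption): the schema part may mention the name
    "documentManagement" in an attribute, so the search looks for the
    opening tag itself and assumes UTF-8 content.
    """
    with zipfile.ZipFile(docx_path) as package:
        for name in sorted(package.namelist()):
            if not ITEM_PART.fullmatch(name):
                continue
            data = package.read(name)
            if b"<documentManagement" in data:
                return name, data
    return None
```

Calling `find_properties_part("some.docx")` on a document like the one described would be expected to return the item part holding the property values, which is usually item2.xml in my files.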

Item2.xml

<p:properties xmlns:p="http://schemas.microsoft.com/office/2006/metadata/properties" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:pc="http://schemas.microsoft.com/office/infopath/2007/PartnerControls">
  <documentManagement>
    <qetp xmlns="71220325-c405-4751-a4f1-a91992783649">
      <UserInfo>
        <DisplayName/>
        <AccountId xsi:nil="true" />
        <AccountType/>
      </UserInfo>
    </qetp>
    <IconOverlay xmlns="http://schemas.microsoft.com/sharepoint/v4" xsi:nil="true" />
    <Revision xmlns="8db16272-67aa-4515-a58c-977707b42560">1</Revision>
    <Review_x0020_Date xmlns="8db16272-67aa-4515-a58c-977707b42560">2019-04-10T07:00:00+00:00</Review_x0020_Date>
    <Site xmlns="8db16272-67aa-4515-a58c-977707b42560">
      <Value>Franklin</Value>
    </Site>
    <Status xmlns="8db16272-67aa-4515-a58c-977707b42560">Approved</Status>
    <Effective_x0020_Date xmlns="8db16272-67aa-4515-a58c-977707b42560">2017-04-10T07:00:00+00:00</Effective_x0020_Date>
    <Document_x0020_Number xmlns="8db16272-67aa-4515-a58c-977707b42560">EQIP-0033-00</Document_x0020_Number>
    <Module xmlns="8db16272-67aa-4515-a58c-977707b42560">4</Module>
  </documentManagement>
</p:properties>

Libraries such as docx and aspose.word have not been able to access these custom XML Part properties, even though they were used to access/edit the core, extended and custom properties. I am new to the xml.etree.ElementTree library and running into many failures. I hope someone might give me a starting point and direction.

I would like to change the value of Revision from 1 to 5.
I would like to change the value of Site from Franklin to Liverpool.
I would like to change the value of Document_x0020_Number from EQIP-0033-00 to GOV-0112.
I would like to change the value of Status from Approved to Effective.

I made progress using etree.ElementTree, but it has caused a problem that I now need help with.

I used the following code to parse and edit the element text values in the tree. However, because the XML file uses namespaces, parsing produced tags of the form {url}name instead of name with an xmlns={url} attribute.
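
To illustrate the {url}name form: ElementTree folds each element's namespace into its tag and does not keep the original xmlns attribute or prefix. The sketch below uses a cut-down copy of item2.xml; the prefix `d` in the `NS` mapping and the helper `split_tag` are hypothetical names chosen for this example.

```python
import xml.etree.ElementTree as ET

# Cut-down copy of item2.xml, for illustration only.
DOC = (
    '<p:properties xmlns:p="http://schemas.microsoft.com/office/2006/metadata/properties">'
    '<documentManagement>'
    '<Revision xmlns="8db16272-67aa-4515-a58c-977707b42560">1</Revision>'
    '</documentManagement>'
    '</p:properties>'
)

# The prefix "d" is arbitrary; it only exists inside this lookup mapping.
NS = {"d": "8db16272-67aa-4515-a58c-977707b42560"}


def split_tag(tag):
    """Split an ElementTree tag such as '{uri}name' into (uri, name)."""
    if tag.startswith("{"):
        uri, _, local = tag[1:].partition("}")
        return uri, local
    return None, tag


root = ET.fromstring(DOC)
# <documentManagement> has no namespace, so it needs no prefix in the path.
revision = root.find("documentManagement/d:Revision", NS)
print(revision.tag)             # namespace URI in braces, then the local name
print(split_tag(revision.tag))  # (namespace URI, local name)
```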

Code:

import xml.etree.ElementTree as ET

# Raw string so the backslashes in the Windows path are not read as escapes.
path = r'D:\DTONAS01_DATAPART2_DriveE_Shares\Data\Quality\Veeva Export\XLSX_docProps\extracted\customXml\item2.xml'
tree = ET.parse(path)
root = tree.getroot()
print(root.tag, root.attrib, root.text)
for child in root:  # <documentManagement>
    print(child.tag, child.attrib, child.text)
    for grandchild in child:  # the individual property elements
        print(grandchild.tag, grandchild.attrib, grandchild.text)
        if grandchild.tag == '{8db16272-67aa-4515-a58c-977707b42560}Revision':
            grandchild.text = str(5)
            print(grandchild.tag, grandchild.attrib, grandchild.text)
        if grandchild.tag == '{8db16272-67aa-4515-a58c-977707b42560}Document_x0020_Number':
            grandchild.text = "GOV-0112"
            print(grandchild.tag, grandchild.attrib, grandchild.text)
        if grandchild.tag == '{8db16272-67aa-4515-a58c-977707b42560}Status':
            grandchild.text = "Effective"
            print(grandchild.tag, grandchild.attrib, grandchild.text)
        if grandchild.tag == '{8db16272-67aa-4515-a58c-977707b42560}Site':
            # The Site value is held in a nested <Value> element.
            for subelement in grandchild:
                print(subelement.tag, subelement.attrib, subelement.text)
                subelement.text = "Liverpool, England"
                print(grandchild.tag, grandchild.attrib, grandchild.text, subelement.text)
tree.write(path, encoding="utf-8")

Resulting item2.xml:

<ns0:properties xmlns:ns0="http://schemas.microsoft.com/office/2006/metadata/properties" xmlns:ns1="71220325-c405-4751-a4f1-a91992783649" xmlns:ns3="http://schemas.microsoft.com/sharepoint/v4" xmlns:ns4="8db16272-67aa-4515-a58c-977707b42560" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <documentManagement>
    <ns1:qetp>
      <ns1:UserInfo>
        <ns1:DisplayName/>
        <ns1:AccountId xsi:nil="true" />
        <ns1:AccountType/>
      </ns1:UserInfo>
    </ns1:qetp>
    <ns3:IconOverlay xsi:nil="true" />
    <ns4:Revision>5</ns4:Revision>
    <ns4:Review_x0020_Date>2019-04-10T07:00:00+00:00</ns4:Review_x0020_Date>
    <ns4:Site>
      <ns4:Value>Liverpool, England</ns4:Value>
    </ns4:Site>
    <ns4:Status>Effective</ns4:Status>
    <ns4:Effective_x0020_Date>2017-04-10T07:00:00+00:00</ns4:Effective_x0020_Date>
    <ns4:Document_x0020_Number>GOV-0112</ns4:Document_x0020_Number>
    <ns4:Module>4</ns4:Module>
  </documentManagement>
</ns0:properties>

As a result, the XML file I write back out looks very different from the original XML file. Unfortunately, Word does not like the new XML when the document is zipped back together. Word is able to open the document, but it no longer displays the properties under File\Info\Properties and shows an error there instead.
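
For completeness, the repackaging step can be sketched as below. This only illustrates zipping the parts back together with one part swapped out and is not a fix for the namespace problem; the function name `replace_part` is hypothetical.

```python
import zipfile


def replace_part(src_docx, dst_docx, part_name, new_bytes):
    """Copy the package src_docx to dst_docx, substituting new_bytes for the
    part called part_name. All other parts are copied unchanged, in their
    original order."""
    with zipfile.ZipFile(src_docx) as src, \
         zipfile.ZipFile(dst_docx, "w", zipfile.ZIP_DEFLATED) as dst:
        for info in src.infolist():
            if info.filename == part_name:
                data = new_bytes
            else:
                data = src.read(info.filename)
            # Reusing the original ZipInfo keeps each part's name and
            # compression type.
            dst.writestr(info, data)
```

Part names inside the package use forward slashes, for example `customXml/item2.xml`, regardless of how the extracted folder looks on Windows.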

What do I need to do so that the output XML file looks the same as the input XML file regarding namespace notation?
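
One direction that may be worth trying is xml.dom.minidom, which keeps the xmlns declarations as ordinary attributes on each element and writes them back as it found them. The following is a minimal sketch on a cut-down copy of item2.xml; the helper `set_property` is a hypothetical name, and I have not confirmed that Word accepts the result.

```python
from xml.dom import minidom

# Cut-down copy of item2.xml, for illustration only.
SAMPLE = (
    '<p:properties xmlns:p="http://schemas.microsoft.com/office/2006/metadata/properties" '
    'xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">'
    '<documentManagement>'
    '<Revision xmlns="8db16272-67aa-4515-a58c-977707b42560">1</Revision>'
    '<Site xmlns="8db16272-67aa-4515-a58c-977707b42560"><Value>Franklin</Value></Site>'
    '<Status xmlns="8db16272-67aa-4515-a58c-977707b42560">Approved</Status>'
    '</documentManagement>'
    '</p:properties>'
)


def set_property(document, name, value, child=None):
    """Set the text of the first element called `name`, or of its first
    descendant called `child` when one is given (as for Site/Value)."""
    element = document.getElementsByTagName(name)[0]
    if child is not None:
        element = element.getElementsByTagName(child)[0]
    # Remove whatever the element currently holds, then add one text node.
    while element.firstChild is not None:
        element.removeChild(element.firstChild)
    element.appendChild(document.createTextNode(value))


document = minidom.parseString(SAMPLE)
set_property(document, "Revision", "5")
set_property(document, "Site", "Liverpool", child="Value")
set_property(document, "Status", "Effective")

# toxml(encoding=...) returns bytes, including an XML declaration.
result = document.toxml(encoding="utf-8").decode("utf-8")
print(result)
```

For a real file, `minidom.parse(path)` would replace `parseString`, and the bytes from `toxml(encoding="utf-8")` could be written back with a file opened in `"wb"` mode.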



Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow
