'Sanitizing XML text in Python (&amp)

I am writing a Python script that imports work items from IBM RTC and exports them to Microsoft ADS. One issue I found is that some strings from RTC xml data are imported with strange text characters such as &amp:

9.&amp;#09;Customize the rules for Feature work item
9. Customize the rules for Feature work item

1.&#09;Send out the on-boarding form to Capsule Tech to understand the features/Tools/Customization used by them
1. Send out the on-boarding form to Capsule Tech to understand the features/Tools/Customization used by them

Speed Up RTC-&gt;ADS queries
Speed Up RTC->ADS queries

I've tried using the following code to sanitize and normalize the text:

from bs4 import BeautifulSoup
from html import unescape


    soup = BeautifulSoup(unescape(rtc_title), 'lxml')
    ads_title=soup.text

But it is replacing the characters with tabs most of the time, which is incorrect:

1.\tSend out the on-boarding form to Capsule Tech to understand the features/Tools/Customization used by them

is there a better way to parse and normalize these strings taken from IBM RTC xml data? Thanks

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution	Source

'Sanitizing XML text in Python (&amp)

Sources

Related Questions