Sanitizing XML text in Python (&)
I am writing a Python script that imports work items from IBM RTC and exports them to Microsoft ADS. One issue I found is that some strings from the RTC XML data are imported with strange text characters such as &:
9.	Customize the rules for Feature work item
9. Customize the rules for Feature work item
1.	Send out the on-boarding form to Capsule Tech to understand the features/Tools/Customization used by them
1. Send out the on-boarding form to Capsule Tech to understand the features/Tools/Customization used by them
Speed Up RTC->ADS queries
Speed Up RTC->ADS queries
I've tried using the following code to sanitize and normalize the text:
from bs4 import BeautifulSoup
from html import unescape
# rtc_title holds the raw title string read from the RTC XML data
soup = BeautifulSoup(unescape(rtc_title), 'lxml')
ads_title = soup.text
But it is replacing the characters with tabs most of the time, which is incorrect:
1.\tSend out the on-boarding form to Capsule Tech to understand the features/Tools/Customization used by them
Is there a better way to parse and normalize these strings taken from the IBM RTC XML data? Thanks
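For reference, here is a minimal sketch of the kind of normalization I am after. It assumes the stray sequences are character references that decode to whitespace (which would explain the \t in the output above), and that collapsing every whitespace run to a single space is acceptable. The function name normalize_rtc_text and the sample strings are hypothetical, not taken from RTC itself.

```python
import re
from html import unescape


def normalize_rtc_text(raw: str) -> str:
    """Decode character references, then collapse whitespace runs to one space."""
    # unescape turns references such as &#9; or &gt; into the characters they name
    decoded = unescape(raw)
    # Replace any run of whitespace (tabs, newlines, spaces) with a single space
    return re.sub(r"\s+", " ", decoded).strip()


# Hypothetical sample input: a title containing a tab character reference
print(normalize_rtc_text("9.&#9;Customize the rules for Feature work item"))
```

This skips BeautifulSoup entirely, so it would only fit if the titles contain no real markup that needs stripping.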
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
