Removal of repeated <span style> tags in rich text

I have a lot of text documents in different languages saved in MS Word that I need to import into a web site for display as rich text. Normally this is quite easy, but some of the documents have strange internal formatting like so:

<span style="font-family:SimSun"><span style="color:white">得</span></span>
<span style="font-family:SimSun"><span style="color:white">到</span></span>
<span style="font-family:SimSun"><span style="color:white">一本</span></span>
<span style="font-family:SimSun"><span style="color:white">慕迪</span></span>
<span style="font-family:SimSun"><span style="color:white">醫生所寫</span></span>

Every few characters are wrapped in their own spans, with styles identical to those of the neighboring spans. Ideally this would be collapsed into:

<span style="font-family:SimSun"><span style="color:white">得到一本慕迪醫生所寫</span></span>

This goes on for hundreds of pages and causes problems when the text is displayed on a webpage, such as slow load times and the database having to process much larger requests. Is there anything I can do in MS Word or some other program to consolidate the text by pruning the unnecessary tags?
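
To make the desired consolidation concrete, here is a minimal Python sketch of one way it might be done outside of Word, after the documents have been exported to HTML. It is only an illustration under some assumptions: the function name `merge_adjacent_spans` is made up for this example, each span chain is assumed to contain plain text only (no other tags inside), and whitespace between two merged spans is discarded, which suits the Chinese sample above but would need adjusting for languages that separate words with spaces.

```python
import re

# One "run": one or more opening <span style="..."> tags, plain text, closing tags.
RUN = re.compile(r'((?:<span style="[^"]*">)+)([^<]*)((?:</span>)+)')


def merge_adjacent_spans(html):
    """Merge neighboring span chains whose opening tags are identical."""
    out = []
    pending = None  # (opening tags, text) of the last balanced run, not yet written
    pos = 0

    def flush():
        # Write the pending run back out with one closing tag per opening tag.
        opening, text = pending
        out.append(opening + text + "</span>" * opening.count("<span"))

    for m in RUN.finditer(html):
        gap = html[pos:m.start()]  # whatever sits between the previous run and this one
        opening, text, closing = m.groups()
        pos = m.end()
        # Only self-contained chains (as many closes as opens) are safe to merge.
        balanced = opening.count("<span") == closing.count("</span>")
        if (pending is not None and balanced
                and not gap.strip() and pending[0] == opening):
            pending = (opening, pending[1] + text)  # same styles: join the text
            continue
        if pending is not None:
            flush()
            pending = None
        out.append(gap)
        if balanced:
            pending = (opening, text)
        else:
            out.append(m.group(0))  # leave unusual nesting untouched
    if pending is not None:
        flush()
    out.append(html[pos:])
    return "".join(out)


if __name__ == "__main__":
    sample = (
        '<span style="font-family:SimSun"><span style="color:white">得</span></span>\n'
        '<span style="font-family:SimSun"><span style="color:white">到</span></span>'
    )
    print(merge_adjacent_spans(sample))
```

Spans with different styles, or spans separated by anything other than whitespace, are left as they are, so the sketch should be conservative on input that does not match the pattern shown in the question.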



Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow
