How to remove duplicate substring from text without removing duplicate punctuation?

Here's an example string I have,

string = '111 East Sego Lily Drive Lily Drive, Suite 200 Sandy, UT 84070'

Here "Lily Drive" appears twice in a row, and I want to remove that duplication. The punctuation mark "," also appears twice, but I don't want to remove it.

import nltk
from collections import OrderedDict

string = nltk.word_tokenize(string)
string = OrderedDict.fromkeys(string)
string = " ".join(string)

This returns,

'111 East Sego Lily Drive, Suite 200 Sandy UT 84070'

What I am looking for,

'111 East Sego Lily Drive, Suite 200 Sandy, UT 84070'

python python-3.x

Solution 1:^[1]

Instead of the OrderedDict, you can track the tokens you have already seen in a set and make an exception for "," (or any other token you want to keep), so that repeated commas are not removed. Like this:

from nltk.tokenize import word_tokenize

string = '111 East Sego Lily Drive Lily Drive, Suite 200 Sandy, UT 84070'
tokens = word_tokenize(string)

seen = set()
res = []
for word in tokens:
    # Keep a token the first time it appears; always keep commas.
    if word not in seen or word == ',':
        seen.add(word)
        res.append(word)

# word_tokenize splits commas into separate tokens, so re-attach them.
out = ' '.join(res).replace(' ,', ',')
print(out)

111 East Sego Lily Drive, Suite 200 Sandy, UT 84070
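
Note that the loop above drops every later occurrence of a word, not only an immediately repeated phrase, so a number such as "200" appearing twice in an address would lose its second copy. If only back-to-back repeats should be collapsed, a regular expression with a backreference is one possible alternative. The following is a minimal sketch and not part of the original answer; the function name collapse_repeated_phrases is hypothetical, and it assumes the repeated phrase is separated by the same whitespace both times.

```python
import re

# Hypothetical helper, not from the original answer.
# Group 1 lazily matches one or more whole words; the rest of the pattern
# matches one or more immediate repetitions of that same phrase.
_REPEAT = re.compile(r'\b(\w+(?:\s+\w+)*?)(?:\s+\1\b)+')


def collapse_repeated_phrases(text):
    """Collapse phrases that are repeated back to back into one copy."""
    # Punctuation is never part of \w, so commas are left untouched.
    return _REPEAT.sub(r'\1', text)


string = '111 East Sego Lily Drive Lily Drive, Suite 200 Sandy, UT 84070'
print(collapse_repeated_phrases(string))
# 111 East Sego Lily Drive, Suite 200 Sandy, UT 84070
```

This version needs no tokenizer and leaves spacing around punctuation as it was, but it will not catch duplicates that are separated by other words.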

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution	Source
Solution 1
