Any way to remove symbols from a lemmatized word set using Python?

I got lemmatized output from the code below, but the output words contain symbols such as : , ? ! ( )

output_H3 = [lemmatizer.lemmatize(w.lower(), pos=wordnet.VERB) for w in processed_H3_tag]

output :-

  • ['hide()', 'show()', 'methods:', 'jquery', 'slide', 'elements:', 'launchedw3schools', 'today!']

Expected output :-

  • ['hide', 'show', 'methods', 'jquery', 'slide', 'elements', 'launchedw3schools', 'today']


Solution 1:[1]

You could use str.translate() with string.punctuation, which contains !"#$%&'()*+,-./:;<=>?@[\]^_`{|}~

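import string
# Build a translation table that deletes every punctuation character
trans = str.maketrans('', '', string.punctuation)
output_wo_punc = [s.translate(trans) for s in output]  # output is the lemmatized list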

Which returns:

> ['hide', 'show', 'methods', 'jquery', 'slide', 'elements', 'launchedw3schools', 'today']
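As a self-contained sketch of the approach above, the snippet below runs the translation on the list from the question and compares it with str.strip(string.punctuation), which removes punctuation only at the ends of each token. The token "don't!" is a hypothetical example, not taken from the question; it is there to show where the two variants differ.

```python
import string

# Sample tokens copied from the question's output list.
output = ['hide()', 'show()', 'methods:', 'jquery', 'slide',
          'elements:', 'launchedw3schools', 'today!']

# Translation table that deletes every ASCII punctuation character.
trans = str.maketrans('', '', string.punctuation)
output_wo_punc = [s.translate(trans) for s in output]

# Alternative: strip punctuation only from the start and end of each token.
stripped = [s.strip(string.punctuation) for s in output]

# Hypothetical token with punctuation inside the word.
interior_translate = "don't!".translate(trans)        # apostrophe is deleted too
interior_strip = "don't!".strip(string.punctuation)   # apostrophe is kept

print(output_wo_punc)
print(interior_translate, interior_strip)
```

If tokens may legitimately contain apostrophes or hyphens, the strip() variant is probably the safer choice; for the list in the question both give the same result.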

Solution 2:[2]

Regular expressions can help:

import re 

output = [
    "hide()",
    "show()",
    "methods:",
    "jquery",
    "slide",
    "elements:",
    "launchedw3schools",
    "today!",
]


>>> import pprint
>>> expected = [re.sub(r'[:,?!()]', '', e) for e in output]
>>> pprint.pprint(expected)
['hide',
 'show',
 'methods',
 'jquery',
 'slide',
 'elements',
 'launchedw3schools',
 'today']

The character class [:,?!()] matches any one of the unwanted characters, and re.sub() replaces each match with an empty string, which removes it from the word.
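If the set of unwanted symbols is not known in advance, one possible variation is to remove everything that is not a word character or whitespace. The sketch below assumes that behaviour is acceptable; clean_tokens and PUNCT_RE are hypothetical names chosen for illustration, and tokens that become empty after cleaning are dropped. Note that \w also matches the underscore, so underscores are kept.

```python
import re

# Matches any character that is neither a word character (\w) nor whitespace (\s).
PUNCT_RE = re.compile(r'[^\w\s]')

def clean_tokens(tokens):
    """Remove non-word characters from each token and drop empty results."""
    cleaned = (PUNCT_RE.sub('', t) for t in tokens)
    return [t for t in cleaned if t]

# '?!' is a hypothetical token made up only of symbols; it is dropped entirely.
result = clean_tokens(['hide()', 'methods:', 'today!', '?!'])
print(result)
```

Compiling the pattern once is a small convenience when the same cleanup is applied to many lists.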

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 ewz93
Solution 2 Richard Dodson