Any way to remove symbols from a lemmatized word set using Python
I got the lemmatized output below from the following code, but some of the output words contain symbols such as ":", ",", "?", "!", "(" and ")".
output_H3 = [lemmatizer.lemmatize(w.lower(), pos=wordnet.VERB) for w in processed_H3_tag]
output :-
- ['hide()', 'show()', 'methods:', 'jquery', 'slide', 'elements:', 'launchedw3schools', 'today!']
Expected output :-
- ['hide', 'show', 'methods', 'jquery', 'slide', 'elements', 'launchedw3schools', 'today']
Solution 1:[1]
You could also use translate() and string.punctuation (!"#$%&'()*+,-./:;<=>?@[\]^_`{|}~):
import string

# Build a translation table that deletes every punctuation character
trans = str.maketrans('', '', string.punctuation)
output_wo_punc = [s.translate(trans) for s in output]
Which returns:
> ['hide', 'show', 'methods', 'jquery', 'slide', 'elements', 'launchedw3schools', 'today']
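For reference, here is a minimal self-contained sketch of this approach, assuming `output` is the list from the question. The helper name `strip_punctuation` is made up for illustration and is not part of any library. Keep in mind that translate() removes punctuation anywhere in a token, not only at the ends, so a word like "don't" would become "dont".

```python
import string

# Translation table that deletes every character in string.punctuation.
_PUNCT_TABLE = str.maketrans('', '', string.punctuation)


def strip_punctuation(words):
    """Return a new list with ASCII punctuation removed from each word.

    Hypothetical helper for illustration only.
    """
    return [w.translate(_PUNCT_TABLE) for w in words]


# Sample data copied from the question.
output = ['hide()', 'show()', 'methods:', 'jquery', 'slide',
          'elements:', 'launchedw3schools', 'today!']
cleaned = strip_punctuation(output)
print(cleaned)
```

If only leading and trailing symbols should go, `w.strip(string.punctuation)` may be a better fit, since it leaves punctuation inside a word untouched.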
Solution 2:[2]
Regular Expressions can help:
>>> import re
>>> import pprint
>>> output = [
...     "hide()",
...     "show()",
...     "methods:",
...     "jquery",
...     "slide",
...     "elements:",
...     "launchedw3schools",
...     "today!",
... ]
>>> expected = [re.sub(r'[:,?!()]', '', e) for e in output]
>>> pprint.pprint(expected)
['hide',
 'show',
 'methods',
 'jquery',
 'slide',
 'elements',
 'launchedw3schools',
 'today']
This replaces every occurrence of the characters listed inside the square brackets with an empty string.
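As a rough sketch of how the same substitution could be packaged for reuse, the pattern can be compiled once and applied to any list of words. The helper name `remove_symbols` is hypothetical, and dropping tokens that become empty (for example a token that was only "()") is an assumption about what you want, not something the question asks for.

```python
import re

# Same character class as in the answer, compiled once for reuse.
SYMBOLS = re.compile(r'[:,?!()]')


def remove_symbols(words):
    """Strip the listed symbols from each word and drop words left empty.

    Hypothetical helper for illustration only.
    """
    stripped = (SYMBOLS.sub('', w) for w in words)
    return [w for w in stripped if w]


result = remove_symbols(['hide()', 'today!', '()', 'jquery'])
print(result)  # ['hide', 'today', 'jquery']
```

To cover more symbols, extend the character class; characters such as "]", "\" and "^" need escaping inside it.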
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | ewz93 |
| Solution 2 | Richard Dodson |
