Best stemming algorithm in NLTK, Python
I am using PorterStemmer to stem the word tokens I get after tokenizing my data, but the results are not what I expect. Which stemming algorithm would be the best one to use?
Code-
from nltk.stem import PorterStemmer
porter = PorterStemmer()
# print the result so the output also shows when run as a script
print(porter.stem("mobile"))
Code Output-
mobil
Expected Output-
mobile
Solution 1:[1]
You are probably looking for lemmatization rather than stemming. See https://www.guru99.com/stemming-lemmatization-python-nltk.html.
Stemming reduces a word to its root/stem by stripping affixes, so the result does not have to be a dictionary word. Lemmatization reduces a word to its uninflected dictionary form, the lemma (e.g. the infinitive for verbs).
The stem of "mobile" is "mobil" because related words such as "mobility" share it. In this case the unchanged stem does not include the final "e".
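A small sketch of that point, reusing the PorterStemmer from the question. The word list is just an illustration; the idea is to check that related words collapse to the same stem.

```python
from nltk.stem import PorterStemmer

porter = PorterStemmer()

# "mobile" and "mobility" are related words; a stemmer aims to map them to one stem
words = ["mobile", "mobility"]
stems = {word: porter.stem(word) for word in words}

for word, stem in stems.items():
    print(f"{word} -> {stem}")
```

Both words should come out as "mobil", which is why the stemmer output is correct for its purpose even though it is not a dictionary word.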
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | ewz93 |
