Leave out some words from lemmatisation in Python

While doing lemmatisation, I've noticed that it does not work properly on some words. I would therefore like to exclude those words from lemmatisation and keep them in their original form.

I've created a list, nolemmawords, containing these words, but I'm not sure how to make the lemmatiser skip them.

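from pymystem3 import Mystem

# words_regex, stopwords_list and nolemmawords are defined elsewhere (not shown).

def find_words(text, regex=words_regex):
    # Lowercase the text and keep alphabetic tokens longer than two characters.
    tokens = regex.findall(text.lower())
    return [w for w in tokens if w.isalpha() and len(w) > 2]

mystem = Mystem()

def lemmatize(words, lemmer=mystem, stopwords=stopwords_list):
    # Lemmatise the joined string, then drop stopwords and non-alphabetic tokens.
    lemmas = lemmer.lemmatize(' '.join(words))
    return [w for w in lemmas if w not in stopwords and w.isalpha()]

def preprocess(text):
    return lemmatize(find_words(text))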

I tried the following, but it removes those words from the output entirely, which is not what I want. Could you please help?

def find_words(text, regex=words_regex):
    tokens = regex.findall(text.lower())
    # This filter drops the words in nolemmawords instead of keeping them unlemmatised.
    return [w for w in tokens if w.isalpha() and w not in nolemmawords and len(w) > 2]
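
One possible approach, offered as a minimal sketch rather than a definitive fix, is to leave find_words unchanged and make the decision inside the lemmatisation step: lemmatise each word on its own and pass the words in nolemmawords through untouched. The function name lemmatize_except and the toy lemmatiser below are hypothetical stand-ins so that the example runs without Mystem installed. The sample word lists are illustrative only.

```python
from typing import Callable, Iterable, List, Set


def lemmatize_except(
    words: Iterable[str],
    lemmatize_word: Callable[[str], str],
    keep_as_is: Set[str],
    stopwords: Set[str],
) -> List[str]:
    """Lemmatise words one by one, leaving protected words in their original form."""
    result = []
    for w in words:
        # Protected words bypass the lemmatiser entirely.
        lemma = w if w in keep_as_is else lemmatize_word(w)
        # Same filtering as the original lemmatize(): alphabetic, not a stopword.
        if lemma.isalpha() and lemma not in stopwords:
            result.append(lemma)
    return result


# Hypothetical toy lemmatiser, used here only so the sketch is runnable.
toy_lemmas = {"cats": "cat", "running": "run", "data": "datum"}


def toy_lemmatize(word: str) -> str:
    return toy_lemmas.get(word, word)


nolemmawords = {"data"}    # sample protected words (assumption)
stopwords_list = {"the"}   # sample stopwords (assumption)

print(lemmatize_except(["the", "cats", "running", "data"],
                       toy_lemmatize, nolemmawords, stopwords_list))
# ['cat', 'run', 'data']
```

With Mystem, lemmatize_word could probably be something like lambda w: mystem.lemmatize(w)[0], assuming that lemmatize returns the lemma as the first element of its result list for a single-word input. Calling the lemmatiser once per word may be slower than one call on the joined text, so this trades some speed for a simple one-to-one mapping between input words and lemmas.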


Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow
