AttributeError: 'spacy.tokenizer.Tokenizer' object has no attribute 'tokens_from_list'
Sorry, I don't know how to fix this error:
'spacy.tokenizer.Tokenizer' object has no attribute 'tokens_from_list'
The code that produces the error is below.
import spacy
import re
from sklearn.feature_extraction.text import CountVectorizer

regexp = re.compile('(?u)\\b\\w\\w+\\b')
en_nlp = spacy.load("en_core_web_sm", disable=['parser', 'ner'])
old_tokenizer = en_nlp.tokenizer
# the call to tokens_from_list is what fails with the AttributeError
en_nlp.tokenizer = lambda string: old_tokenizer.tokens_from_list(
    regexp.findall(string))

def custom_tokenizer(document):
    doc_spacy = en_nlp(document)
    return [token.lemma_ for token in doc_spacy]

lemma_vect = CountVectorizer(tokenizer=custom_tokenizer, min_df=5)
X_train_lemma = lemma_vect.fit_transform(text_train)
print("X_train_lemma.shape: {}".format(X_train_lemma.shape))
vect = CountVectorizer(min_df=5).fit(text_train)
X_train = vect.transform(text_train)
print("X_train.shape: {}".format(X_train.shape))
Please help me; I have wasted a lot of time trying to solve this error.
Solution 1:[1]
Am I seeing it correctly that you are using spaCy to tokenize while also overwriting its tokenizer with a custom one, and then throwing away everything except the tokenization?
If that is the case, you shouldn't be using spaCy; just match the pattern yourself. I would take a closer look at the pattern, though, to check that it really does what you want.
import re
pattern = re.compile('(?u)\\b\\w\\w+\\b')
# print the substrings that match
print(pattern.findall("Sorry, I don't know how to fix this error."))
> ['Sorry', 'don', 'know', 'how', 'to', 'fix', 'this', 'error']
# print the substrings between matches
print(pattern.split("Sorry, I don't know how to fix this error."))
> ['', ', I ', "'t ", ' ', ' ', ' ', ' ', ' ', '.']
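If the pattern alone is enough, it could be wrapped in a small function and handed to the vectorizer. This is only a sketch: `regex_tokenizer` is a made-up name, and it assumes the vectorizer accepts any callable that takes a string and returns a list of tokens, which is how the `tokenizer` argument of `CountVectorizer` is used in the question.

```python
import re

# Same pattern as above: runs of two or more word characters.
pattern = re.compile(r"(?u)\b\w\w+\b")


def regex_tokenizer(document):
    """Split a string into tokens using the pattern only (no spaCy involved)."""
    return pattern.findall(document)


tokens = regex_tokenizer("Sorry, I don't know how to fix this error.")
print(tokens)
```

Note that this pattern looks identical to the default `token_pattern` of `CountVectorizer`, so a tokenizer like this would probably give the same vocabulary as the plain `CountVectorizer(min_df=5)` in the question.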
Generally, if all you need is tokenization, I would not recommend spaCy: it is quite slow and does a lot more than just tokenize.
One alternative to spaCy is NLTK.
import nltk
sentence = "Sorry, I don't know how to fix this error."
tokens = nltk.word_tokenize(sentence)
print(tokens)
> ['Sorry', ',', 'I', 'do', "n't", 'know', 'how', 'to', 'fix', 'this', 'error', '.']
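If it is the spaCy lemmas you are after, the original approach can probably be kept with a small change. As far as I know, `tokens_from_list` was deprecated in spaCy 2 and is no longer available in spaCy 3; building a `Doc` directly from a list of words is the usual replacement. The sketch below assumes spaCy 3 with the `en_core_web_sm` model installed, and the names `make_regex_tokenizer` and `build_lemma_tokenizer` are made up for illustration.

```python
import re

import spacy
from spacy.tokens import Doc

# Same pattern as in the question: runs of two or more word characters.
TOKEN_RE = re.compile(r"(?u)\b\w\w+\b")


def make_regex_tokenizer(vocab):
    """Return a callable that turns a string into a Doc using TOKEN_RE."""
    def tokenize(text):
        # Doc(vocab, words=...) builds a Doc from an already tokenized list,
        # which is the job tokens_from_list used to do.
        return Doc(vocab, words=TOKEN_RE.findall(text))
    return tokenize


def build_lemma_tokenizer(model_name="en_core_web_sm"):
    """Load a pipeline, swap in the regex tokenizer, return a lemmatizing function."""
    nlp = spacy.load(model_name, disable=["parser", "ner"])
    nlp.tokenizer = make_regex_tokenizer(nlp.vocab)

    def custom_tokenizer(document):
        # The remaining pipeline components run on the regex-based tokens.
        return [token.lemma_ for token in nlp(document)]

    return custom_tokenizer
```

The returned function could then be passed to the vectorizer in place of the original one, for example `CountVectorizer(tokenizer=build_lemma_tokenizer(), min_df=5)`.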
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | |
