AttributeError: 'list' object has no attribute 'lower' in TF-IDF modeling
Can anyone please help me move forward with my modeling? I have no idea where I called that .lower attribute or how to fix it. I appreciate any help.
HERE IS THE ONLY PART WHERE I APPLIED .LOWER

wordnet_lemmatizer = WordNetLemmatizer()

def create_tokens(df2):
    df2['low'] = df2['Movie'].str.lower()
    df2['stopwords_out'] = df2['low'].apply(lambda x: " ".join([word for word in x.split() if word not in stops]))
    df2['tokenized'] = df2.apply(lambda row: nltk.word_tokenize(row['stopwords_out']), axis=1)
    df2['eng_only'] = df2['tokenized'].apply(lambda x: [word for word in x if word.isalpha()])
    df2['lemmatized'] = df2['eng_only'].apply(lambda x: [wordnet_lemmatizer.lemmatize(word) for word in x])
HERE IS WHEN I HAVE CHANGED MY LEMMATIZED COLUMN TO LIST
a = df2.lemmatized.to_list()
b = (list(itertools.chain.from_iterable(a)))
bow = Counter(b)
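The flattening step above can be sketched with toy data (the token lists here are made up, not the asker's movie data):

```python
import itertools
from collections import Counter

# Toy stand-in for df2.lemmatized.to_list(): one token list per row
a = [["movie", "good"], ["movie", "bad"]]

# Flatten the per-row token lists into one corpus-wide token list
b = list(itertools.chain.from_iterable(a))

# Bag-of-words counts across the whole corpus
bow = Counter(b)
print(bow)  # counts each token, e.g. "movie" twice
```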
HERE IS WHEN I TRY TO CREATE TF IDF AND WHERE THE ERROR APPEARS
cv = CountVectorizer(min_df=0, max_df=1)
tf = cv.fit_transform(df2.lemmatized)
THE ERROR

AttributeError                            Traceback (most recent call last)
C:\AppData\Local\Temp/ipykernel_24552/1530549768.py in <module>
      2
      3 cv = CountVectorizer(min_df=0, max_df=1)
----> 4 tf = cv.fit_transform(df2.lemmatized)
      5
      6 print(df2.lemmatized)

~\Anaconda3\lib\site-packages\sklearn\feature_extraction\text.py in fit_transform(self, raw_documents, y)
   1200             max_features = self.max_features
   1201
-> 1202             vocabulary, X = self._count_vocab(raw_documents,
   1203                                               self.fixed_vocabulary_)
   1204

~\Anaconda3\lib\site-packages\sklearn\feature_extraction\text.py in _count_vocab(self, raw_documents, fixed_vocab)
   1112         for doc in raw_documents:
   1113             feature_counter = {}
-> 1114             for feature in analyze(doc):
   1115                 try:
   1116                     feature_idx = vocabulary[feature]

~\Anaconda3\lib\site-packages\sklearn\feature_extraction\text.py in _analyze(doc, analyzer, tokenizer, ngrams, preprocessor, decoder, stop_words)
    102     else:
    103         if preprocessor is not None:
--> 104             doc = preprocessor(doc)
    105         if tokenizer is not None:
    106             doc = tokenizer(doc)

~\Anaconda3\lib\site-packages\sklearn\feature_extraction\text.py in _preprocess(doc, accent_function, lower)
     67     """
     68     if lower:
---> 69         doc = doc.lower()
     70     if accent_function is not None:
     71         doc = accent_function(doc)

AttributeError: 'list' object has no attribute 'lower'
print(df2.lemmatized)
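For context, a minimal sketch (toy documents, not the asker's data) of what the traceback shows and two common workarounds: CountVectorizer's default preprocessor lowercases each document, so every entry it receives must be a string, while df2.lemmatized holds lists of tokens.

```python
from sklearn.feature_extraction.text import CountVectorizer

# Toy stand-in for df2.lemmatized: each row is a list of tokens
lemmatized = [["good", "movie"], ["bad", "movie"]]

# Fails: the default preprocessor calls doc.lower() on each entry,
# and a list has no .lower() method
try:
    CountVectorizer().fit_transform(lemmatized)
except AttributeError as e:
    print(e)  # 'list' object has no attribute 'lower'

# Workaround 1: join each token list back into a single string
docs = [" ".join(tokens) for tokens in lemmatized]
tf = CountVectorizer().fit_transform(docs)

# Workaround 2: pass the pretokenized lists through unchanged by
# supplying a callable analyzer, which bypasses preprocessing and
# tokenization entirely
cv = CountVectorizer(analyzer=lambda tokens: tokens)
tf2 = cv.fit_transform(lemmatized)
```

Both routes produce a document-term matrix; the second keeps the lemmatizer's tokens exactly as they are.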
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow