'sparse matrix use in pycaret for nlp
First of all, thank you for allowing me to use this wonderful library I am doing Korean NLP now So I pre-processed Korean and converted it into a Tfidf Vectorizer. I'm going to put it in setup() and use it, but there are errors
Can't I use the sparse matrix for the pycaret? If so, is there any way to do Korean NLP?
X_train = train_data.Text.tolist()
Y_train =train_data['Label'].values
X_test = test_data.Text.tolist()
Y_test =test_data['Label'].values
from soynlp.word import WordExtractor
word_extractor = WordExtractor(min_frequency=100,
min_cohesion_forward=0.05,
min_right_branching_entropy=0.0
)
word_extractor.train(X_train) # list of str or like
words = word_extractor.extract()
scores = word_extractor.word_scores()
import math
score_dict = {key: scores[key].cohesion_forward *
math.exp(scores[key].right_branching_entropy)
for key in scores}
from soynlp.tokenizer import LTokenizer
cohesion_score = {word:score.cohesion_forward for word, score in words.items()}
tokenizer = LTokenizer(scores=score_dict)
import os
from scipy.sparse import save_npz, load_npz
from sklearn.feature_extraction.text import TfidfVectorizer
# if not os.path.isfile('soy_train.npz'):
tfidf = TfidfVectorizer(ngram_range=(1, 2),
min_df=3,
tokenizer=tokenizer.tokenize,
token_pattern=None)
tfidf.fit(X_train)
X_train_soy = tfidf.transform(X_train)
X_test_soy = tfidf.transform(X_test)
save_npz('soy_train.npz', X_train_soy)
save_npz('soy_test.npz', X_test_soy)
type(X_train_soy[0])
import numpy as np
train = pd.DataFrame(X_train_soy)
y_train = np.array(Y_train,dtype=float)
train['Label'] = y_train
from pycaret.classification import *
import numpy as np
exp1 = setup(train,train_size=0.8, target = 'Label',use_gpu = True)
TypeError: Cannot compare types 'ndarray(dtype=object)' and 'float'
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|
