'sparse matrix use in pycaret for nlp

First of all, thank you for allowing me to use this wonderful library I am doing Korean NLP now So I pre-processed Korean and converted it into a Tfidf Vectorizer. I'm going to put it in setup() and use it, but there are errors

Can't I use the sparse matrix for the pycaret? If so, is there any way to do Korean NLP?

X_train = train_data.Text.tolist()
Y_train =train_data['Label'].values
X_test = test_data.Text.tolist()
Y_test =test_data['Label'].values

from soynlp.word import WordExtractor

word_extractor = WordExtractor(min_frequency=100,
    min_cohesion_forward=0.05, 
    min_right_branching_entropy=0.0
)
word_extractor.train(X_train) # list of str or like
words = word_extractor.extract()
scores = word_extractor.word_scores()

import math

score_dict = {key: scores[key].cohesion_forward *
              math.exp(scores[key].right_branching_entropy) 
              for key in scores}

from soynlp.tokenizer import LTokenizer

cohesion_score = {word:score.cohesion_forward for word, score in words.items()}

tokenizer = LTokenizer(scores=score_dict)


import os
from scipy.sparse import save_npz, load_npz
from sklearn.feature_extraction.text import TfidfVectorizer

# if not os.path.isfile('soy_train.npz'):
tfidf = TfidfVectorizer(ngram_range=(1, 2),
                        min_df=3,
                        tokenizer=tokenizer.tokenize, 
                        token_pattern=None)
tfidf.fit(X_train)
X_train_soy = tfidf.transform(X_train)
X_test_soy = tfidf.transform(X_test)
save_npz('soy_train.npz', X_train_soy)
save_npz('soy_test.npz', X_test_soy)
type(X_train_soy[0])

import numpy as np
train = pd.DataFrame(X_train_soy)
y_train = np.array(Y_train,dtype=float)
train['Label'] = y_train


from pycaret.classification import *
import numpy as np
exp1 = setup(train,train_size=0.8, target = 'Label',use_gpu = True)

TypeError: Cannot compare types 'ndarray(dtype=object)' and 'float'



Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source