Use a vectorizer after train/test split to vectorize only one column in X
If I have multiple columns in a dataframe and I would like to vectorize one of the columns, how do I do that?
These are my X and y for the train/test split:
X = df[['tweet_text', 'Subjectivity', 'polarity']]
y = df['cyberbullying_type']
tweet_text is the column in X that needs to be vectorized. Subjectivity and polarity are already numeric.
I split the dataframe into train and test sets:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.1, random_state=42)
Then I try to vectorize X_train with TfidfVectorizer:
from sklearn.feature_extraction.text import TfidfVectorizer
tfidf = TfidfVectorizer(max_features=5000)
X_train_tfidf = tfidf.fit_transform(X_train)
But this does not work: it just takes the three column names ('tweet_text', 'Subjectivity', 'polarity') and vectorizes those instead of the tweets.
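For anyone reproducing this, the behaviour can be shown on a tiny made-up frame. The rows below are hypothetical stand-ins, not the real data. As far as I can tell, the vectorizer simply iterates over whatever it is given, and iterating over a DataFrame yields its column labels rather than its rows:

```python
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical two-row stand-in for X_train (values are made up).
X_train = pd.DataFrame({
    "tweet_text": ["you are so mean", "have a nice day"],
    "Subjectivity": [0.9, 0.6],
    "polarity": [-0.5, 0.7],
})

# Iterating over a DataFrame gives the column labels, not the rows.
column_labels = list(X_train)
print(column_labels)

tfidf = TfidfVectorizer(max_features=5000)
X_train_tfidf = tfidf.fit_transform(X_train)

# One "document" per column label, so the row count matches the number
# of columns rather than the number of tweets.
print(X_train_tfidf.shape)
print(sorted(tfidf.vocabulary_))
```

The learned vocabulary contains only the lowercased column names, which matches what I am seeing.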
If I instead write:
X_train_tfidf = tfidf.fit_transform(X_train.tweet_text)
then X_train_tfidf contains only the vectorized tweet_text, and the columns 'Subjectivity' and 'polarity' are no longer part of the set.
Please help me. Thank You
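One possible approach, sketched here under assumptions rather than offered as a definitive answer, is scikit-learn's ColumnTransformer: apply TfidfVectorizer to 'tweet_text' only and pass the numeric columns through unchanged. The toy df below is hypothetical and only stands in for the real data. Note that the column is given as a plain string rather than a list, so the vectorizer receives a 1-D series of texts.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split

# Hypothetical toy data standing in for the real df (values are made up).
df = pd.DataFrame({
    "tweet_text": [
        "you are so mean", "have a nice day", "nobody likes you",
        "great game last night", "stop posting this", "love this song",
        "you are the worst", "see you tomorrow", "go away loser",
        "thanks for the help",
    ],
    "Subjectivity": [0.9, 0.6, 0.8, 0.5, 0.4, 0.7, 0.9, 0.1, 0.8, 0.3],
    "polarity": [-0.5, 0.7, -0.6, 0.8, -0.2, 0.6, -0.9, 0.0, -0.7, 0.4],
    "cyberbullying_type": [
        "other", "none", "other", "none", "other",
        "none", "other", "none", "other", "none",
    ],
})

X = df[["tweet_text", "Subjectivity", "polarity"]]
y = df["cyberbullying_type"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.1, random_state=42
)

# TF-IDF on the text column only; the remaining (numeric) columns are
# appended unchanged after the TF-IDF features.
preprocess = ColumnTransformer(
    transformers=[
        ("tfidf", TfidfVectorizer(max_features=5000), "tweet_text"),
    ],
    remainder="passthrough",
)

# Fit on the training set only, then reuse the fitted vocabulary on the test set.
X_train_tfidf = preprocess.fit_transform(X_train)
X_test_tfidf = preprocess.transform(X_test)

print(X_train_tfidf.shape, X_test_tfidf.shape)
```

The result has one column per TF-IDF term plus two for Subjectivity and polarity. Depending on how sparse the combined data is, ColumnTransformer may return either a SciPy sparse matrix or a dense array.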
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
