Use vectorizer after train/test split to vectorize only one column in X

If I have multiple columns in a dataframe and I would like to vectorize one of the columns, how do I do that?

These are my X and y for the train/test split:

X = df[['tweet_text', 'Subjectivity', 'polarity']]

y = df['cyberbullying_type']

'tweet_text' in X is the column that needs to be vectorized. 'Subjectivity' and 'polarity' are already numeric.

I split the DataFrame into train and test sets:

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.1, random_state=42)

Then I try to vectorize X_train with TfidfVectorizer:

tfidf = TfidfVectorizer(max_features=5000)

X_train_tfidf = tfidf.fit_transform(X_train)

But this does not work: it just takes the three column names ('tweet_text', 'Subjectivity', 'polarity') and vectorizes them.

If I instead write:

X_train_tfidf = tfidf.fit_transform(X_train.tweet_text)

then X_train_tfidf contains only the vectorized tweet_text, and the 'Subjectivity' and 'polarity' columns are no longer part of the set.
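One common way to get the numeric columns back is to vectorize only the text column and then place the resulting sparse TF-IDF matrix side by side with 'Subjectivity' and 'polarity' using scipy.sparse.hstack. A minimal sketch, assuming scikit-learn and SciPy are installed; the toy DataFrame and its labels are hypothetical stand-ins for the real tweets:

```python
import pandas as pd
from scipy.sparse import csr_matrix, hstack
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split

# Hypothetical toy data with the same column layout as the question
df = pd.DataFrame({
    "tweet_text": ["you are great", "nobody likes you", "have a nice day",
                   "you are so dumb", "great game today", "go away loser"],
    "Subjectivity": [0.75, 0.5, 1.0, 0.9, 0.75, 0.3],
    "polarity": [0.8, -0.2, 0.6, -0.4, 0.8, -0.5],
    "cyberbullying_type": ["not_cyberbullying", "insult", "not_cyberbullying",
                           "insult", "not_cyberbullying", "insult"],
})

X = df[["tweet_text", "Subjectivity", "polarity"]]
y = df["cyberbullying_type"]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.1, random_state=42)

tfidf = TfidfVectorizer(max_features=5000)
# Learn the vocabulary from the training text only, then reuse it for the test text
train_text = tfidf.fit_transform(X_train["tweet_text"])
test_text = tfidf.transform(X_test["tweet_text"])

numeric_cols = ["Subjectivity", "polarity"]
# Append the numeric columns to the right of the sparse TF-IDF features
X_train_final = hstack([train_text, csr_matrix(X_train[numeric_cols].values)]).tocsr()
X_test_final = hstack([test_text, csr_matrix(X_test[numeric_cols].values)]).tocsr()

print(X_train_final.shape, X_test_final.shape)
```

Because the vectorizer is fitted on X_train only and the test tweets go through transform, the test set is encoded with the training vocabulary and nothing from the test data leaks into the features.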

Please help me. Thank you.
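Another way to keep 'Subjectivity' and 'polarity' alongside the vectorized tweets is scikit-learn's ColumnTransformer, which applies TfidfVectorizer to tweet_text only and passes the numeric columns through unchanged, so the same fitted object handles both X_train and X_test. A minimal sketch, assuming scikit-learn 1.0 or later; the toy data, labels, and the LogisticRegression classifier are hypothetical choices for illustration:

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline

# Hypothetical toy data with the same column layout as the question
df = pd.DataFrame({
    "tweet_text": ["you are great", "nobody likes you", "have a nice day",
                   "you are so dumb", "great game today", "go away loser"],
    "Subjectivity": [0.75, 0.5, 1.0, 0.9, 0.75, 0.3],
    "polarity": [0.8, -0.2, 0.6, -0.4, 0.8, -0.5],
    "cyberbullying_type": ["not_cyberbullying", "insult", "not_cyberbullying",
                           "insult", "not_cyberbullying", "insult"],
})

X = df[["tweet_text", "Subjectivity", "polarity"]]
y = df["cyberbullying_type"]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.1, random_state=42)

features = ColumnTransformer([
    # A plain string column name hands the vectorizer a 1-D Series of tweets
    ("text", TfidfVectorizer(max_features=5000), "tweet_text"),
    # The numeric columns are passed through without modification
    ("num", "passthrough", ["Subjectivity", "polarity"]),
])

model = Pipeline([
    ("features", features),
    ("clf", LogisticRegression()),
])

# fit() vectorizes the training tweets; predict() reuses that vocabulary
model.fit(X_train, y_train)
predictions = model.predict(X_test)
print(predictions)
```

Passing "tweet_text" as a plain string rather than ["tweet_text"] matters here: with a list, ColumnTransformer hands the vectorizer a one-column DataFrame, which runs into the same column-name problem described in the question.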



Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow
