Train SVM with multiple features from multiple CountVectorizers

I have a dataset with multiple text features, and I am trying to build an SVM model to classify new entries based on these features. To do this, I chose CountVectorizer to convert the text data into numerical data for training. I understand how to train a model on each feature separately, but I'm having difficulty understanding how to train on them together.

 Category           Lyric                                     Song_title
 Rock               Master of puppets pulling the strings     Master of puppets
 Rock               Let the bodies hit the floor              Bodies
 Pop                dreaming about the things we could be.    Counting Stars
 Pop                Im glad you came Im glad you came         NULL
 [2000 rows x 3 columns]

To simplify certain steps, I decided to use built-in functions to generate the datasets.

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import CountVectorizer
from sklearn import svm
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression

data = pd.read_excel('./music_data.xlsx',0)
train_data, test_data = train_test_split(data,test_size=0.53)

As both columns contain null values, I decided to split the data into two training sets, one per column, and train a model on each with the associated categories.

lyric_train = train_data[~pd.isnull(train_data['Lyric'])]
lyric_test = test_data[~pd.isnull(test_data['Lyric'])]
vectorizer_lyric = CountVectorizer(analyzer='word', ngram_range=(1, 5))
vc_lyric = vectorizer_lyric.fit_transform(lyric_train['Lyric'])

song_title_train = train_data[~pd.isnull(train_data['Song_title'])]
song_title_test = test_data[~pd.isnull(test_data['Song_title'])]
vectorizer_song = CountVectorizer(analyzer='word', ngram_range=(1, 5))
vc_song = vectorizer_song.fit_transform(song_title_train['Song_title'])

Then I build the models and try to combine them using a stacking classifier.

# Train for lyric feature
model_lyric = svm.SVC()
model_lyric.fit(vc_lyric, lyric_train['Category'])
features_test_lyric = vectorizer_lyric.transform(lyric_test['Lyric'])
model_lyric.score(features_test_lyric, lyric_test['Category'])

# train for Song Title feature
model_song = svm.SVC()
model_song.fit(vc_song, song_title_train['Category'])
features_test_song = vectorizer_song.transform(song_title_test['Song_title'])
model_song.score(features_test_song, song_title_test['Category'])

# Combine SVM models
estimators = [('lyric_svm',model_lyric),
              ('song_svm',model_song)]

stack_model = StackingClassifier(estimators=estimators,final_estimator=LogisticRegression())

From what I have read online, this is not the correct way to do it: StackingClassifier appears to combine multiple models that are trained on the same dataset and features, whereas I fitted a separate CountVectorizer on each feature.



Solution 1:[1]

The error message:

no match for 'operator<' (operand types are 'const Person' and 'const Person')

This tells you that the left-hand and right-hand operands of the operator are both const objects, and that the compiler cannot find an operator< that works on two const Person objects.

If I look at your implementation:

bool operator < (const Person& p1) {
    return (this->height > p1.height);
}

I see that the right-hand value p1 can be a const reference. But the left-hand value (the object the method is called on) is treated as non-const, so this implementation cannot be called on a const Person and does not match what the compiler needs.

But we know this operator does not change the state of the object, so we can simply mark it as a const member function.

bool operator < (const Person& p1) const {
                             //    ^^^^^    Add the const here.
    return (this->height > p1.height);
}
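
To see the fix in context, here is a minimal sketch. It assumes a hypothetical Person type: only height appears in the original code, so the name field and the helper functions comes_before and sorted_people are invented for illustration. It shows the const-qualified operator being called on const objects and being picked up by std::sort.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical Person type; only `height` comes from the original code.
struct Person {
    std::string name;  // assumed field, for illustration only
    int height;

    // const-qualified, so it can be called on a const Person.
    // As in the original, "less than" is defined as "taller than".
    bool operator<(const Person& p1) const {
        return this->height > p1.height;
    }
};

// Both arguments are const references, which is how standard
// containers and algorithms typically see their elements.
bool comes_before(const Person& lhs, const Person& rhs) {
    return lhs < rhs;  // compiles only because operator< is const
}

// Sorts a copy with the member operator<; the tallest person comes first.
std::vector<Person> sorted_people(std::vector<Person> people) {
    std::sort(people.begin(), people.end());
    return people;
}
```

Removing the trailing const from operator< should make comes_before fail to compile with an error similar to the one quoted above, since lhs is a const Person.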

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Martin York