Train SVM with multiple features from multiple CountVectorizer
I have a dataset with multiple features, and I am trying to build an SVM model to classify new entries based on these features. To do this, I chose CountVectorizer to convert the text data into numerical data for training. I understand how to train a model on each feature separately, but I'm having difficulty understanding how to train on them together.
| Category | Lyric | Song_title |
|---|---|---|
| Rock | Master of puppets pulling the strings | Master of puppets |
| Rock | Let the bodies hit the floor | Bodies |
| Pop | dreaming about the things we could be. | Counting Stars |
| Pop | Im glad you came Im glad you came | NULL |
[2000 rows x 3 columns]
To simplify certain steps, I decided to use built-in functions to generate the datasets.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import CountVectorizer
from sklearn import svm
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import StackingClassifier
data = pd.read_excel('./music_data.xlsx', sheet_name=0)
train_data, test_data = train_test_split(data, test_size=0.53)
Since both columns contain null values, I decided to split the data into two training sets, one per column, and train a model on each with its associated categories.
lyric_train = train_data[~pd.isnull(train_data['Lyric'])]
lyric_test = test_data[~pd.isnull(test_data['Lyric'])]
vectorizer_lyric = CountVectorizer(analyzer='word', ngram_range=(1, 5))
vc_lyric = vectorizer_lyric.fit_transform(lyric_train['Lyric'])
song_title_train = train_data[~pd.isnull(train_data['Song_title'])]
song_title_test = test_data[~pd.isnull(test_data['Song_title'])]
vectorizer_song = CountVectorizer(analyzer='word', ngram_range=(1, 5))
vc_song = vectorizer_song.fit_transform(song_title_train['Song_title'])
Then I build the models and try to combine them using a stacking classifier.
# Train for lyric feature
model_lyric = svm.SVC()
model_lyric.fit(vc_lyric, lyric_train['Category'])
features_test_lyric = vectorizer_lyric.transform(lyric_test['Lyric'])
model_lyric.score(features_test_lyric, lyric_test['Category'])
# train for Song Title feature
model_song = svm.SVC()
model_song.fit(vc_song, song_title_train['Category'])
features_test_song = vectorizer_song.transform(song_title_test['Song_title'])
model_song.score(features_test_song, song_title_test['Category'])
# Combine SVM models
estimators = [('lyric_svm',model_lyric),
('song_svm',model_song)]
stack_model = StackingClassifier(estimators=estimators,final_estimator=LogisticRegression())
From what I have read online, this is not the correct approach, because StackingClassifier appears to combine multiple models trained on the same dataset and features, whereas I vectorized each feature separately with its own CountVectorizer.
Solution 1:[1]
The error message:
no match for 'operator<' (operand types are 'const Person' and 'const Person')
This tells you that both the left-hand and right-hand operands of operator< are const objects, and the compiler cannot find an operator< that works with two const Person objects.
If I look at your implementation:
bool operator < (const Person& p1) {
return (this->height > p1.height);
}
I see that the right-hand operand p1 is taken by const reference. But the left-hand operand (the object the method is called on) is treated as non-const, so this implementation does not match what the compiler needs.
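The mismatch described above can be checked at compile time. The following is a minimal sketch, not part of the original answer: `make_void` and `can_less` are hypothetical helper names introduced here for illustration, and `Person` is reduced to a single `height` member with the same non-const operator shown above.

```cpp
#include <type_traits>
#include <utility>

// Hypothetical helper: maps any well-formed type list to void
// (a C++11-friendly stand-in for std::void_t).
template <typename...>
struct make_void { typedef void type; };

// Primary template: assume the expression `L < R` does not compile.
template <typename L, typename R, typename = void>
struct can_less : std::false_type {};

// Selected only when `declval<L>() < declval<R>()` is a well-formed expression.
template <typename L, typename R>
struct can_less<L, R,
    typename make_void<decltype(std::declval<L>() < std::declval<R>())>::type>
    : std::true_type {};

// Same shape as the operator quoted above: NOT marked const.
struct Person {
    int height;
    bool operator<(const Person& p1) { return this->height > p1.height; }
};

// A non-const left-hand operand finds the member operator...
static_assert(can_less<Person&, const Person&>::value,
              "non-const lhs should compile");
// ...but a const left-hand operand has no viable operator<.
static_assert(!can_less<const Person&, const Person&>::value,
              "const lhs should have no matching operator<");
```

Both assertions hold only because the member function lacks the `const` qualifier; the second one is exactly the situation the compiler error reports.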
But we know this operator does not change the state of the object, so we can simply mark it as a const member function.
bool operator < (const Person& p1) const {
// ^^^^^ Add the const here.
return (this->height > p1.height);
}
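To illustrate where the const-qualified version matters, here is a small sketch under stated assumptions: `Person` is reduced to a single `height` member, and `first_in_order` is a hypothetical helper that is not part of the original code. Elements reached through a const container are const objects, so standard algorithms can compare them only through a const `operator<`.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

struct Person {
    int height;
    // const-qualified, so it can be called when the left-hand operand is const.
    bool operator<(const Person& p1) const { return this->height > p1.height; }
};

// Hypothetical helper: returns the element that sorts first under operator<.
// The vector is const, so std::min_element compares const Person objects.
// Assumes `people` is non-empty.
inline const Person& first_in_order(const std::vector<Person>& people) {
    assert(!people.empty());
    return *std::min_element(people.begin(), people.end());
}
```

Because the operator compares with `>`, the element that sorts first is the tallest person. Removing the `const` qualifier from `operator<` would make `first_in_order` fail to compile with an error similar to the one quoted at the top of this answer.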
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Martin York |
