How to choose LinearSVC instead of SVC if kernel=linear in param_grid?

I create the grid_cv_object as follows, where hyperpam_grid = {"C": c, "kernel": kernel, "gamma": gamma, "degree": degree}:

from sklearn.metrics import make_scorer, matthews_corrcoef
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

grid_cv_object = GridSearchCV(
    estimator=SVC(cache_size=cache_size),
    param_grid=hyperpam_grid,
    cv=cv_splits,
    scoring=make_scorer(matthews_corrcoef),  # a callable returning a single value; binary and multiclass labels are supported
    n_jobs=-1,  # use all processors
    verbose=10,
    refit=refit,
)

Here kernel could be, for example, ('rbf', 'linear', 'poly').
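A single flat grid is expanded into the full Cartesian product of its values, so it also tries combinations that make no difference to the model. A minimal sketch using scikit-learn's ParameterGrid shows this. The concrete values for C, gamma and degree are hypothetical placeholders, not the asker's actual c, gamma and degree.

```python
from sklearn.model_selection import ParameterGrid

# Hypothetical values standing in for the question's c, kernel, gamma and degree
hyperpam_grid = {
    "C": [0.1, 1.0, 10.0],
    "kernel": ["rbf", "linear", "poly"],
    "gamma": ["scale", "auto"],
    "degree": [2, 3],
}

# ParameterGrid enumerates the same Cartesian product that GridSearchCV tries
candidates = list(ParameterGrid(hyperpam_grid))
linear_candidates = [p for p in candidates if p["kernel"] == "linear"]

print(len(candidates))         # 36 = 3 C * 3 kernels * 2 gammas * 2 degrees
print(len(linear_candidates))  # 12, although only the 3 C values matter for 'linear'
```

The linear kernel ignores both gamma and degree, so 9 of those 12 'linear' fits only duplicate work.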

How can I force LinearSVC to be used whenever the kernel is 'linear'? Since the kernel is just one entry in hyperpam_grid, I'm not sure how to build this kind of "switch".

If possible, I'd rather not maintain two separate grid_cv_objects.



Solution 1 [1]:

Try passing a list of parameter grids, one per model, in the following form:

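from sklearn.dummy import DummyClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC, LinearSVC

# One dict per model: each swaps its own estimator into the 'svm' step
# and lists only the hyperparameters that estimator actually uses.
search_spaces = [
    {'svm': [SVC(kernel='rbf')],
     'svm__gamma': ('scale', 'auto'),
     'svm__C': (0.1, 1.0, 10.0)},
    {'svm': [SVC(kernel='poly')],
     'svm__degree': (2, 3),
     'svm__C': (0.1, 1.0, 10.0)},
    {'svm': [LinearSVC()],  # linear kernel, via LinearSVC instead of SVC(kernel='linear')
     'svm__C': (0.1, 1.0, 10.0)}
]
# DummyClassifier is only a placeholder; GridSearchCV replaces it with each 'svm' value
svm_pipe = Pipeline([('svm', DummyClassifier())])
grid = GridSearchCV(svm_pipe, search_spaces)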
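A self-contained sketch of the same idea is shown below. It fits the search end to end and reuses the MCC scorer from the question. The synthetic dataset from make_classification, cv=5 and max_iter=10000 for LinearSVC are assumptions chosen only to make it runnable. n_jobs is left at its default here.

```python
from sklearn.datasets import make_classification
from sklearn.dummy import DummyClassifier
from sklearn.metrics import make_scorer, matthews_corrcoef
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC, LinearSVC

# Synthetic binary data, used here only so the sketch can run
X, y = make_classification(n_samples=200, n_features=10, random_state=0)

search_spaces = [
    {"svm": [SVC(kernel="rbf")],
     "svm__gamma": ("scale", "auto"),
     "svm__C": (0.1, 1.0, 10.0)},
    {"svm": [SVC(kernel="poly")],
     "svm__degree": (2, 3),
     "svm__C": (0.1, 1.0, 10.0)},
    {"svm": [LinearSVC(max_iter=10000)],  # stands in for kernel='linear'
     "svm__C": (0.1, 1.0, 10.0)},
]

# The 'svm' step is a placeholder that each dict above overrides
svm_pipe = Pipeline([("svm", DummyClassifier())])

grid = GridSearchCV(
    svm_pipe,
    search_spaces,
    cv=5,
    scoring=make_scorer(matthews_corrcoef),
    refit=True,  # best_estimator_ only exists when refit is enabled
)
grid.fit(X, y)

best_svm = grid.best_estimator_.named_steps["svm"]
print(type(best_svm).__name__, grid.best_params_)
print(len(grid.cv_results_["params"]))  # 6 rbf + 6 poly + 3 linear = 15 candidates
```

This runs one search object over three model families. Which family wins depends on the data, so the printed best model is not fixed in advance.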

Discussion:

  1. Each kernel gets its own SVC instance and its own parameter dict. This way, GridSearchCV will not fit, say, SVC(kernel='rbf') with different degrees. The 'rbf' kernel ignores degree, which is used only by 'poly'. (Note that gamma is not ignored by 'poly': it is used by the 'rbf', 'poly' and 'sigmoid' kernels.)

  2. As requested, the linear SVM is fitted with LinearSVC rather than SVC(kernel='linear'). In fact, any estimator can be slotted into the 'svm' step this way.

  3. After fitting (with refit=True, the default), the best model is grid.best_estimator_.named_steps['svm'].
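You can check point 1 without fitting anything by expanding the list of grids with ParameterGrid and counting candidates per model family. This is a minimal sketch. The helper family() is a hypothetical name introduced only for this illustration.

```python
from collections import Counter

from sklearn.model_selection import ParameterGrid
from sklearn.svm import SVC, LinearSVC

search_spaces = [
    {"svm": [SVC(kernel="rbf")],
     "svm__gamma": ("scale", "auto"),
     "svm__C": (0.1, 1.0, 10.0)},
    {"svm": [SVC(kernel="poly")],
     "svm__degree": (2, 3),
     "svm__C": (0.1, 1.0, 10.0)},
    {"svm": [LinearSVC()],
     "svm__C": (0.1, 1.0, 10.0)},
]


def family(params):
    """Hypothetical helper: label a candidate by the model it would fit."""
    est = params["svm"]
    return est.kernel if isinstance(est, SVC) else type(est).__name__


counts = Counter(family(p) for p in ParameterGrid(search_spaces))
print(dict(counts))  # {'rbf': 6, 'poly': 6, 'LinearSVC': 3}
```

That gives 15 candidates in total, and each one differs in a hyperparameter its model actually uses.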

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

[1] Solution 1