Why am I getting an UnboundLocalError using a package that is a wrapper around GridSearchCV? (HyperclassifierSearch)

I am using the HyperclassifierSearch package to run my grid search with a pipeline. One thing I do not understand: when I use one-hot encoding I get the error below (when I switch to target encoding I don't get it):

     86 
     87         print('Search is done.')
---> 88         return best_model # allows to predict with the best model overall
     89 
     90     def evaluate_model(self, sort_by='mean_test_score', show_timing_info=False):

UnboundLocalError: local variable 'best_model' referenced before assignment

The code I used to generate that error is as follows:

    # define pipeline
    numeric_transformer = Pipeline(steps=[
        ('imputer', SimpleImputer(missing_values=np.nan, strategy='constant', fill_value=0))
    ])
    preprocessor = ColumnTransformer(transformers=[
        ('num', numeric_transformer, numeric_cols),
        ('cat', OneHotEncoder(handle_unknown='ignore'), cat_cols)
    ])
    model = XGBClassifier(objective='binary:logistic', n_jobs=-1, use_label_encoder=False)
    pipeline = Pipeline(steps=[('preprocessor', preprocessor), ('clf', model)])

    models = {'xgb': pipeline}

    params = {'xgb': {'clf__n_estimators': [200, 300]}}

    cv = StratifiedKFold(n_splits=3, random_state=42, shuffle=True)

    search = HyperclassifierSearch(models, params)
    gridsearch = search.train_model(X_train, y_train, cv=cv, scoring='recall')

I don't understand this error. Can anybody help? Repo of the package: https://github.com/janhenner/HyperclassifierSearch



Solution 1:[1]

The full code of `train_model` is:

def train_model(self, X_train, y_train, search='grid', **search_kwargs):
    """
    Optimizing over one or multiple classifiers or pipelines.
    Input:
    X : array or dataframe with features; this should be a training dataset
    y : array or dataframe with label(s); this should be a training dataset
    Output:
    returns the optimal model according to the scoring metric
    Parameters:
    search : str, default='grid'
        define the search
        ``grid`` performs GridSearchCV
        ``random`` performs RandomizedSearchCV
    **search_kwargs : kwargs
        additional parameters passed to the search
    """
    grid_results = {}
    best_score = 0

    for key in self.models.keys():
        print('Search for {}'.format(key), '...')
        assert search in ('grid', 'random'), 'search parameter out of range'
        if search=='grid':
            grid = GridSearchCV(self.models[key], self.params[key], **search_kwargs)
        if search=='random':
            grid = RandomizedSearchCV(self.models[key], self.params[key], **search_kwargs)
        grid.fit(X_train, y_train)
        self.grid_results[key] = grid

        if grid.best_score_ > best_score: # return best model
            best_score = grid.best_score_
            best_model = grid

    print('Search is done.')
    return best_model # allows to predict with the best model overall

However, in some situations every model may score <= 0 (with `scoring='recall'`, a score of 0 is possible when the model predicts no positives), so the condition `grid.best_score_ > best_score` is never true and the line

best_model = grid

never runs. The variable `best_model` is therefore never created, and `return best_model` raises the `UnboundLocalError`.
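To see the failure mode in isolation, here is a minimal sketch of the same pattern; the `pick_best` helper is hypothetical, not part of the package:

```python
def pick_best(scores):
    best_score = 0
    for s in scores:
        if s > best_score:   # never true when every score is <= 0
            best_score = s
            best = s         # 'best' is only created inside this branch
    return best              # -> UnboundLocalError if the branch never ran

try:
    pick_best([0.0, 0.0])    # e.g. every model had recall 0
except UnboundLocalError as e:
    print(type(e).__name__)  # UnboundLocalError
```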

You should assign a default value at the start - i.e. `best_model = None` - so the name always exists.

Or you should use a lower score at the start - i.e. `best_score = -1`.
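A sketch of that lower-start-value idea, on a hypothetical `pick_best` helper mirroring the loop in `train_model`; `float('-inf')` guarantees the first score always wins, whatever the metric's range:

```python
def pick_best(scores):
    best_score = float('-inf')  # any real score beats the start value
    best = None                 # default, in case scores is empty
    for s in scores:
        if s > best_score:
            best_score = s
            best = s
    return best

print(pick_best([0.0, 0.0]))  # 0.0 -- a result is returned even when all scores are 0
```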


You should report this problem to the author of the module.


EDIT:

I added `best_model = None`, but now you have to remember to check whether you got `None` back when you run it.

    """
    Optimizing over one or multiple classifiers or pipelines.
    Input:
    X : array or dataframe with features; this should be a training dataset
    y : array or dataframe with label(s); this should be a training dataset
    Output:
    returns the optimal model according to the scoring metric
    Parameters:
    search : str, default='grid'
        define the search
        ``grid`` performs GridSearchCV
        ``random`` performs RandomizedSearchCV
    **search_kwargs : kwargs
        additional parameters passed to the search
    """
    grid_results = {}
    best_score = 0

    best_model = None   # <--- default value at start

    for key in self.models.keys():
        print('Search for {}'.format(key), '...')
        assert search in ('grid', 'random'), 'search parameter out of range'
        if search=='grid':
            grid = GridSearchCV(self.models[key], self.params[key], **search_kwargs)
        if search=='random':
            grid = RandomizedSearchCV(self.models[key], self.params[key], **search_kwargs)
        grid.fit(X_train, y_train)
        self.grid_results[key] = grid

        if grid.best_score_ > best_score: # return best model
            best_score = grid.best_score_
            best_model = grid

    print('Search is done.')
    return best_model # allows to predict with the best model overall

And later:

gridsearch = search.train_model(X_train, y_train, cv=cv, scoring='recall')

if gridsearch is None:  # check if no model was found
    print("Didn't find model")
else:
    # ... code ...

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow
