Why am I getting this UnboundLocalError using a package which is a wrapper around GridSearchCV (hyperclassifiersearch)?
I am using the hyperclassifiersearch package to run my grid search with a pipeline. One thing I do not understand: when I use one-hot encoding (when I switch to target encoding I don't get the error), running the code below produces this error:
86
87 print('Search is done.')
---> 88 return best_model # allows to predict with the best model overall
89
90 def evaluate_model(self, sort_by='mean_test_score', show_timing_info=False):
UnboundLocalError: local variable 'best_model' referenced before assignment
The code I used to generate that error is as follows:
# define pipeline
numeric_transformer = Pipeline(steps=[('imputer', SimpleImputer(missing_values=np.nan, strategy='constant', fill_value=0))])
preprocessor = ColumnTransformer(transformers=[('num', numeric_transformer, numeric_cols), ('cat', OneHotEncoder(handle_unknown='ignore'), cat_cols)])
model = XGBClassifier(objective='binary:logistic', n_jobs=-1, use_label_encoder=False)
pipeline = Pipeline(steps=[('preprocessor', preprocessor), ('clf', model)])
models = {
'xgb' : pipeline }
params = {
'xgb': { 'clf__n_estimators': [200,300]}
}
cv = StratifiedKFold(n_splits=3, random_state=42,shuffle=True)
search = HyperclassifierSearch(models, params)
gridsearch = search.train_model(X_train, y_train, cv=cv,scoring='recall')
I don't understand this error. Can anybody help? https://github.com/janhenner/HyperclassifierSearch <-- repo for the package.
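For context, the error mechanism itself is easy to reproduce outside the package: a variable assigned only inside a conditional that never fires does not exist when the function tries to return it. A minimal sketch (the dict of scores is a made-up stand-in for the grid results, not the package's actual data structure):

```python
def pick_best(scores):
    best_score = 0
    for key, score in scores.items():
        if score > best_score:   # never True if every score is <= 0
            best_score = score
            best_model = key
    return best_model            # UnboundLocalError when the branch never ran

print(pick_best({'xgb': 0.7}))   # -> xgb

try:
    pick_best({'xgb': 0.0})      # a recall of exactly 0.0 never beats best_score = 0
except UnboundLocalError as e:
    print(e)
```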
Solution 1:[1]
The full code of train_model is:
def train_model(self, X_train, y_train, search='grid', **search_kwargs):
    """
    Optimizing over one or multiple classifiers or pipelines.
    Input:
    X : array or dataframe with features; this should be a training dataset
    y : array or dataframe with label(s); this should be a training dataset
    Output:
    returns the optimal model according to the scoring metric
    Parameters:
    search : str, default='grid'
        define the search
        ``grid`` performs GridSearchCV
        ``random`` performs RandomizedSearchCV
    **search_kwargs : kwargs
        additional parameters passed to the search
    """
    grid_results = {}
    best_score = 0
    for key in self.models.keys():
        print('Search for {}'.format(key), '...')
        assert search in ('grid', 'random'), 'search parameter out of range'
        if search == 'grid':
            grid = GridSearchCV(self.models[key], self.params[key], **search_kwargs)
        if search == 'random':
            grid = RandomizedSearchCV(self.models[key], self.params[key], **search_kwargs)
        grid.fit(X_train, y_train)
        self.grid_results[key] = grid
        if grid.best_score_ > best_score:  # return best model
            best_score = grid.best_score_
            best_model = grid
    print('Search is done.')
    return best_model  # allows to predict with the best model overall
It seems that in some situations every model can end up with a score <= 0, and then the line

    best_model = grid

never runs, so the variable best_model is never created and return best_model fails with the UnboundLocalError you see.

You should assign a default value at the start, i.e. best_model = None, so the variable always exists. Or you could start with a lower score, i.e. best_score = -1.

You should also report this problem to the author of the module.
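Why would every score fail the comparison here? With scoring='recall', a model that never predicts the positive class scores exactly 0.0, and 0.0 > 0 is False. And if fits inside the search fail (which may be what happens with the one-hot-encoded matrix; this is an assumption, not confirmed by the traceback), scikit-learn's error_score=np.nan default turns best_score_ into NaN, and any comparison with NaN is False, so even best_score = -1 would not rescue that case:

```python
nan = float('nan')

print(0.0 > 0)    # False: a legitimate recall of exactly 0.0 is skipped
print(nan > 0)    # False: a failed-fit NaN score is skipped
print(nan > -1)   # False: lowering the start value does not help for NaN
```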
EDIT:
I added best_model = None, but now you have to remember to check whether you got None back when you run it.
"""
Optimizing over one or multiple classifiers or pipelines.
Input:
X : array or dataframe with features; this should be a training dataset
y : array or dataframe with label(s); this should be a training dataset
Output:
returns the optimal model according to the scoring metric
Parameters:
search : str, default='grid'
define the search
``grid`` performs GridSearchCV
``random`` performs RandomizedSearchCV
**search_kwargs : kwargs
additional parameters passed to the search
"""
grid_results = {}
best_score = 0
best_model = None # <--- default value at start
for key in self.models.keys():
print('Search for {}'.format(key), '...')
assert search in ('grid', 'random'), 'search parameter out of range'
if search=='grid':
grid = GridSearchCV(self.models[key], self.params[key], **search_kwargs)
if search=='random':
grid = RandomizedSearchCV(self.models[key], self.params[key], **search_kwargs)
grid.fit(X_train, y_train)
self.grid_results[key] = grid
if grid.best_score_ > best_score: # return best model
best_score = grid.best_score_
best_model = grid
print('Search is done.')
return best_model # allows to predict with the best model overall
And later
gridsearch = search.train_model(X_train, y_train, cv=cv, scoring='recall')

if gridsearch is None:  # check if None
    print("Didn't find model")
else:
    # ... code ...
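Combining both ideas from this answer, a defensive variant of the selection loop (a sketch of the pattern, not the package's actual code) starts from -inf so that a legitimate score of 0.0 is still accepted, and keeps the None default to cover the case where every score is NaN:

```python
def pick_best(results):
    """Return the key with the highest comparable score, or None if none qualifies."""
    best_score = -float('inf')  # a real score of 0.0 now beats the start value
    best_model = None           # default for the case where every score is NaN
    for key, score in results.items():
        if score > best_score:
            best_score = score
            best_model = key
    return best_model

print(pick_best({'a': 0.0, 'b': float('nan')}))  # -> a
print(pick_best({'b': float('nan')}))            # -> None
```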
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | |
