Multiple trainings / multiple NN initialisations per hyperparameter validation with Optuna and pruning
I am doing my first ML-with-Optuna project. My question is: how can I evaluate one set of hyperparameters across multiple NN initializations, where each run within a trial is still subject to pruning?
I assume that the initialization has quite some influence, and I don't want to discard good hyperparameters due to bad luck.
As far as I know, each trial represents one set of hyperparameters. So if I want to evaluate them for multiple initializations, I have to perform multiple trainings per trial. But within one trial I can only report one value per timestep.
Do I have to implement this without Optuna? Should I go for an approach that lets Optuna first suggest a set of hyperparameters and then fixes it for the next trials? Or do you know a good approach to achieve this with Optuna?
Many thanks in advance!
Edit 1: Adding a minimal code example:
from random import randrange

import optuna


def objective(trial):
    """
    return x * 20 + random_offset
    multiplication calculated iteratively to enable pruning
    """
    x = trial.suggest_float("x", 0, 10)
    random_offset = randrange(0, 1000)

    temp = 0
    res_temp = None
    for i in range(20):
        temp += x
        res_temp = temp + random_offset

        trial.report(res_temp, i)
        if trial.should_prune():
            raise optuna.TrialPruned()

    return res_temp


if __name__ == '__main__':
    study = optuna.create_study(pruner=optuna.pruners.MedianPruner())
    study.optimize(objective, n_trials=20)

    print("best params:", study.best_params)
    print("best value:", study.best_value)
This example tries to find the x in the range 0 to 10 that minimizes x * 20; the obvious answer is 0. The objective function computes the result by iterative summation, which makes pruning possible. Unfortunately, the objective is noisy due to the random offset. This is meant as a metaphor for training an NN: the iteration is the training loop, x is the hyperparameter, and the offset is the random initialization of the network.
The problem caused by the noise is that you can't determine the quality of a hyperparameter for sure, as the result might be dominated by the random offset. This might lead to selecting a sub-optimal x. If I am right, then simply increasing the number of trials to smooth out the randomness might not work, as Optuna suggests new hyperparameters based on the old ones, so unlucky observations will hinder further progress.
So I assumed it would be best to evaluate the objective several times for the same set of hyperparameters and only keep the best run.
My question is therefore: how do I best smooth out the noise? Is my assumption correct that just increasing the number of trials is not the best approach, and how would you implement the repeated evaluation?
Solution 1:[1]
Since your objective now also depends on randomness, it is best to evaluate it several times, as you assumed.
But even better, try to identify where the randomness comes from. Is it the seed? If not, then you really do need more trials and more evaluations per complete epoch.
It would look something like the following, adapted from the Optuna examples. The model is trained for n_train_iter steps and evaluated once per step for the same parameter, so the pruner sees an intermediate value at every step.
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import train_test_split

import optuna

X, y = load_iris(return_X_y=True)
X_train, X_valid, y_train, y_valid = train_test_split(X, y)
classes = np.unique(y)


def objective(trial):
    alpha = trial.suggest_float("alpha", 0.0, 1.0)
    clf = SGDClassifier(alpha=alpha)
    n_train_iter = 100

    for step in range(n_train_iter):
        clf.partial_fit(X_train, y_train, classes=classes)

        intermediate_value = clf.score(X_valid, y_valid)
        trial.report(intermediate_value, step)

        if trial.should_prune():
            raise optuna.TrialPruned()

    return clf.score(X_valid, y_valid)


study = optuna.create_study(
    direction="maximize",
    pruner=optuna.pruners.MedianPruner(
        n_startup_trials=5, n_warmup_steps=30, interval_steps=10
    ),
)
study.optimize(objective, n_trials=20)
You can go further by calling
    X_train, X_valid, y_train, y_valid = train_test_split(X, y)
multiple times inside the objective, just to find the best objective value.
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | ferdy |
