'Q learning - epsilon greedy update

I am trying to understand the epsilon - greedy method in DQN. I am learning from the code available in https://github.com/karpathy/convnetjs/blob/master/build/deepqlearn.js

Following is the update rule for epsilon which changes with age as below:

$this.epsilon = Math.min(1.0, Math.max(this.epsilon_min, 1.0-(this.age - this.learning_steps_burnin)/(this.learning_steps_total - this.learning_steps_burnin)));

Does this mean the epsilon value starts with min (chosen by user) and then increase with age reaching upto burnin steps and eventually becoming to 1? Or Does the epsilon start around 1 and then decays to epsilon_min ?

Either way, then the learning almost stops after this process. So, do we need to choose the learning_steps_burnin and learning_steps_total carefully enough? Any thoughts on what value needs to be chosen?

Solution 1:^[1]

Since epsilon denotes the amount of randomness in your policy (action is greedy with probability 1-epsilon and random with probability epsilon), you want to start with a fairly randomized policy and later slowly move towards a deterministic policy. Therefore, you usually start with a large epsilon (like 0.9, or 1.0 in your code) and decay it to a small value (like 0.1). Most common and simple approaches are linear decay and exponential decay. Usually, you have an idea of how many learning steps you will perform (what in your code is called learning_steps_total) and tune the decay factor (your learning_steps_burnin) such that in this interval epsilon goes from 0.9 to 0.1.

Your code is an example of linear decay. An example of exponential decay is

epsilon = 0.9
decay = 0.9999
min_epsilon = 0.1
for i from 1 to n
    epsilon = max(min_epsilon, epsilon*decay)

Solution 2:^[2]

Personally I recommend an epsilon decay such that after about 50/75% of the training you reach the minimum value of espilon (advice from 0.05 to 0.0025) from which then you have only the improvement of the policy itself. I created a specific script to set the various parameters and it returns after what the decay stop is reached (at the indicated value)

import matplotlib.pyplot as plt
import numpy as np

eps_start = 1.0
eps_min = 0.05
eps_decay = 0.9994
epochs = 10000
pct = 0
df = np.zeros(epochs)
for i in range(epochs):
    if i == 0:
        df[i] = eps_start
    else:
        df[i] = df[i-1] * eps_decay
        if df[i] <= eps_min:
            print(i)
            stop = i
            break

print("With this parameter you will stop epsilon decay after {}% of training".format(stop/epochs*100))
plt.plot(df)
plt.show()

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution	Source
Solution 1	Simon
Solution 2	Cristian

'Q learning - epsilon greedy update

Solution 1:[1]

Solution 2:[2]

Sources

Related Questions

Solution 1:^[1]

Solution 2:^[2]