Will actor-critic algorithms like DDPG and TD3 avoid the risks of exploration?

While learning about Q-learning and Expected Sarsa, I learned that one difference between them is that, in the CliffWalking environment, Expected Sarsa avoids the risks of exploration and chooses the safe path, while Q-learning chooses the optimal path along the cliff edge.
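For concreteness, these are the tabular update rules I am comparing (ε-greedy behavior policy π, step size α, discount γ):

```latex
% Q-learning: bootstrap from the greedy (max) next action
Q(s,a) \leftarrow Q(s,a) + \alpha \Big[ r + \gamma \max_{a'} Q(s',a') - Q(s,a) \Big]

% Expected Sarsa: bootstrap from the expectation under the eps-greedy policy
Q(s,a) \leftarrow Q(s,a) + \alpha \Big[ r + \gamma \sum_{a'} \pi(a' \mid s')\, Q(s',a') - Q(s,a) \Big]
```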

[Image: the safe path vs. the optimal path in CliffWalking]

To my understanding, this is because Expected Sarsa folds the "explore and fall off the cliff" outcome into its Q update: the target is an expectation over the ε-greedy policy, so the small probability of stepping into the cliff decreases the estimated Q values along the optimal (cliff-edge) path.
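Here is a minimal numeric sketch of that effect, using made-up Q-values for a cliff-edge state (two actions, ε = 0.1):

```python
import numpy as np

# Hypothetical Q-values for a state on the cliff edge:
# action 0 = move along the cliff (greedy), action 1 = step into the cliff.
q_next = np.array([10.0, -100.0])
r, gamma, eps = -1.0, 1.0, 0.1

# Q-learning backs up only the greedy action, ignoring the cliff risk.
q_learning_target = r + gamma * q_next.max()        # -1 + 10 = 9.0

# Expected Sarsa weights every action by its eps-greedy probability,
# so the eps/|A| chance of falling drags the target down.
probs = np.full(len(q_next), eps / len(q_next))
probs[q_next.argmax()] += 1.0 - eps                 # [0.95, 0.05]
expected_sarsa_target = r + gamma * probs @ q_next  # -1 + 9.5 - 5 = 3.5

print(q_learning_target, expected_sarsa_target)     # 9.0  3.5
```

The 5% chance of the −100 cliff transition pulls the Expected Sarsa target from 9.0 down to 3.5, which is exactly why cliff-edge states look worse under Expected Sarsa.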

So how would the DDPG and TD3 algorithms behave in CliffWalking? And how, in general, can one analyze the consequences of exploration from an algorithm's update rule?
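For reference, these are the critic targets whose behavior I would be analyzing, as I understand them from the DDPG and TD3 papers (target networks Q′ and μ′; TD3 adds clipped noise to the target action):

```latex
% DDPG: the target evaluates the deterministic target policy itself
y = r + \gamma\, Q'\big(s', \mu'(s')\big)

% TD3: clipped double-Q target plus target-policy smoothing noise
y = r + \gamma \min_{i=1,2} Q'_i\big(s', \mu'(s') + \epsilon\big), \qquad
\epsilon \sim \mathrm{clip}\big(\mathcal{N}(0, \tilde{\sigma}),\, -c,\, c\big)
```

My starting point would be to check whether the exploration noise used to collect data appears anywhere in these targets, the way the ε-greedy expectation appears in the Expected Sarsa target.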


