Will actor-critic algorithms like DDPG and TD3 avoid the risks of exploration?

While learning about Q-learning and Expected Sarsa, I learned that one difference between them is that, in the CliffWalking environment, Expected Sarsa avoids the risks of exploration and chooses the safe path, while Q-learning chooses the optimal path along the cliff edge.
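For concreteness, these are the tabular update rules I am comparing (ε-greedy behavior policy π, step size α, discount γ):

```latex
% Q-learning: bootstrap from the greedy (max) next action
Q(s,a) \leftarrow Q(s,a) + \alpha \Big[ r + \gamma \max_{a'} Q(s',a') - Q(s,a) \Big]

% Expected Sarsa: bootstrap from the expectation under the eps-greedy policy
Q(s,a) \leftarrow Q(s,a) + \alpha \Big[ r + \gamma \sum_{a'} \pi(a' \mid s')\, Q(s',a') - Q(s,a) \Big]
```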

[Image: the safe path vs. the optimal path in CliffWalking]

To my understanding, this is because Expected Sarsa folds the "explore and fall off the cliff" outcome into its Q update: the target is an expectation over the ε-greedy policy, so the small probability of stepping into the cliff decreases the estimated Q values along the optimal (cliff-edge) path.
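Here is a minimal numeric sketch of that effect, using made-up Q-values for a cliff-edge state (two actions, ε = 0.1):

```python
import numpy as np

# Hypothetical Q-values for a state on the cliff edge:
# action 0 = move along the cliff (greedy), action 1 = step into the cliff.
q_next = np.array([10.0, -100.0])
r, gamma, eps = -1.0, 1.0, 0.1

# Q-learning backs up only the greedy action, ignoring the cliff risk.
q_learning_target = r + gamma * q_next.max()        # -1 + 10 = 9.0

# Expected Sarsa weights every action by its eps-greedy probability,
# so the eps/|A| chance of falling drags the target down.
probs = np.full(len(q_next), eps / len(q_next))
probs[q_next.argmax()] += 1.0 - eps                 # [0.95, 0.05]
expected_sarsa_target = r + gamma * probs @ q_next  # -1 + 9.5 - 5 = 3.5

print(q_learning_target, expected_sarsa_target)     # 9.0  3.5
```

The 5% chance of the −100 cliff transition pulls the Expected Sarsa target from 9.0 down to 3.5, which is exactly why cliff-edge states look worse under Expected Sarsa.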

So how would the DDPG and TD3 algorithms behave in CliffWalking? And how, in general, can one analyze the consequences of exploration from an algorithm's update rule?
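For reference, these are the critic targets whose behavior I would be analyzing, as I understand them from the DDPG and TD3 papers (target networks Q′ and μ′; TD3 adds clipped noise to the target action):

```latex
% DDPG: the target evaluates the deterministic target policy itself
y = r + \gamma\, Q'\big(s', \mu'(s')\big)

% TD3: clipped double-Q target plus target-policy smoothing noise
y = r + \gamma \min_{i=1,2} Q'_i\big(s', \mu'(s') + \epsilon\big), \qquad
\epsilon \sim \mathrm{clip}\big(\mathcal{N}(0, \tilde{\sigma}),\, -c,\, c\big)
```

My starting point would be to check whether the exploration noise used to collect data appears anywhere in these targets, the way the ε-greedy expectation appears in the Expected Sarsa target.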


