With Policy Gradients, what is the difference between Pi and Pi Theta? [closed]

I'm learning about Reinforcement Learning Policy Gradients.

What is the difference between Pi and Pi Theta?
I assume it doesn't mean 3.14 here.


Source for the page on the left side.

Source for the page on the right side.



Solution 1:[1]

They both represent policies. Keep in mind that the page in the left screenshot is about Key Concepts in RL, whereas the page on the right is about Policy Optimization.

Therefore, the left page talks about policies in general: it doesn't care how you find a policy or how you optimize it. It just tells you that many policies exist, and that one of them is the optimal policy, i.e. the one that maximizes the expected cumulative reward.
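To make the "many policies exist, one is optimal" idea concrete, here is a minimal sketch with a hypothetical two-state world (the state names, actions, and rewards are illustrative, not from the pages in question). A policy is just a mapping from states to actions; the optimal one is the mapping that earns the most reward:

```python
# Hypothetical toy world: two states, two candidate policies.
# A (deterministic) policy is simply a state -> action mapping.
policy_a = {"left_cell": "move_right", "right_cell": "stay"}
policy_b = {"left_cell": "stay", "right_cell": "stay"}

# Illustrative reward table: only moving right from the left cell pays off.
rewards = {
    ("left_cell", "move_right"): 1.0,
    ("left_cell", "stay"): 0.0,
    ("right_cell", "stay"): 0.0,
}

def one_step_reward(policy, state):
    """Reward collected by following `policy` for one step from `state`."""
    return rewards[(state, policy[state])]
```

Here `policy_a` earns more reward from `left_cell` than `policy_b`, so among these two candidates it is the better (here, optimal) policy; nothing on the left page says how such a policy is found.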

On the other hand, the right page introduces policy optimization: you improve the policy by adjusting its parameters (denoted theta) with stochastic gradient methods. So Pi Theta is simply a parameterized policy, where theta (for example, the weights of a neural network) determines the mapping from states to actions.
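A minimal sketch of what "Pi Theta" means in practice, using a tabular softmax policy instead of a neural network (the MDP sizes, learning rate, and the REINFORCE-style update are illustrative assumptions, not taken from the pages in question). The key point is that the policy is a function of the parameter array `theta`, and gradient steps on `theta` change the action probabilities:

```python
import numpy as np

# Hypothetical tiny MDP: 3 states, 2 actions (sizes are illustrative).
n_states, n_actions = 3, 2
theta = np.zeros((n_states, n_actions))  # the policy parameters "theta"

def pi_theta(state, theta):
    """Softmax policy: pi_theta(a|s) = exp(theta[s,a]) / sum_a' exp(theta[s,a'])."""
    logits = theta[state]
    z = np.exp(logits - logits.max())  # subtract max for numerical stability
    return z / z.sum()

def reinforce_step(theta, state, action, G, lr=0.1):
    """One REINFORCE-style update.

    For a tabular softmax policy, grad log pi_theta(a|s) with respect to
    theta[s] is one_hot(a) - pi_theta(.|s); we scale it by the return G
    and step uphill on the expected return.
    """
    probs = pi_theta(state, theta)
    grad = -probs
    grad[action] += 1.0              # one_hot(action) - probs
    new_theta = theta.copy()
    new_theta[state] += lr * G * grad
    return new_theta

# Repeatedly rewarding action 0 in state 1 shifts probability mass toward it.
probs_before = pi_theta(1, theta)
for _ in range(50):
    theta = reinforce_step(theta, state=1, action=0, G=1.0)
probs_after = pi_theta(1, theta)
```

With `theta` initialized to zeros, `pi_theta` starts out uniform; after the updates, the probability of the rewarded action in state 1 has grown. A neural-network policy works the same way conceptually: `theta` is just the network's weights, and the gradient of the log-probability is computed by backpropagation.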

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1: Raoul Raftopoulos