ReinforcementLearning.jl PPOPolicy - continuous action space with normal distribution
I am using a PPOPolicy for a continuous action space environment, and I was wondering if there is a way to access the values of μ and logσ that the GaussianNetwork approximator outputs.
agent = Agent(
    policy = PPOPolicy(
        approximator = ActorCritic(
            actor = GaussianNetwork(
                pre = Chain(
                    Dense(ns, 64, relu; init = glorot_uniform(rng)),
                    Dense(64, 64, relu; init = glorot_uniform(rng)),
                ),
                μ = Chain(Dense(64, 1, tanh; init = glorot_uniform(rng)), vec),
                logσ = Chain(Dense(64, 1; init = glorot_uniform(rng)), vec),
            ),
            critic = Chain(
                Dense(ns, 64, relu; init = glorot_uniform(rng)),
                Dense(64, 64, relu; init = glorot_uniform(rng)),
                Dense(64, 1; init = glorot_uniform(rng)),
            ),
            optimizer = ADAM(3e-4),
        ) |> cpu,
        γ = 0.99f0,
        λ = 0.95f0,
        clip_range = 0.2f0,
        max_grad_norm = 0.5f0,
        n_epochs = 50,
        n_microbatches = 32,
        actor_loss_weight = 1.0f0,
        critic_loss_weight = 0.5f0,
        entropy_loss_weight = 0.00f0,
        dist = Normal,
        rng = rng,
        update_freq = UPDATE_FREQ,
    ),
I tried agent.policy.approximator.actor.logσ, but that just returns the Chain of the neural network, not the values it outputs.
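For context, the μ and logσ fields of a GaussianNetwork are themselves Chains (the two output heads), so accessing the field can only ever give back the layer. To get actual values, an observation has to be pushed through the network. Below is a minimal sketch, assuming env is the environment the agent runs in, that state(env) returns the ns-element observation, and that calling a GaussianNetwork directly without is_sampling = true returns the (μ, logσ) pair, as in recent ReinforcementLearningCore versions:

actor = agent.policy.approximator.actor    # the GaussianNetwork itself

s = state(env)                             # current observation; `env` is your environment instance

# Manual route: run the shared layers, then each output head separately.
x        = actor.pre(s)                    # shared hidden features
μ_val    = actor.μ(x)                      # mean of the Normal action distribution
logσ_val = actor.logσ(x)                   # log standard deviation

# Shorter route: a GaussianNetwork is callable, and without `is_sampling = true`
# it should return the pair (μ, logσ) directly.
μ_val, logσ_val = actor(s)

Because the approximator was moved to the CPU with |> cpu, the raw observation can be fed in directly; a GPU-hosted agent would first need the state moved to the same device.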
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow