ReinforcementLearning.jl PPOPolicy - continuous action space with normal distribution

I am using PPOPolicy for a continuous action space environment, and I was wondering if there is a way to access the values of μ and logσ that the GaussianNetwork approximator outputs.

using ReinforcementLearning, Flux, Distributions  # for Agent, PPOPolicy, Chain, Dense, ADAM, glorot_uniform, Normal
# (rng, ns and UPDATE_FREQ are defined elsewhere in my script)

agent = Agent(
  policy = PPOPolicy(
     approximator = ActorCritic(
        actor = GaussianNetwork(
            pre = Chain(
                Dense(ns, 64, relu; init = glorot_uniform(rng)),
                Dense(64, 64, relu; init = glorot_uniform(rng)),
            ),
            μ = Chain(Dense(64, 1, tanh; init = glorot_uniform(rng)), vec),
            logσ = Chain(Dense(64, 1; init = glorot_uniform(rng)), vec),
        ),
        critic = Chain(
            Dense(ns, 64, relu; init = glorot_uniform(rng)),
            Dense(64, 64, relu; init = glorot_uniform(rng)),
            Dense(64, 1; init = glorot_uniform(rng)),
        ),
        optimizer = ADAM(3e-4),
    ) |> cpu,
    γ = 0.99f0,
    λ = 0.95f0,
    clip_range = 0.2f0,
    max_grad_norm = 0.5f0,
    n_epochs = 50,
    n_microbatches = 32,
    actor_loss_weight = 1.0f0,
    critic_loss_weight = 0.5f0,
    entropy_loss_weight = 0.00f0,
    dist = Normal,
    rng = rng,
    update_freq = UPDATE_FREQ,
    ),
    # (trajectory argument omitted from this snippet)
)
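For reference, my understanding is that with dist = Normal each continuous action is drawn from Normal(μ, exp(logσ)), where μ and logσ come from the two actor heads above. A minimal sketch of that relationship, using made-up placeholder values rather than real network outputs:

using Distributions

μ, logσ = 0.1f0, -0.5f0            # placeholder head outputs, not real network values
a = rand(Normal(μ, exp(logσ)))     # how a continuous action is sampled from the Gaussian policy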

I tried agent.policy.approximator.actor.logσ, but that returns the Chain of the network itself (the layers), not the computed values.
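What I would like is something along the lines of the sketch below, manually composing the sub-networks defined in the GaussianNetwork above (here state(env) stands for whatever observation vector is at hand, and I am assuming the pre, μ and logσ fields are plain Flux Chains that can be called directly):

actor = agent.policy.approximator.actor

s    = state(env)        # current observation, shape (ns,) or (ns, batch)
h    = actor.pre(s)      # shared hidden layers
μ    = actor.μ(h)        # mean of the Gaussian policy
logσ = actor.logσ(h)     # log standard deviation of the Gaussian policy

Is manually composing the layers like this the intended way, or does GaussianNetwork expose a call that returns (μ, logσ) directly?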



Source: Stack Overflow, licensed under CC BY-SA 3.0.
