Why does the Stable-Baselines3 evaluate_policy() function never finish?
I have created my own custom environment using OpenAI Gym and Stable-Baselines3. Once I've trained the agent, I try to evaluate the policy using the evaluate_policy() function from stable_baselines3.common.evaluation. However, the script runs indefinitely and never finishes.
Because it never finishes, I have been debugging the 'done' variable in my CustomEnv() environment to make sure that every episode ends one way or another. Beyond that, I am at a complete loss.
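One way to check the 'done' flag outside of evaluate_policy() is to step the environment by hand with a hard cap on the number of steps, so a non-terminating episode shows up as a result instead of a hang. The sketch below is only an illustration: ToyEnv is a hypothetical stand-in for CustomEnv, steps_until_done is a made-up helper name, and it assumes the classic Gym API in which step() returns (obs, reward, done, info).

```python
# Hypothetical stand-in for CustomEnv, using the classic Gym API:
# reset() -> obs, step(action) -> (obs, reward, done, info).
# The episode only ends when action 1 is taken.
class ToyEnv:
    def reset(self):
        self.t = 0
        return self.t

    def step(self, action):
        self.t += 1
        done = action == 1
        return self.t, 0.0, done, {}


def steps_until_done(env, choose_action, max_steps=10_000):
    """Run one episode and return the step at which done became True,
    or None if max_steps was reached first."""
    obs = env.reset()
    for step in range(1, max_steps + 1):
        obs, reward, done, info = env.step(choose_action(obs))
        if done:
            return step
    return None


# A policy that always picks action 0 never ends the episode ...
print(steps_until_done(ToyEnv(), lambda obs: 0, max_steps=1000))  # None
# ... while one that picks action 1 once obs reaches 2 ends it on step 3.
print(steps_until_done(ToyEnv(), lambda obs: 1 if obs >= 2 else 0))  # 3
```

With the real environment, choose_action could wrap the trained model, for example lambda obs: model.predict(obs, deterministic=True)[0]. evaluate_policy() uses deterministic=True by default, so one possibility worth ruling out is that the deterministic policy settles into a loop that never triggers 'done', even though the stochastic actions sampled during training did.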
The code that I am using is below (for brevity it doesn't include the environment code):
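from stable_baselines3 import PPO
from stable_baselines3.common.evaluation import evaluate_policy
from stable_baselines3.common.monitor import Monitor

# CustomEnv, log_dir and models_dir are defined elsewhere (omitted for brevity)
env = CustomEnv()
env = Monitor(env, log_dir)
model = PPO("MlpPolicy", env, verbose=1, tensorboard_log=log_dir)
timesteps = 5000
for i in range(3):
    model.learn(total_timesteps=timesteps, reset_num_timesteps=False, tb_log_name="PPO")
    model.save(f"{models_dir}/car_model_{timesteps * i}")
# This is the call that never returns
mean_reward, std_reward = evaluate_policy(model, env, n_eval_episodes=1)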
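If the episode really can run forever under the evaluated policy, a step cap guarantees that the evaluation returns. The sketch below shows the idea with a hand-written wrapper. StepLimit, NeverEndingEnv and the "step_limit_reached" info key are hypothetical names used only for illustration, and the classic (obs, reward, done, info) step signature is an assumption. Gym itself ships a wrapper for this purpose, gym.wrappers.TimeLimit(env, max_episode_steps=...), which would be the usual choice in practice.

```python
class NeverEndingEnv:
    """Hypothetical stand-in for a CustomEnv whose done flag is never True."""

    def reset(self):
        return 0

    def step(self, action):
        return 0, 1.0, False, {}


class StepLimit:
    """Minimal sketch of a step-cap wrapper: forces done=True after max_steps."""

    def __init__(self, env, max_steps):
        self.env = env
        self.max_steps = max_steps
        self.elapsed = 0

    def reset(self):
        self.elapsed = 0
        return self.env.reset()

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        self.elapsed += 1
        if self.elapsed >= self.max_steps and not done:
            # End the episode artificially and record why it ended.
            done = True
            info = dict(info, step_limit_reached=True)
        return obs, reward, done, info


env = StepLimit(NeverEndingEnv(), max_steps=200)
obs = env.reset()
done, steps = False, 0
while not done:  # terminates only because of the cap
    obs, reward, done, info = env.step(0)
    steps += 1
print(steps, info)  # 200 {'step_limit_reached': True}
```

A cap like this hides the symptom rather than explaining it, so it is probably best combined with checking why 'done' is never reached.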
Any suggestions or advice on how to debug this would be amazing.
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
