Why is the Stable-Baselines3 evaluate_policy() function never finishing/completing?

I have created my own custom environment using OpenAI Gym and Stable-Baselines3. Once I've trained the agent, I try to evaluate the policy using the evaluate_policy() function from stable_baselines3.common.evaluation. However, the script runs indefinitely and never finishes.

As it never finishes, I have been trying to debug the 'done' variable within my CustomEnv() environment, to make sure that the environment always ends one way or another. Other than that I am at a complete loss.

The code that I am using is below (for brevity it doesn't include the environment code):

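from stable_baselines3 import PPO
from stable_baselines3.common.evaluation import evaluate_policy
from stable_baselines3.common.monitor import Monitor

# CustomEnv, log_dir and models_dir are defined elsewhere (omitted for brevity)
env = CustomEnv()
env = Monitor(env, log_dir)
model = PPO("MlpPolicy", env, verbose=1, tensorboard_log=log_dir)

timesteps = 5000
for i in range(3):
    # reset_num_timesteps=False keeps one continuous training run across iterations
    model.learn(total_timesteps=timesteps, reset_num_timesteps=False, tb_log_name="PPO")
    # Name each checkpoint after the total number of timesteps trained so far
    model.save(f"{models_dir}/car_model_{timesteps * (i + 1)}")

# This is the call that never returns
mean_reward, std_reward = evaluate_policy(model, env, n_eval_episodes=1)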

Any suggestions or advice on how to debug this would be amazing.
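
One way to check the 'done' logic in isolation is to step the environment by hand with a hard cap on the number of steps, and see whether an episode ever ends. The sketch below is only an illustration: NeverEndingEnv and CountdownEnv are hypothetical stand-ins for CustomEnv (whose code is not shown), probe_episode and random_action are made-up helper names, and it assumes the older Gym step API that returns (observation, reward, done, info) along with a two-action discrete action space.

```python
import random


class NeverEndingEnv:
    """Hypothetical stand-in for CustomEnv whose step() never returns done=True."""

    def reset(self):
        return 0

    def step(self, action):
        # Older Gym step API (assumed): (observation, reward, done, info)
        return 0, 0.0, False, {}


class CountdownEnv:
    """Hypothetical stand-in that ends after a fixed number of steps."""

    def __init__(self, length=10):
        self.length = length
        self.steps = 0

    def reset(self):
        self.steps = 0
        return 0

    def step(self, action):
        self.steps += 1
        done = self.steps >= self.length
        return self.steps, 1.0, done, {}


def probe_episode(env, choose_action, max_steps=10_000):
    """Run one episode and return (steps_taken, finished).

    finished is False when max_steps was reached without done=True,
    which is the situation that would keep an evaluation loop running forever.
    """
    env.reset()
    for step in range(1, max_steps + 1):
        _, _, done, _ = env.step(choose_action())
        if done:
            return step, True
    return max_steps, False


def random_action():
    # Assumes two discrete actions; adjust to the real action space.
    return random.randint(0, 1)


print(probe_episode(CountdownEnv(length=10), random_action))           # (10, True)
print(probe_episode(NeverEndingEnv(), random_action, max_steps=500))   # (500, False)
```

If the real environment finishes under random actions but not under the trained model, the policy itself may be stuck in a loop. evaluate_policy() uses deterministic=True by default, so passing deterministic=False, or wrapping the environment in gym.wrappers.TimeLimit with a max_episode_steps cap, may be worth trying to force the episode to end.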

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow
