How to stop a Stable Baselines model during training exactly at the end of an episode?

I am training a PPO2 model with the Stable Baselines library. I have tabular data with 15000 rows, so each episode is 15000 steps long. I am using nminibatches=4 and n_envs=1. For example, I have set total_timesteps=10000. During training the agent passes over the 15000 rows several times and updates its actions for each row, but at some point the remaining total_timesteps are not enough to cover a full episode, so only part of an episode is available in the last stage of learning. To be concrete and keep it simple, say we have 10 rows and total_timesteps=23. The agent will see the full episode 2 times and only the first 3 rows on the third pass; the remaining 7 rows are never seen in that last pass.
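To make the arithmetic of the small example explicit, here is a minimal sketch in plain Python. The helper name `split_timesteps` is hypothetical (it is not part of Stable Baselines), and it assumes a single environment that advances one row per timestep.

```python
def split_timesteps(total_timesteps, episode_length):
    """Return (full_episodes, leftover_rows) for one environment.

    Hypothetical helper for illustration; assumes one row per timestep.
    """
    # divmod gives the number of complete passes and the rows left over
    full_episodes, leftover_rows = divmod(total_timesteps, episode_length)
    return full_episodes, leftover_rows


full, leftover = split_timesteps(23, 10)
print(full, leftover)   # 2 3  -> two full episodes, then only 3 rows
print(full * 10)        # 20   -> last timestep that ends a full episode
```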

I want to stop the learning process when the agent finishes its last full episode (in the example above, stop learning at total_timesteps=20), or define total_timesteps in such a way that training ends exactly at the end of a full episode.
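One possible way to do the second option is to round total_timesteps down to a multiple of the episode length before calling `learn`. The sketch below is only that, a sketch: `aligned_total_timesteps` is a hypothetical helper, it assumes n_envs=1, and the optional `n_steps` argument reflects my assumption that PPO2 collects experience in rollouts of `n_steps` timesteps and only runs whole rollouts, so the total should also be a multiple of `n_steps`.

```python
from math import gcd


def aligned_total_timesteps(requested, episode_length, n_steps=None):
    """Largest total <= requested that ends exactly on an episode boundary.

    Hypothetical helper; assumes a single environment (n_envs=1).
    If n_steps is given, the result is also a multiple of n_steps
    (assumption: training proceeds in whole rollouts of n_steps).
    """
    unit = episode_length
    if n_steps is not None:
        # least common multiple of episode length and rollout length
        unit = episode_length * n_steps // gcd(episode_length, n_steps)
    # round down to a whole number of units (0 if requested is too small)
    return (requested // unit) * unit


print(aligned_total_timesteps(23, 10))                       # 20
print(aligned_total_timesteps(100000, 15000))                # 90000
print(aligned_total_timesteps(1000000, 15000, n_steps=128))  # 960000

# Possible usage, assuming `model` is an already constructed PPO2 model:
# model.learn(total_timesteps=aligned_total_timesteps(100000, 15000))
```

Choosing an `n_steps` that divides the episode length (for example 1000 for 15000 rows) keeps the common multiple equal to the episode length, so fewer timesteps are lost to rounding.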



Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow
