Frame stacking / observation stacking in reinforcement learning: what happens before I have reached the stack size?

I am currently working on a reinforcement learning project and used observation stacking for the first time. I implemented a custom environment with OpenAI Gym and implemented my agents (DQN and PPO) with Stable-Baselines3. I use observation stacking with a stack size of 10; the code works and the results are great. However, I don't completely understand what my code does at the very beginning, before enough observations have been stacked. In short: what does the agent do during the first 9 steps? Does it still get to choose actions, and if so, how are the incomplete observations (not yet fully stacked) fed into the neural network underlying the agent's decisions? Note that I am explicitly not using frame skipping as implemented in the original 2015 DQN paper. Here are the relevant parts of my implementation:

class SomeEnv(gym.Env):
    def __init__(self, arg1, ...):
        ...

    def step(self, action):
        ...
        return obs, rew, done, info

    def reset(self):
        ...

    def render(self, mode='human', close=False):
        ...

train_env = VecFrameStack(make_vec_env(SomeEnv,
                                       vec_env_cls=DummyVecEnv,
                                       env_kwargs={...}),
                          n_stack=10)
model = DQN("MlpPolicy",
            train_env,
            ...)
model.learn(...)
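To make the question concrete, here is a toy re-implementation in plain NumPy of what I assume the frame-stacking wrapper does internally at episode start (the `FrameStacker` class and its buffer layout are my own sketch for illustration, not SB3 code): the stack buffer starts out as zeros, so the first few stacked observations the network sees are partly zero-padded.

```python
import numpy as np

class FrameStacker:
    """Toy sketch of a frame-stacking wrapper (my assumption, not SB3 code):
    the buffer starts as zeros and each new observation is shifted in, so
    until n_stack real observations have arrived, the stacked observation
    contains zero padding in the oldest slots."""

    def __init__(self, n_stack, obs_dim):
        self.n_stack = n_stack
        self.obs_dim = obs_dim
        self.buffer = np.zeros((n_stack, obs_dim))

    def reset(self, obs):
        # clear the stack, then push the very first observation
        self.buffer[:] = 0.0
        return self.append(obs)

    def append(self, obs):
        # drop the oldest frame, append the newest at the end
        self.buffer = np.roll(self.buffer, shift=-1, axis=0)
        self.buffer[-1] = obs
        return self.buffer.flatten()

stacker = FrameStacker(n_stack=3, obs_dim=2)
first = stacker.reset(np.array([1.0, 2.0]))
print(first)  # [0. 0. 0. 0. 1. 2.] -- leading entries are zero padding
```

If this is what happens, the agent would already act on these zero-padded stacks during the first steps, which is exactly the part I would like to have confirmed or corrected.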

Glad for any explanations!



Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
