Frame stacking/observation stacking in reinforcement learning: what happens before I have reached the stack size?
I am currently working on a reinforcement learning project and used observation stacking for the first time. I implemented a custom environment with OpenAI Gym and my agents (DQN and PPO) with Stable-Baselines3, using observation stacking with a stack size of 10. The code works and the results are great. However, I don't completely understand what my code is doing at the very beginning, before enough observations have been stacked. In short: what does the agent do during the first 9 steps? Does it still get to choose actions, and if so, how are the incomplete observations (not yet fully stacked) fed into the neural network underlying the agent's decisions? Note that I am explicitly not using frame skipping as implemented in the original DQN paper from 2015. Relevant parts of my implementation:
```python
import gym
from stable_baselines3 import DQN
from stable_baselines3.common.env_util import make_vec_env
from stable_baselines3.common.vec_env import DummyVecEnv, VecFrameStack

class SomeEnv(gym.Env):
    def __init__(self, arg1, ...):
        ...

    def step(self, action):
        ...
        return obs, rew, done, info

    def reset(self):
        ...

    def render(self, mode='human', close=False):
        ...

train_env = VecFrameStack(make_vec_env(SomeEnv,
                                       vec_env_cls=DummyVecEnv,
                                       env_kwargs={...}),
                          n_stack=10)
model = DQN("MlpPolicy",
            train_env,
            ...)
model.learn(...)
```
Glad for any explanations!
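For what it's worth, Stable-Baselines3's `VecFrameStack` zero-initializes the stack on reset, so during the warm-up the network simply sees observations padded with zeros rather than waiting for the stack to fill; the agent acts from step 1. Here is a minimal NumPy sketch of that warm-up behavior (the `make_stacker` helper is illustrative, not SB3's actual implementation):

```python
from collections import deque
import numpy as np

def make_stacker(n_stack, obs_shape):
    """Minimal frame stacker: the buffer starts as all-zero frames, so the
    first stacked observations are zero-padded (illustrative sketch)."""
    frames = deque([np.zeros(obs_shape) for _ in range(n_stack)],
                   maxlen=n_stack)

    def push(obs):
        frames.append(np.asarray(obs, dtype=float))
        # Stack along the last axis, oldest frame first (as VecFrameStack does
        # for vector observations).
        return np.concatenate(list(frames), axis=-1)

    return push

push = make_stacker(n_stack=4, obs_shape=(2,))
first = push(np.array([1.0, 1.0]))   # reset obs: 3 zero frames + 1 real frame
second = push(np.array([2.0, 2.0]))  # step 1: 2 zero frames + 2 real frames
```

Because the padding is all zeros, the network input always has the full stacked shape (here `(8,)`), and the early, partially-informative inputs are quickly flushed out of the buffer as real observations arrive.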
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow