I am using the tf-agents library to build a contextual bandit. For this, I am building a custom environment: I am creating a BanditPyEnvironment and wrapping it in t
Scenario 1: My custom environment has the following _action_spec:
self._action_spec = array_spec.BoundedArraySpec( shape=(highestIndex+1,), dtype=np.