How to use continuous values in the action space of a gym environment?
I am trying to make a custom gym environment with five actions, all of which can take continuous values. To implement this, I used the following action_space:
```python
self.action_space = spaces.Tuple((
    spaces.Box(low=np.array([0]), high=np.array([1]), dtype=np.float32),
    spaces.Box(low=np.array([0]), high=np.array([1]), dtype=np.float32),
    spaces.Box(low=np.array([-2]), high=np.array([2]), dtype=np.float32),
    spaces.Box(low=np.array([0]), high=np.array([1]), dtype=np.float32),
    spaces.Box(low=np.array([1]), high=np.array([20]), dtype=np.int8),
))
```
However, when I try to run a PPO model (from stable_baselines3), I get the following error:
```
AssertionError: The algorithm only supports (<class 'gym.spaces.box.Box'>, <class 'gym.spaces.discrete.Discrete'>, <class 'gym.spaces.multi_discrete.MultiDiscrete'>, <class 'gym.spaces.multi_binary.MultiBinary'>) as action spaces but Tuple(Box(0.0, 1.0, (1,), float32), Box(0.0, 1.0, (1,), float32), Box(-2.0, 2.0, (1,), float32), Box(0.0, 1.0, (1,), float32), Box(1, 20, (1,), int8)) was provided
```
I searched a bit and found this on GitHub: Link. Based on it, I changed my code as follows:
```python
self.action_space = {
    "Temperature": spaces.Box(low=np.array([0]), high=np.array([1]), dtype=np.float32),
    "topP": spaces.Box(low=np.array([0]), high=np.array([1]), dtype=np.float32),
    "frequencyPenalty": spaces.Box(low=np.array([-2]), high=np.array([2]), dtype=np.float32),
    "presencePenalty": spaces.Box(low=np.array([0]), high=np.array([1]), dtype=np.float32),
    "bestOf": spaces.Box(low=np.array([1]), high=np.array([20]), dtype=np.int8),
}
```
But this still returned the same error.
I also found this answer: Link
According to it, my code should work, since I am also using a Tuple space.
How do I convert this to an accepted data type for the action_space?
Solution 1:[1]
Unfortunately, most stable-baselines3 algorithm implementations only support Box, Discrete, MultiDiscrete and MultiBinary action spaces (see the stable-baselines3 "Implemented Algorithms" table).
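Because a single Box is supported, one common workaround (not spelled out in the answer itself) is to pack all five parameters into one Box with per-dimension bounds, e.g. `spaces.Box(low=LOW, high=HIGH, dtype=np.float32)`, and decode the flat vector inside `step()`. The sketch below shows only the decoding side, assuming numpy; `PARAM_NAMES` and `decode_action` are hypothetical names, and rounding `best_of` to an integer is an assumption about how you want to treat that parameter.

```python
import numpy as np

# Hypothetical names for the five parameters, in the order they are packed
# into one flat action vector.
PARAM_NAMES = ["temperature", "top_p", "frequency_penalty", "presence_penalty", "best_of"]

# Per-dimension bounds taken from the question's Tuple of Boxes.
# In the env these would go into spaces.Box(low=LOW, high=HIGH, dtype=np.float32).
LOW = np.array([0.0, 0.0, -2.0, 0.0, 1.0], dtype=np.float32)
HIGH = np.array([1.0, 1.0, 2.0, 1.0, 20.0], dtype=np.float32)


def decode_action(action):
    """Map a flat 5-dim action vector to named parameter values (hypothetical helper)."""
    # Clip into the valid range as a safeguard against out-of-bounds samples.
    a = np.clip(np.asarray(action, dtype=np.float32), LOW, HIGH)
    params = {name: float(v) for name, v in zip(PARAM_NAMES, a)}
    # best_of only makes sense as an integer, so round the continuous value.
    params["best_of"] = int(np.rint(a[4]))
    return params
```

With this layout PPO sees an ordinary 5-dimensional Box, and the environment stays responsible for interpreting each component.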
The link you posted refers to OpenAI, not stable-baselines3.
You could look into other frameworks and check whether their algorithm implementations support Tuple or Dict action spaces, or otherwise implement your own!
Alternatively, check whether your multiple Box-type actions can easily be converted into discrete actions, which stable-baselines3 supports through MultiDiscrete.
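A minimal sketch of that conversion, assuming each continuous parameter is discretized into a fixed number of evenly spaced values (`N_BINS = 11` is an arbitrary, hypothetical choice) while bestOf keeps its 20 integer values. The env would declare `spaces.MultiDiscrete(NVEC)` and map the chosen indices back to parameter values in `step()`; `decode_multidiscrete` is a hypothetical helper.

```python
import numpy as np

# Continuous bounds for the four float parameters (from the question).
FLOAT_LOW = np.array([0.0, 0.0, -2.0, 0.0])
FLOAT_HIGH = np.array([1.0, 1.0, 2.0, 1.0])
N_BINS = 11  # hypothetical resolution: 11 evenly spaced values per parameter

# bestOf already takes the integer values 1..20, i.e. 20 discrete choices.
# In the env: self.action_space = spaces.MultiDiscrete(NVEC)
NVEC = np.array([N_BINS] * 4 + [20])


def decode_multidiscrete(action):
    """Map MultiDiscrete indices back to parameter values (hypothetical helper)."""
    idx = np.asarray(action, dtype=np.int64)
    # Bin index i in [0, N_BINS - 1] maps linearly onto [low, high].
    floats = FLOAT_LOW + idx[:4] * (FLOAT_HIGH - FLOAT_LOW) / (N_BINS - 1)
    # MultiDiscrete indices start at 0, while bestOf starts at 1.
    best_of = int(idx[4]) + 1
    return floats, best_of
```

The trade-off is resolution: a coarser grid makes the action space smaller and easier to explore, but the agent can no longer pick values between grid points.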
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Raoul Raftopoulos |
