How to use continuous values in the action space of a gym environment?

I am trying to make a custom gym environment with five actions, all of which can take continuous values. To implement this, I used the following action_space:

import numpy as np
from gym import spaces

# inside the custom environment's __init__
self.action_space = spaces.Tuple((spaces.Box(low=np.array([0]), high=np.array([1]), dtype=np.float32),
                                  spaces.Box(low=np.array([0]), high=np.array([1]), dtype=np.float32),
                                  spaces.Box(low=np.array([-2]), high=np.array([2]), dtype=np.float32),
                                  spaces.Box(low=np.array([0]), high=np.array([1]), dtype=np.float32),
                                  spaces.Box(low=np.array([1]), high=np.array([20]), dtype=np.int8)))

However, when I try to train a PPO model (from stable_baselines3), I get the following error:

AssertionError: The algorithm only supports (<class 'gym.spaces.box.Box'>, <class 'gym.spaces.discrete.Discrete'>, <class 'gym.spaces.multi_discrete.MultiDiscrete'>, <class 'gym.spaces.multi_binary.MultiBinary'>) as action spaces but Tuple(Box(0.0, 1.0, (1,), float32), Box(0.0, 1.0, (1,), float32), Box(-2.0, 2.0, (1,), float32), Box(0.0, 1.0, (1,), float32), Box(1, 20, (1,), int8)) was provided

I searched a bit for this issue and found this on GitHub: Link

Based on it, I changed my code as follows:

self.action_space = {"Temperature": spaces.Box(low=np.array([0]), high=np.array([1]), dtype=np.float32),
                     "topP": spaces.Box(low=np.array([0]), high=np.array([1]), dtype=np.float32),
                     "frequencyPenalty": spaces.Box(low=np.array([-2]), high=np.array([2]), dtype=np.float32),
                     "presencePenalty": spaces.Box(low=np.array([0]), high=np.array([1]), dtype=np.float32),
                     "bestOf": spaces.Box(low=np.array([1]), high=np.array([20]), dtype=np.int8)}

But this still raised the same error.

Also, I found this answer: Link

According to that answer, my code should work, since I am also using a Tuple space.

How do I convert this to an accepted data type for the action_space?



Solution 1:[1]

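Unfortunately, the stable-baselines3 algorithms only support Box, Discrete, MultiDiscrete and MultiBinary action spaces, and not every algorithm supports all four (see the "Implemented Algorithms" table in the stable-baselines3 documentation). Tuple and Dict action spaces are not supported.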
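Since a single Box is supported, one common workaround (not part of the original answer) is to merge the five sub-spaces into one Box of shape (5,). The sketch below assumes the environment declares self.action_space = spaces.Box(low=-1.0, high=1.0, shape=(5,), dtype=np.float32), following the stable-baselines3 advice to normalize continuous action spaces to [-1, 1], and rescales each component back to its original range inside step(). The helper decode_action and the fixed parameter order are hypothetical.

```python
import numpy as np

# Bounds of the five original sub-spaces, in this (assumed) order:
# Temperature, topP, frequencyPenalty, presencePenalty, bestOf
LOW = np.array([0.0, 0.0, -2.0, 0.0, 1.0], dtype=np.float32)
HIGH = np.array([1.0, 1.0, 2.0, 1.0, 20.0], dtype=np.float32)
NAMES = ("Temperature", "topP", "frequencyPenalty", "presencePenalty", "bestOf")


def decode_action(action):
    """Map an action from Box(-1, 1, shape=(5,)) to the original parameter ranges."""
    # The policy may output values slightly outside the bounds, so clip first
    a = np.clip(np.asarray(action, dtype=np.float32), -1.0, 1.0)
    # Linear rescale from [-1, 1] to [LOW, HIGH], element-wise
    scaled = LOW + (a + 1.0) * 0.5 * (HIGH - LOW)
    params = {name: float(v) for name, v in zip(NAMES, scaled)}
    # bestOf was an integer space in [1, 20]; round the continuous value
    params["bestOf"] = int(np.rint(scaled[4]))
    return params
```

Inside step(), you would call params = decode_action(action) and use the resulting values. Rounding bestOf keeps it an integer, at the cost of the policy treating it as a continuous quantity.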

The answer you linked refers to OpenAI's libraries, not to stable-baselines3, so it does not apply here.

You could look into other frameworks and check whether their algorithm implementations support Tuple or Dict action spaces, or otherwise implement your own.

Alternatively, check whether your Box-type actions can easily be converted into discrete actions, which stable-baselines3 supports through the MultiDiscrete space.
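As a rough illustration of the MultiDiscrete route, each parameter can be split into a fixed number of evenly spaced levels. The environment would then declare spaces.MultiDiscrete(NVEC) and map the chosen indices back to values in step(). The level counts in NVEC are arbitrary assumptions, and decode_multidiscrete is a hypothetical helper.

```python
import numpy as np

# Hypothetical resolution: number of levels per parameter, in the order
# Temperature, topP, frequencyPenalty, presencePenalty, bestOf
NVEC = [11, 11, 41, 11, 20]
LOW = [0.0, 0.0, -2.0, 0.0, 1.0]
HIGH = [1.0, 1.0, 2.0, 1.0, 20.0]


def decode_multidiscrete(action):
    """Map MultiDiscrete indices to evenly spaced values within each range."""
    values = []
    for idx, n, lo, hi in zip(action, NVEC, LOW, HIGH):
        grid = np.linspace(lo, hi, n)  # n evenly spaced levels, endpoints included
        values.append(float(grid[int(idx)]))
    # bestOf: linspace(1, 20, 20) yields 1.0, 2.0, ..., 20.0, so store it as an int
    values[4] = int(round(values[4]))
    return values
```

With these level counts, the four float parameters each move in steps of 0.1. Finer grids give more precise values but enlarge the action space the agent has to explore.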

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Raoul Raftopoulos