What to do when I have a massive action space in an Actor-Critic network?

I am trying to implement an actor-critic network for a bandwidth allotment problem. Here is how the environment works. The following happens at every step, and the episode ends when the returned state contains 0 users.

  1. The environment returns the initial state, which contains the number of users.
  2. The actor-critic network gives each user some bandwidth - a continuous value between 0 and 1. To make this problem fully discrete, I am representing the continuous bandwidth values as indices between 0 and 99. For example, if index 5 is chosen as the action, the allotted bandwidth is max_bandwidth * 0.05.
  3. I represent the joint action by concatenating the per-user indices: User1Index_User2Index_...._UserNIndex. For example, if there are 5 users, the action space would be the range [0, 9999999999] (100^5 joint actions, each index occupying two digits).
  4. The environment takes in the allotted bandwidth for each user as the input, makes some calculations, and returns the state for the next episode.
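The encoding in steps 2–3 can be sketched as follows. This is a minimal illustration, not the asker's actual code; the function names (`encode_action`, `decode_action`, `bandwidths`) are hypothetical. Since each per-user index occupies two decimal digits, concatenation is equivalent to base-100 positional encoding:

```python
NUM_BINS = 100  # per-user bandwidth indices 0..99

def encode_action(user_indices):
    """Pack per-user bin indices into one joint action index (base-100)."""
    action = 0
    for idx in user_indices:
        action = action * NUM_BINS + idx
    return action

def decode_action(action, n_users):
    """Unpack a joint action index back into per-user bin indices."""
    indices = []
    for _ in range(n_users):
        action, idx = divmod(action, NUM_BINS)
        indices.append(idx)
    return indices[::-1]

def bandwidths(user_indices, max_bandwidth):
    """Convert bin indices to allotted bandwidths, e.g. index 5 -> max_bandwidth * 0.05."""
    return [max_bandwidth * idx / NUM_BINS for idx in user_indices]
```

With 3 users the largest joint action is `encode_action([99, 99, 99]) == 999999`, which matches the 100^N growth described above.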

The action space grows exponentially with the number of users (100^N joint actions for N users), and I quickly run out of memory on my device. Is there a better way to approach this problem?

Any help or suggestion would be appreciated.

Thanks!



Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
