Multiple workers not handling concurrent requests
I'm using FastAPI hosted on an AWS EC2 instance.
EC2 instance specs (g4dn.xlarge): 16 GB memory, 4 vCPUs, NVIDIA T4 GPU
I'm running a stress test against the app. If I send 10 simultaneous POST requests, only 5 get processed simultaneously (at the exact same time), while the others are processed one by one, each a few seconds later.
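The behaviour described above can be reproduced locally with a minimal sketch (no server involved): a pool capped at 5 workers given 10 simultaneous tasks runs 5 at once and queues the rest, which is exactly the pattern of 5 immediate responses plus 5 delayed ones. The pool size and sleep duration here are illustrative assumptions, not values from the actual app.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

peak = 0      # highest number of tasks ever running at once
active = 0    # tasks currently running
lock = threading.Lock()

def handle(i):
    """Stand-in for one POST request; the sleep mimics inference time."""
    global peak, active
    with lock:
        active += 1
        peak = max(peak, active)
    time.sleep(0.2)  # assumed per-request processing time
    with lock:
        active -= 1
    return i

# 10 "simultaneous requests" against a pool that can only run 5 at a time
with ThreadPoolExecutor(max_workers=5) as pool:
    results = list(pool.map(handle, range(10)))

print(peak)          # never exceeds 5: the other 5 tasks had to wait
print(len(results))  # all 10 tasks still complete eventually
```

All 10 tasks finish, but the concurrency ceiling (here 5) decides how many run at the exact same time; the rest are serialized behind it.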
The configuration is the following:
gunicorn main:app --workers 9 --worker-class uvicorn.workers.UvicornWorker --bind 0.0.0.0:80
I've set the workers to 9 based on this formula: "The suggested maximum concurrent requests when using workers and threads is (2*CPU)+1."
Given the above configuration, I'm expecting all 10 requests to be handled at the exact same time.
With 9 workers I'm also getting a CUDA out of memory error, even though I clear the cache.
With 3 workers I don't run into the CUDA problem, but then only 3 requests are handled at the exact same time, no more than that. The remaining ones are processed later, one by one, after a few seconds.
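One plausible reading of the OOM symptom: each gunicorn worker is a separate process, so each one loads its own copy of the model onto the single T4. A back-of-the-envelope sketch with assumed numbers (the per-worker footprint below is a made-up placeholder, not measured from the app) shows why 3 workers might fit while 9 do not:

```python
# Assumed per-process GPU footprint of the model, in MB (placeholder value)
model_gpu_mb = 2000
# NVIDIA T4 GPU memory, in MB (16 GB)
t4_memory_mb = 16000

def fits(workers):
    """Each worker process holds its own model copy on the same GPU."""
    return workers * model_gpu_mb <= t4_memory_mb

print(fits(3))   # 3 * 2000 =  6000 MB -> fits in 16 GB
print(fits(9))   # 9 * 2000 = 18000 MB -> exceeds 16 GB, i.e. CUDA OOM
```

If something like this is what's happening, clearing the cache inside one process would not help, because the memory is held by the other worker processes.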
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
