How many Celery workers can we spawn on a machine?
Tech stack: Celery 5.0.5, Flask, Python, Windows (8 CPUs).
For background: my use case requires spawning one worker and one queue per country, based on the request payload.

I am using `celery.control.inspect().active()` to get the list of active workers and check whether a worker named `{country}_worker` already exists in that list. If not, I spawn a new worker with:

```python
subprocess.Popen(
    'celery -A main.celery worker --loglevel=info -Q {queue_name} '
    '--logfile=logs\\{queue_name}.log --concurrency=1 -n {worker_name}'
)
```

This starts a new Celery worker consuming from a new queue.
My initial understanding was that we can spawn only n workers, where n is cpu_count(). While testing this, I expected my 9th worker to wait for one of the previous 8 workers to finish before picking up a task. Instead, as soon as it was spawned it started consuming from its queue while the other 8 workers were still executing, and the same happened as I spawned more workers (15 in total).
This brings me to my question: is the --concurrency argument responsible only for parallel execution within a single worker? If I spawn 15 independent workers, does that mean 15 different processes can execute in parallel?
Any help is appreciated in understanding this concept.
Edit: I also noticed that each new task received by a worker spawns a new python.exe process (per Task Manager), and the previously spawned python process remains in memory unused. This does not happen when I start the worker with the "solo" pool instead of "prefork". The problem with solo: celery.inspect().active() returns nothing while the workers are executing something, and only responds once no tasks are in progress.
Solution 1:[1]
If your tasks are I/O-bound, and it seems they are, then perhaps you should change the concurrency pool type to Eventlet. With it you can, in theory, set concurrency as high as 1000. However, it is a different execution model, so you need to write your tasks carefully to avoid deadlocks.
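A sketch of what launching an Eventlet-pool worker might look like, reusing the question's `main.celery` app name (the helper name and the concurrency value are illustrative; Eventlet must be installed separately, e.g. `pip install eventlet`):

```python
import subprocess


def eventlet_worker_command(queue_name: str, concurrency: int = 1000) -> list[str]:
    """Argv for a worker using the Eventlet (green-thread) pool.

    Green threads are cheap, so for I/O-bound tasks a single worker
    process can service hundreds of tasks concurrently.
    """
    return [
        "celery", "-A", "main.celery", "worker",
        "-P", "eventlet",        # pool implementation: green threads, not processes
        "-c", str(concurrency),  # number of green threads, not OS processes
        "-Q", queue_name,
    ]


# To actually launch it (not run here):
# subprocess.Popen(eventlet_worker_command("IN_queue"))
```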
If the tasks are CPU-bound, then I suggest you have concurrency set to N-1, where N is number of cores, unless you want to overutilise, in which case you can pick a slightly bigger number.
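The N-1 rule of thumb above can be computed at startup; a small sketch (leaving one core free is a heuristic, not a hard requirement):

```python
import os

# For CPU-bound tasks: one pool process per core, minus one core
# left over for the OS, the broker connection, and other overhead.
n_cores = os.cpu_count() or 1       # cpu_count() can return None
concurrency = max(1, n_cores - 1)   # never drop below one process
```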
PS. You CAN spawn many worker processes, but since they all compete for the same cores (they are separate OS processes), per-process CPU utilisation would be low, so it really makes no sense to go above the number of available cores.
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | |
