MWAA in production - tasks queued for unknown reasons
Does anyone use MWAA in production?
We currently run around 500 DAGs and see tasks unexpectedly staying in the "queued" state for no apparent reason, with this message:
Task is in the 'queued' state which is not a valid state for execution. The task must be cleared in order to be run.
It happens randomly: everything can run perfectly for a day, and then a few tasks get stuck in the queued state. They stay there indefinitely unless we manually mark them as failed.
A DAG run can stay in this "queued" state even when the pool is empty, and I don't see anything that explains it.
It affects about 5% of the tasks; all the others run smoothly.
Has anyone else encountered this behavior?
Solution 1:[1]
This was happening to me in MWAA as well. It only started after I upgraded to Airflow version 2.2.2. The solution that worked for me was adding the following Airflow configuration option via the web UI:
Configuration option: celery.pool
Custom value: solo
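If you would rather apply the same option from a script than through the web UI, something like the sketch below may work. It assumes boto3 is installed and AWS credentials are configured. The environment name, the region, and the extra `core.default_timezone` entry are placeholders, not values from the question. Existing options are merged in on the cautious assumption that the update call may not preserve options you leave out.

```python
from typing import Dict, Optional


def build_update_request(environment_name: str,
                         existing_options: Optional[Dict[str, str]] = None) -> dict:
    """Build keyword arguments for the MWAA UpdateEnvironment call.

    Any options already set on the environment are copied and the
    celery.pool override from the answer above is added on top.
    """
    options = dict(existing_options or {})
    options["celery.pool"] = "solo"
    return {
        "Name": environment_name,
        "AirflowConfigurationOptions": options,
    }


def apply_update(request: dict, region_name: str) -> dict:
    """Send the update to MWAA. Requires boto3 and valid AWS credentials."""
    import boto3  # imported here so the builder above works without boto3

    client = boto3.client("mwaa", region_name=region_name)
    return client.update_environment(**request)


if __name__ == "__main__":
    # "my-mwaa-environment" and the timezone option are hypothetical examples.
    request = build_update_request(
        "my-mwaa-environment",
        existing_options={"core.default_timezone": "utc"},
    )
    print(request)
    # Uncomment to actually apply the change (this updates the environment):
    # apply_update(request, region_name="us-east-1")
```

Changing a configuration option triggers an environment update, so it is probably best to try this on a non-production environment first.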
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | Kevin Vo |