'Airflow tasks not running in correct order?
cleanup_pbp is downstream of all 4 of load_pbp_30629, load_pbp_30630, load_to_bq_30629, load_to_bq_30630. cleanup_pbp started at 2021-12-05T08:54:48.
however, load_pbp_30630, one of the 4 upstream tasks, did not end until 2021-12-05T09:02:23.
How is cleanup_pbp running before load_pbp_30630 ends? I've never seen this before. I know our task dependencies have a bit of criss-cross going on, but that shouldn't explain why the tasks run out of order?
Solution 1:[1]
We have exactly the same problem and after checking, finally we found out that the problem is caused by the loop function in Dag script.
Actually we use a for loop to create tasks as well as their relationships. As in each iteration it will create the upstream task and the downstream task (like the cleanup_pbp in your case) and always give the same id to downstream one and define its relations (ex. Load_pbp_xxx >> cleanup_pbp), in graph or tree view it considers this downstream id has serval dependences but when Dag runs it takes only the relation defined in the last iteration. If you check Task Instance you will see only one task in its upstream_list.
The solution is to move the definition of this final task out of the loop but keep the definition of dependencies inside the loop. This well resolves our problem of running order, the final task won’t start until all dependences are finished. (with tigger rule all_success of course)
I’m not sure if you are in the same situation as us but hope this will give you some idea.
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Hong YU |


