How do I create a chain from data with a parent-child relationship using Python?

I want to convert this set of relationships.

Input:

Task A -> Task B
Task A -> Task C
Task B -> Task D
Task C -> Task E

My input is this pandas DataFrame:

import pandas as pd

df = pd.DataFrame({"parent": ["Task A", "Task A", "Task B", "Task C"], "child": ["Task B", "Task C", "Task D", "Task E"]})

Output:

Task A >> (Task B, Task C) >> (Task D, Task E)

The function should return the result above.

I need this output because I want to use it in Airflow to configure the dependencies between my tasks.
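A minimal sketch of such a function, using only the standard library. It assumes the rows are passed as (parent, child) pairs, for example zip(df["parent"], df["child"]), that the relationships contain no cycles, and that each task belongs to the level given by its longest path from a task with no parents. The name to_level_chain is hypothetical, and names inside a level are sorted alphabetically so the result is stable.

```python
from collections import defaultdict


def to_level_chain(edges):
    """Format (parent, child) pairs as a level chain such as 'A >> (B, C) >> D'.

    A task's level is the length of the longest path reaching it from a task
    with no parents, so every child lands in a later level than all of its
    parents. Raises ValueError if the relationships contain a cycle.
    """
    children = defaultdict(list)
    indegree = {}   # number of parents per task
    order = []      # tasks in order of first appearance
    for parent, child in edges:
        for task in (parent, child):
            if task not in indegree:
                indegree[task] = 0
                order.append(task)
        children[parent].append(child)
        indegree[child] += 1

    # Kahn's algorithm: a task is processed only after all of its parents,
    # so its level is final by the time its children are updated.
    level = {task: 0 for task in order}
    ready = [task for task in order if indegree[task] == 0]
    processed = 0
    while ready:
        task = ready.pop()
        processed += 1
        for child in children[task]:
            level[child] = max(level[child], level[task] + 1)
            indegree[child] -= 1
            if indegree[child] == 0:
                ready.append(child)
    if processed != len(order):
        raise ValueError("the parent/child relationships contain a cycle")

    # Group tasks by level; sort names inside a level for a stable result.
    groups = defaultdict(list)
    for task in order:
        groups[level[task]].append(task)
    parts = []
    for depth in sorted(groups):
        names = sorted(groups[depth])
        parts.append(names[0] if len(names) == 1 else "(" + ", ".join(names) + ")")
    return " >> ".join(parts)
```

With the DataFrame above, to_level_chain(zip(df["parent"], df["child"])) returns "Task A >> (Task B, Task C) >> (Task D, Task E)". The result is only a text summary; it does not set any dependencies in Airflow by itself.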



Solution 1:[1]

I don't understand your pandas example, but in Airflow the bitshift operators (>> and <<) can create 1-to-1 and 1-to-many dependencies between tasks (for example t1 >> [t2, t3]). They cannot create many-to-many dependencies: a list on both sides, as in [t1, t2] >> [t3, t4], is not supported.

Many-to-many dependencies can instead be set with a for loop:

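# t1 to t6 are operators already defined in the same DAG
tasks_a = [t1, t2, t3]
tasks_b = [t4, t5, t6]

for task in tasks_a:
    # one 1-to-many dependency per upstream task
    task >> tasks_b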
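A loop of the same shape covers the parent/child rows in the question. Each row is a 1-to-1 dependency, so grouping the children of each parent gives one 1-to-many >> per parent, which the bitshift operators support directly. A minimal sketch, assuming the rows are passed as (parent, child) pairs such as zip(df["parent"], df["child"]); group_children is a hypothetical helper, and the Airflow wiring appears only in a comment because it assumes a hypothetical dict named tasks that maps each task name to an operator in the DAG.

```python
from collections import defaultdict


def group_children(edges):
    """Map each parent task name to the list of its child task names, in input order."""
    downstream = defaultdict(list)
    for parent, child in edges:
        downstream[parent].append(child)
    return dict(downstream)


# Inside a DAG definition, assuming a hypothetical dict `tasks` that maps
# each task name to an operator, each group becomes one 1-to-many dependency:
#
#     for parent, children in group_children(edges).items():
#         tasks[parent] >> [tasks[name] for name in children]
```

Wired this way, Task D waits only for Task B and Task E only for Task C, exactly as in the input rows.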

Alternatively, many-to-many dependencies can be set with Airflow's cross_downstream() function:

from airflow.models.baseoperator import cross_downstream

tasks_a = [t1, t2, t3]
tasks_b = [t4, t5, t6]
cross_downstream(from_tasks=tasks_a, to_tasks=tasks_b)

This creates a dependency from every task in tasks_a to every task in tasks_b.
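To see what this means for the chain in the question, the sketch below lists the pairs created by wiring every task in one group to every task in the next group (the same pairs cross_downstream would add between two lists) and compares them with the original parent/child rows. It is plain Python with no Airflow import; chain_edges is a hypothetical helper name.

```python
from itertools import product


def chain_edges(levels):
    """Return every (upstream, downstream) pair created by wiring each group
    to every task in the next group, as cross_downstream does for two lists."""
    edges = set()
    for upstream, downstream in zip(levels, levels[1:]):
        edges.update(product(upstream, downstream))
    return edges


levels = [["Task A"], ["Task B", "Task C"], ["Task D", "Task E"]]
original = {("Task A", "Task B"), ("Task A", "Task C"),
            ("Task B", "Task D"), ("Task C", "Task E")}

extra = chain_edges(levels) - original
print(sorted(extra))  # [('Task B', 'Task E'), ('Task C', 'Task D')]
```

So reading Task A >> (Task B, Task C) >> (Task D, Task E) as group-to-group dependencies would also make Task D wait for Task C and Task E wait for Task B. If that is not intended, set one dependency per parent/child row rather than per level.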

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Bas Harenslak