Apache Airflow - split a task into multiple parallel tasks, where each task takes a portion of a list as its input argument
Let's say I have a list with 100 items called mylist.
I have a function that performs a certain operation on each element of the list.
In order to speed things up, I want to define n parallel tasks.
Each task should take 100/n list items and process them.
I understand the executor and core settings I need to change to enable parallelism; what I need are some basic pointers on how to set up the tasks correctly.
My idea is to build it like this:
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator

mylist = [f"item{i}" for i in range(100)]  # a list with 100 items
n = 5  # number of parallel tasks
chunk_size = len(mylist) // n  # each task gets 100/n = 20 items

def myfunction(sub_list):
    """This function runs within the DAG execution and processes list elements."""
    for item in sub_list:
        ...  # process each element

dag = DAG(dag_id="parallel_list_processing", start_date=datetime(2022, 1, 1), schedule_interval=None, catchup=False)

# Generate n tasks, one per chunk of the list
for i in range(0, len(mylist), chunk_size):
    task = PythonOperator(
        task_id="myfunction_" + str(i),
        python_callable=myfunction,
        op_kwargs={"sub_list": mylist[i:i + chunk_size]},
        dag=dag,
    )
I've assembled this according to the documentation. Is this the proper way of doing it?
Solution 1:[1]
My initial proposition was essentially correct: you can run tasks in parallel, with each task receiving a portion of the list as input, like this:
from datetime import datetime

from airflow import DAG
from airflow.operators.dummy import DummyOperator
from airflow.operators.python import PythonOperator

mylist = [f"item{i}" for i in range(100)]  # a list with 100 items
n = 5  # number of parallel tasks; also the number of chunks the list is split into
chunk_size = len(mylist) // n  # each task gets 100/n = 20 items

def myfunction(sub_list):
    """This function runs within the DAG execution and processes list elements."""
    for item in sub_list:
        ...  # process each element

dag = DAG(dag_id="parallel_list_processing", start_date=datetime(2022, 1, 1), schedule_interval=None, catchup=False)

# A dummy initial task that kicks off the DAG
d1 = DummyOperator(task_id="kick_off_dag", dag=dag)

# Generate n parallel tasks, one per chunk of the list
for i in range(0, len(mylist), chunk_size):
    parallel_task = PythonOperator(
        task_id="myfunction_" + str(i),
        python_callable=myfunction,
        op_kwargs={"sub_list": mylist[i:i + chunk_size]},
        dag=dag,
    )
    d1 >> parallel_task
Here range(0, len(mylist), chunk_size) steps through the list in chunk-sized strides, which lets us pass mylist[i:i + chunk_size] as the input of each task.
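To see exactly which slice each task receives, here is a minimal standalone sketch of the chunking logic (plain Python, no Airflow required):

mylist = [f"item{i}" for i in range(100)]
n = 5
chunk_size = len(mylist) // n  # 20

# Print the slice that each generated task would receive
for i in range(0, len(mylist), chunk_size):
    sub_list = mylist[i:i + chunk_size]
    print(f"myfunction_{i}: {sub_list[0]} .. {sub_list[-1]} ({len(sub_list)} items)")

This prints five chunks of 20 items each, matching the five PythonOperator tasks above. One caveat: if len(mylist) is not evenly divisible by n, the floor division leaves a remainder that spills into an extra task; ceil division (chunk_size = -(-len(mylist) // n)) keeps the task count at exactly n.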
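On Airflow 2.3 or newer, the same fan-out can also be expressed with dynamic task mapping instead of a manual loop. A minimal sketch, assuming the TaskFlow API (the names parallel_list_processing and process_chunk are illustrative, not from the original answer):

from datetime import datetime

from airflow.decorators import dag, task

@dag(start_date=datetime(2022, 1, 1), schedule_interval=None, catchup=False)
def parallel_list_processing():

    @task
    def process_chunk(sub_list):
        for item in sub_list:
            ...  # process each element

    mylist = [f"item{i}" for i in range(100)]
    chunk_size = len(mylist) // 5
    chunks = [mylist[i:i + chunk_size] for i in range(0, len(mylist), chunk_size)]

    # expand() creates one mapped task instance per chunk
    process_chunk.expand(sub_list=chunks)

parallel_list_processing()

Each element of chunks becomes one mapped task instance, so the parallel fan-out follows the length of the list instead of being wired up by hand.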
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Calder White |
