Airflow, avoid top-level pull from SSM
I have the following DAG, which works just fine:
from datetime import datetime

from airflow import DAG
from airflow.models import Variable
from airflow.operators.subdag import SubDagOperator
from subdags import my_subdag

# Top-level call: evaluated every time the scheduler parses this file.
data_sets = Variable.get("data_sets", deserialize_json=True).get("data")

default_args = {
    'owner': 'airflow',
    'depends_on_past': False,
    'start_date': datetime(2020, 1, 1),
}

with DAG(
    'myDAG',
    default_args=default_args,
    schedule_interval='00 12 * * *'
) as dag:
    ...
    # One subdag per data set; the loop needs the list at parse time.
    for data_set in data_sets:
        subdag = SubDagOperator(
            task_id=f'{data_set}_subdag',
            subdag=my_subdag(
                parent_dag_name='myDAG',
                child_dag_name=f'{data_set}_subdag',
            ),
            ...
            default_args=default_args,
        )
        start >> subdag >> end
But as you can see, I am calling Variable.get at the top level, which is not best practice (the scheduler queries the secret backend every minute or so).
What can I do so that Airflow calls Variable.get only during execution? I was looking at the best practices, but I can't use another file ('Generating Python code with embedded meta-data'), so I thought maybe Jinja templating could help, but I am not sure how to proceed.
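To make the parse-time versus execution-time distinction concrete, here is a minimal sketch in plain Python with no Airflow dependency. Everything in it is a hypothetical stand-in: `FakeBackend` plays the role of the secrets backend behind `Variable.get`, the two `parse_*` functions mimic the scheduler loading a DAG file, and the data set names `sales` and `users` are invented for illustration. It only shows why a top-level lookup is hit on every parse while a lookup inside a task callable is not.

```python
class FakeBackend:
    """Hypothetical stand-in for a secrets backend such as SSM; counts lookups."""

    def __init__(self, values):
        self._values = values
        self.calls = 0

    def get(self, key):
        self.calls += 1
        return self._values[key]


# Invented example payload, shaped like the "data_sets" variable in the question.
backend = FakeBackend({"data_sets": {"data": ["sales", "users"]}})


def parse_top_level():
    """Mimics a DAG file that reads the variable at module level."""
    data_sets = backend.get("data_sets")["data"]  # runs on every parse
    return [f"{name}_subdag" for name in data_sets]


def parse_deferred():
    """Mimics a DAG file that only defines a callable; nothing is fetched yet."""

    def run_task():
        # Runs only when the task executes, not when the file is parsed.
        return backend.get("data_sets")["data"]

    return run_task


# Simulate the scheduler re-parsing the file five times with a top-level lookup.
for _ in range(5):
    parse_top_level()
top_level_calls = backend.calls

# Same five parses, but the lookup lives inside the task callable.
backend.calls = 0
task = None
for _ in range(5):
    task = parse_deferred()
deferred_parse_calls = backend.calls

# A single task execution triggers exactly one lookup.
result = task()
deferred_total_calls = backend.calls
```

In Airflow terms, the deferred variant corresponds to calling `Variable.get` inside a task's callable, or to using a template such as `{{ var.json.data_sets.data }}` in a templated operator field, both of which are resolved when the task runs. Note the limitation this sketch makes visible: the `for data_set in data_sets` loop in the DAG above decides how many tasks exist, so that value is still needed at parse time unless the DAG is restructured.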
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
