Airflow: avoid top-level pull from SSM

I have the following DAG, which works just fine:

from datetime import datetime

from airflow import DAG
from airflow.models import Variable
from airflow.operators.subdag import SubDagOperator
from subdags import my_subdag

data_sets = Variable.get("data_sets", deserialize_json=True).get("data")

default_args = {
    'owner': 'airflow',
    'depends_on_past': False,
    'start_date': datetime(2020, 1, 1),
}

with DAG(
        'myDAG',
        default_args=default_args,
        schedule_interval='00 12 * * *'
) as dag:

    ...

    for data_set in data_sets:
        subdag = SubDagOperator(
            task_id=f'{data_set}_subdag',
            subdag=my_subdag(
                parent_dag_name='myDAG',
                child_dag_name=f'{data_set}_subdag',
            ),
            ...
            default_args=default_args,
        )
        start >> subdag >> end

But as you can see, I am calling Variable.get at the top level of the DAG file, which is not best practice: the call runs every time the scheduler parses the file, so the secrets backend is queried every minute or so.

What can I do so that Airflow calls Variable.get only during execution? I was looking at the best practices, but I can't use another file ('Generating Python code with embedded meta-data'), so I thought maybe Jinja templating could help, but I am not sure how to proceed.
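One possible direction, sketched under assumptions rather than offered as an established Airflow pattern: Jinja templates such as `{{ var.json.data_sets }}` are only rendered when a task runs, so they cannot drive a `for` loop that builds the DAG structure at parse time. If the list has to be known at parse time, the number of backend calls can at least be reduced by caching the value locally with a time-to-live. The helper `cached_fetch`, its parameters, and the cache path are hypothetical names introduced for illustration; the helper is plain Python and takes the real lookup as a callable.

```python
import json
import os
import tempfile
import time
from typing import Any, Callable


def cached_fetch(fetch: Callable[[], Any], cache_path: str,
                 ttl_seconds: float = 3600.0) -> Any:
    """Return the cached JSON value if it is fresh, else call fetch() and re-cache.

    Hypothetical helper (not part of Airflow): `fetch` stands in for the
    expensive lookup, e.g. a call to Variable.get.
    """
    try:
        age = time.time() - os.path.getmtime(cache_path)
        if age < ttl_seconds:
            with open(cache_path) as fh:
                return json.load(fh)
    except (OSError, ValueError):
        pass  # cache file missing, unreadable, or corrupt: fall through and refetch

    value = fetch()

    # Write to a temp file and rename it, so a concurrent parse never
    # reads a half-written cache file.
    directory = os.path.dirname(cache_path) or "."
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as fh:
        json.dump(value, fh)
    os.replace(tmp_path, cache_path)
    return value


# Assumed usage inside the DAG file (requires Airflow; the path is illustrative):
# data_sets = cached_fetch(
#     lambda: Variable.get("data_sets", deserialize_json=True),
#     "/tmp/data_sets_cache.json",
# ).get("data")
```

With this sketch the DAG file is still parsed as often as before, but the secrets backend is only contacted when the cache file is older than `ttl_seconds`. The trade-off is that a change to the variable can take up to that long to show up in the DAG.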



Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow
