Airflow + Docker: Path behaviour (+Repo)

I'm having difficulty understanding how paths work in Airflow. I created this repository to illustrate what I mean: https://github.com/remo2479/airflow_example/blob/master/dags/testdag.py I set the repository up from scratch following the manual on the Airflow page; the only change I made was deactivating the example DAGs.

As you can see in the only DAG (dags/testdag.py), the DAG contains two tasks and one variable declaration using an opened file. Both tasks use the dummy SQL script in the repository (dags/testdag/testscript.sql). In task 1 I used testdag/testscript.sql as the path, and in task 2 I used dags/testdag/testscript.sql. With a connection set up, task 1 would work, but task 2 would not, because the template cannot be found. This is how I would expect both tasks to behave, since the DAG is in the dags folder and we should not include "dags" in the path.

But when I try to open testscript.sql and read its contents, I have to include "dags" in the path (dags/testdag/testscript.sql). Why does the path behave differently between the MsSqlOperator and the open() function?

For convenience, here is the whole script:

from airflow import DAG
from airflow.providers.microsoft.mssql.operators.mssql import MsSqlOperator
from datetime import datetime

with DAG(
    dag_id = "testdag",
    schedule_interval="30 6 * * *",
    start_date=datetime(2022, 1, 1),
    catchup=False) as dag:

    # Error because of missing connection - this is how it should be
    first_task = MsSqlOperator(
        task_id="first_task",
        sql="testdag/testscript.sql")
    
    # Error because of template not found
    second_task = MsSqlOperator(
        task_id="second_task",
        sql="dags/testdag/testscript.sql")

    # Opening the file only works when "dags" is included in the path - why?
    # (the with block already closes the file, so no explicit close() is needed)
    with open("dags/testdag/testscript.sql", "r") as file:
        f = file.read()

    first_task
    second_task


Solution 1:[1]

Using template_searchpath will work, as @Elad has mentioned, but note that it is DAG-specific: it tells Jinja where to look for templated files (such as your .sql script) for that DAG only.
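A minimal sketch of what that looks like, assuming the project is mounted at /opt/airflow as in the official Docker image (adjust the path to your setup):

from datetime import datetime
from airflow import DAG
from airflow.providers.microsoft.mssql.operators.mssql import MsSqlOperator

with DAG(
    dag_id="testdag",
    schedule_interval="30 6 * * *",
    start_date=datetime(2022, 1, 1),
    catchup=False,
    # extra directories Jinja searches when rendering templated fields such as "sql"
    template_searchpath=["/opt/airflow/dags"]) as dag:

    # templated paths are resolved against the DAG file's folder
    # and against every template_searchpath entry
    first_task = MsSqlOperator(
        task_id="first_task",
        sql="testdag/testscript.sql")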

To find files in Airflow without using template_searchpath, remember that everything Airflow runs starts in the $AIRFLOW_HOME directory (airflow by default, or wherever you start the services from). So either write the paths you open relative to that directory, or build them relative to the file your code lives in (i.e. current_dir from my previous answer).
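A minimal sketch of the second option, building the path from the DAG file's own location so it works regardless of the working directory (the name current_dir is just illustrative):

import os

# folder containing this DAG file, e.g. /opt/airflow/dags
current_dir = os.path.dirname(os.path.abspath(__file__))

with open(os.path.join(current_dir, "testdag", "testscript.sql"), "r") as file:
    f = file.read()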

Setting Airflow up for the first time can be fiddly.

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution 1: Thom Bedford