'Problem with start date and scheduled date in Apache Airflow

I am working with Apache Airflow and I have a problem with the scheduled day and the starting day.

I want a DAG to run every day at 8:00 AM UTC. So, I did:

default_args = {
        'owner': 'airflow',
        'depends_on_past': False,
        'start_date': datetime(2020, 12, 7, 10, 0,0),
        'email': ['[email protected]'],
        'email_on_failure': True,
        'email_on_retry': False,
        'retries': 1,
        'retry_delay': timedelta(hours=5)
    }
# Never run
dag = DAG(dag_id='id', default_args=default_args, schedule_interval='0 8 * * *',catchup=True)

The day I upload the DAG was 2020-12-07 and I wanted to run it on 2020-12-08 at 08:00:00.

I set the start_date at 2020-12-07 at 10:00:00 to avoid running it at 2020-12-07 at 08:00:00 and only trigger it the next day, but it didn't work.

Then I modified the starting day:

default_args = {
        'owner': 'airflow',
        'depends_on_past': False,
        'start_date': datetime(2020, 12, 7, 7, 59,0),
        'email': ['[email protected]'],
        'email_on_failure': True,
        'email_on_retry': False,
        'retries': 1,
        'retry_delay': timedelta(hours=5)
    }
# Never run
dag = DAG(dag_id='etl-ca-cpke-spark_dev_databricks', default_args=default_args, schedule_interval='0 8 * * *',catchup=True)

Now the start date is 1 minute before the DAG should run, and indeed, because the catchup is set to True, the DAG has been triggered for 2020-12-07 at 08:00:00, but it has not being triggered for 2020-12-08 at 08:00:00.

Why?



Solution 1:[1]

Airflow schedules tasks at the end of the interval (See documentation reference)

Meaning that when you do:

start_date: datetime(2020, 12, 7, 8, 0,0)
schedule_interval: '0 8 * * *'

The first run will kick in at 2020-12-08 at 08:00+- (depends on resources)

This run's execution_date will be: 2020-12-07 08:00

The next run will kick in at 2020-12-09 at 08:00

This run's execution_date of 2020-12-08 08:00.

Since today is 2020-12-08 the next run didn't kick in because it's not the end of the interval yet.

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Peter Mortensen