Category "airflow"

Problem with start date and scheduled date in Apache Airflow

I am working with Apache Airflow and I have a problem with the scheduled day and the starting day. I want a DAG to run every day at 8:00 AM UTC. So, I did: defa

Convert airflow macro 'ts' into datetime object

I encountered a problem with converting the airflow macros reference 'ts' into a datetime object. The problem is with the tz at the end of the string. from dat

Table expiration in GCS to BQ Airflow task

I am copying a CSV into a new BQ table using the GCSToBigQueryOperator task in Airflow. Is there a way to add a table expiration to this table within this task?

Is there a way to pause an airflow DagRun?

Is there a way to pause a specific DagRun within Airflow? I want to be able to have multiple, simultaneously executing runs of a single DAG, and I want to be abl

Kubernetes Operator in Airflow is not sharing the load across nodes. Why?

I have Airflow 1.10.5 on a Kubernetes cluster. The DAGs are written with the Kubernetes operator so that they can spin up pods for each task inside the DAG on executio

Airflow DAGS Orchestration

I have three DAGs (say, DAG1, DAG2 and DAG3). I have a monthly scheduler for DAG1. DAG2 and DAG3 must not be run directly (no scheduler for these) and must be r

How to view code in a GitHub repository as of a specific release?

I would like to see what the code in a whole repository looks like as of a specific release. As an example, I'd like to view the code for Apache Airflow as of vers

airflow 'NoneType' object has no attribute 'is_paused', how to fix it?

I am new to Airflow and I just followed the tutorial to run a DAG. I did so successfully, but the problem is that when I try to pause the DAG by inputting comm

Airflow dags and PYTHONPATH

I have some dags that can't seem to locate python modules. Inside of the Airflow UI, I see a ton of these message variations. Broken DAG: [/home/airflow/source

Creating dynamic workflows for Airflow tasks present in a Python list

I have a list of lists in the following way - [['X_API', 'Y_API',....], ['Z_API', 'P_API', ...], [....], [...] .... ] Here, each API name corresponds to a Pytho

How do I call scrapy from airflow dag?

My scrapy project runs perfectly well with the 'scrapy crawl spider_1' command. How do I trigger it (or call the scrapy command) from an Airflow DAG? with DAG(<args

Airflow Standalone Cannot use relative path:

I just installed Airflow 2.3.0 using the command pip install "apache-airflow==2.3.0" --constraint "https://raw.githubusercontent.com/apache/airflow/constraints-

Efficient way to deploy dag files on airflow

Are there any best practices for deploying new DAGs to Airflow? I saw a couple of comments on the Google forum stating that the dags are sav

Is there a way I can retrieve the external base url of the Airflow Server via python?

I want to provide direct access to the log file. However, the client will be accessing it via an external IP, so I would like to see if there's any way I can print

Airflow DockerOperator unable to mount tmp directory correctly

I am trying to run a simple python script within a docker run command scheduled with Airflow. I have followed the instructions here Airflow init. My .env file:

How to import airflow variables in MWAA airflow

I'm not able to import new airflow variables from a json file to my MWAA env through Boto3 & aws_mwaa. The response code from aws_mwaa/cli is 400. However,

How do I create and pass a test conn_id to the S3_hook.S3Hook(conn_id) to debug code in an Airflow DAG

I have a deployed DAG in which I'm using check_for_wildcard_key() to check if files for a particular day are present in an s3 location and then decide which bra

airflow sla_miss_callback function doesn't trigger and doesn't show up in dag details page

I would like to send alerts via my custom callback function that I'm calling during dag initialization as shown below: dag = DAG('tutorial', default_args=defaul

Airflow/Luigi for AWS EMR automatic cluster creation and pyspark deployment

I am new to Airflow automation. I don't know if it is possible to do this with Apache Airflow (or Luigi, etc.) or if I should just make a long bash file to do this. I

Airflow 2.0.1: Pod Template Override not working as expected for KubernetesExecutor

Setup: Airflow 2.0.1 with Kubernetes 1.18 and Python 3.8, Kubernetes Client: 18.17.x Pod template file: apiVersion: v1 kind: Pod metadata: name: workerPod sp