'Use result from one operator inside another
I would like in get_birth_date use result from get_all_pets. How can i access it inside get_birth_date? Moreover i would like to print result from get_all_pets, where could i deifne such print()?
Where in my code could i do this?
import datetime
from airflow import DAG
from airflow.providers.postgres.operators.postgres import PostgresOperator
# create_pet_table, populate_pet_table, get_all_pets, and get_birth_date are examples of tasks created by
# instantiating the Postgres Operator
with DAG(
dag_id="postgres_operator_dag",
start_date=datetime.datetime(2020, 2, 2),
schedule_interval="@once",
catchup=False,
) as dag:
create_pet_table = PostgresOperator(
task_id="create_pet_table",
postgres_conn_id="postgres_robert",
sql="""
CREATE TABLE IF NOT EXISTS pet (
pet_id SERIAL PRIMARY KEY,
name VARCHAR NOT NULL,
pet_type VARCHAR NOT NULL,
birth_date DATE NOT NULL,
OWNER VARCHAR NOT NULL);
""",
)
populate_pet_table = PostgresOperator(
task_id="populate_pet_table",
postgres_conn_id="postgres_robert",
sql="""
INSERT INTO pet (name, pet_type, birth_date, OWNER)
VALUES ( 'Max', 'Dog', '2018-07-05', 'Jane');
INSERT INTO pet (name, pet_type, birth_date, OWNER)
VALUES ( 'Susie', 'Cat', '2019-05-01', 'Phil');
INSERT INTO pet (name, pet_type, birth_date, OWNER)
VALUES ( 'Lester', 'Hamster', '2020-06-23', 'Lily');
INSERT INTO pet (name, pet_type, birth_date, OWNER)
VALUES ( 'Quincy', 'Parrot', '2013-08-11', 'Anne');
""",
)
get_all_pets = PostgresOperator(task_id="get_all_pets",postgres_conn_id="postgres_robert", sql="SELECT * FROM pet;")
get_birth_date = PostgresOperator(
task_id="get_birth_date",
postgres_conn_id="postgres_robert",
sql="SELECT * FROM pet WHERE birth_date BETWEEN SYMMETRIC %(begin_date)s AND %(end_date)s",
parameters={"begin_date": "2020-01-01", "end_date": "2020-12-31"},
runtime_parameters={'statement_timeout': '3000ms'},
)
create_pet_table >> populate_pet_table >> get_all_pets >> get_birth_date
Solution 1:[1]
PostgresOperator is not suitable for running SELECT statements.
SELECT statements are more suitable for transfer operators or using hooks directly.
In your case you should use the PostgresHook:
from airflow.decorators import task
from airflow.providers.postgres.hooks.postgres import PostgresHook
@task()
def get_all_pets(**kwargs):
hook = PostgresHook(postgres_conn_id="postgres_robert")
df = hook.get_pandas_df(sql="SELECT * FROM pet;")
print(df)
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Elad Kalif |
