'The conn_id isn't defined
I'm learning Airflow and I'm trying to understand how connections works.
I have a first dag with the following code:
c = Connection(
conn_id='aws_credentials',
conn_type='Amazon Web Services',
login='xxxxxxxx',
password='xxxxxxxxx'
)
def list_keys():
hook = S3Hook(aws_conn_id=c.conn_id)
logging.info(f"Listing Keys from {bucket}/{prefix}")
keys = hook.list_keys(bucket, prefix=prefix)
for key in keys:
logging.info(f"- s3://{bucket}/{key}")
In this case It's working fine. The connection is well passed to the S3Hook.
Then I have a second dag:
redshift_connection = Connection(
conn_id='redshift',
conn_type='postgres',
login='duser',
password='xxxxxxxxxx',
host='xxxxxxxx.us-west-2.redshift.amazonaws.com',
port=5439,
schema='db'
)
aws_connection = Connection(
conn_id='aws_credentials',
conn_type='Amazon Web Services',
login='xxxxxxxxx',
password='xxxxxxxx'
)
def load_data_to_redshift(*args, **kwargs):
aws_hook = AwsHook(aws_connection.conn_id)
credentials = aws_hook.get_credentials()
redshift_hook = PostgresHook(redshift_connection.conn_id)
sql_stmnt = sql_statements.COPY_STATIONS_SQL.format(aws_connection.login, aws_connection.password)
redshift_hook.run(sql_stmnt)
dag = DAG(
's3_to_Redshift',
start_date=datetime.datetime.now()
)
create_table = PostgresOperator(
task_id='create_table',
postgres_conn_id=redshift_connection.conn_id,
sql=sql_statements.CREATE_STATIONS_TABLE_SQL,
dag=dag
)
This dag return me the following error: The conn_idredshiftisn't defined
Why is that? What are the differences between my first and second dag? Why the connection does seems to work in the first example and not in the second situation?
Thanks.
Solution 1:[1]
Connections are usually created using the UI or CLI as described here and stored by Airflow in the database backend. The operators and the respective hooks then take a connection ID as an argument and use it to retrieve the usernames, passwords, etc. for those connections.
In your case, I suspect you created a connection with the ID aws_credentials using the UI or CLI. So, when you pass its ID to S3Hook it successfully retrieves the credentials (from the Airflow's database backend, not from the Connection object that you created).
But, you did not create a connection with the ID redshift, therefore, AwsHook complains that it is not defined. You have to create the connection as described in the documentation first.
Note: The reason for not defining connections in the DAG code is that the DAG code is usually stored in a version control system (e.g., Git). And it would be a security risk to store credentials there.
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 |
