Airflow: upload a Pandas DataFrame to a Redshift table

I would like to use Airflow to populate a table in Redshift. The data I want to insert is in the form of a pandas DataFrame, although I could write it to a CSV or any other format.

I am looking at the documentation for the RedshiftSQLOperator, but the example inserts are hardcoded; I don't know whether there is a way to bulk-upload data.



Solution 1:[1]

The redshift operator also defines a hook which gives you access to a SQL engine -- https://airflow.apache.org/docs/apache-airflow-providers-amazon/2.4.0/_api/airflow/providers/amazon/aws/hooks/redshift/index.html#airflow.providers.amazon.aws.hooks.redshift.RedshiftSQLHook.

You'd be well advised to use the pandas to_sql function, passing the engine as the con parameter to perform the insert:

redshift_hook = RedshiftSQLHook(...)
engine = redshift_hook.get_sqlalchemy_engine()

df.to_sql(..., con=engine)
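As a minimal runnable sketch of the call above: in a real DAG you would obtain the engine from the hook, e.g. `RedshiftSQLHook(redshift_conn_id="redshift_default").get_sqlalchemy_engine()`; here an in-memory SQLite engine stands in so the snippet runs anywhere, and the table name and sample data are placeholders, not from the original answer.

```python
import pandas as pd
from sqlalchemy import create_engine

# Stand-in for the engine returned by RedshiftSQLHook.get_sqlalchemy_engine()
engine = create_engine("sqlite://")

df = pd.DataFrame({"id": [1, 2], "name": ["a", "b"]})

# Note the parameter is `con` (not `conn`); if_exists="append" adds rows
# instead of raising when the table already exists.
df.to_sql("my_table", con=engine, index=False, if_exists="append")

loaded = pd.read_sql("SELECT * FROM my_table", con=engine)
print(len(loaded))  # 2
```

Keep in mind that `to_sql` issues row-by-row (or batched) INSERT statements, so for very large DataFrames a COPY-based load is generally faster on Redshift.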

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution source:

Solution 1: Pbd