Airflow upload Pandas dataframe to Redshift table
I would like to use Airflow to populate a table in Redshift. The data I want to insert is in a pandas DataFrame, although I could write it to a CSV or any other format.
I am looking at the documentation for the RedshiftSQLOperator, but the inserts there are hardcoded; I don't know if there is a way to bulk-upload data.
Solution 1:[1]
The Redshift provider also defines a hook that gives you access to a SQLAlchemy engine -- https://airflow.apache.org/docs/apache-airflow-providers-amazon/2.4.0/_api/airflow/providers/amazon/aws/hooks/redshift/index.html#airflow.providers.amazon.aws.hooks.redshift.RedshiftSQLHook.
You would be well advised to use pandas' to_sql function, passing the engine as the con argument to perform the insert.
redshift_hook = RedshiftSQLHook(...)
engine = redshift_hook.get_sqlalchemy_engine()
df.to_sql(..., con=engine)  # note: the keyword is `con`, not `conn`
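As a fuller sketch of this pattern: the table name, connection id, and DataFrame below are made up for illustration, and an in-memory SQLite engine stands in for the Redshift engine so the snippet runs anywhere.

```python
import pandas as pd
from sqlalchemy import create_engine

# Hypothetical DataFrame to upload.
df = pd.DataFrame({"id": [1, 2, 3], "name": ["a", "b", "c"]})

# In an Airflow task you would get the engine from the hook, e.g.:
#   engine = RedshiftSQLHook(redshift_conn_id="redshift_default").get_sqlalchemy_engine()
# Here an in-memory SQLite engine stands in so the sketch is self-contained.
engine = create_engine("sqlite://")

# `con` is the keyword to_sql expects; if_exists="append" adds rows to an
# existing table instead of failing (the default) or replacing it.
df.to_sql("my_table", con=engine, index=False, if_exists="append")

# Read back to confirm the rows landed.
result = pd.read_sql("SELECT * FROM my_table", con=engine)
print(len(result))
```

Note that `to_sql` issues row-by-row (or batched) INSERTs; for very large DataFrames on Redshift, staging the data in S3 and loading it with COPY is generally faster.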
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Pbd |
