How to integrate an Ingestion Flow Pipeline with AWS Glue?
I want to create an entire data pipeline on AWS. These are the components I plan to use:
- Data Source: This is a remote CSV file (let's say a file in GitHub).
- Data Lake: This will be an S3 bucket.
- Ingestion flow: Transfers data from the data source to the data lake. (This is the part I have a problem with.)
- ETL flow: AWS Glue
- Data Warehouse: AWS Redshift (as Parquet files)
My main problem is the ingestion flow. Is there a way to automatically download the remote CSV file into the S3 bucket as part of the Glue pipeline? Or should I use another service for this?
Ultimately, my aim is a single pipeline that runs both the ingestion flow and the ETL flow and saves the data in the data warehouse.
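To make the ingestion step concrete, here is a minimal sketch of what I have in mind: download the remote CSV over HTTP and write it to the S3 bucket. It assumes the file is small enough to hold in memory and that the code runs somewhere with `boto3` and S3 write permissions (for example a Glue Python shell job or a Lambda function). The function name `ingest_csv_to_s3`, the URL, the bucket, and the key are all hypothetical placeholders.

```python
import urllib.request


def ingest_csv_to_s3(source_url, bucket, key, s3_client=None,
                     opener=urllib.request.urlopen):
    """Download a remote CSV and store it as one object in the data lake bucket.

    s3_client and opener are injectable so the function can be exercised
    without AWS credentials or network access.
    """
    if s3_client is None:
        # Imported lazily; boto3 is assumed to be present in the runtime.
        import boto3
        s3_client = boto3.client("s3")

    # Read the whole file into memory. This is fine for small CSVs;
    # a large file would need a streaming or multipart upload instead.
    with opener(source_url) as response:
        body = response.read()

    # Write the raw bytes unchanged so the ETL job sees the original CSV.
    s3_client.put_object(
        Bucket=bucket,
        Key=key,
        Body=body,
        ContentType="text/csv",
    )
    return f"s3://{bucket}/{key}"


if __name__ == "__main__":
    # Placeholder values, not real resources.
    uri = ingest_csv_to_s3(
        "https://raw.githubusercontent.com/example-user/example-repo/main/data.csv",
        "my-data-lake-bucket",
        "raw/data.csv",
    )
    print(uri)
```

If something like this ran as the first job in a Glue workflow, with the ETL job triggered after it succeeds, that would seem to give the single pipeline described above, but I am not sure whether that is the recommended approach.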
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
