Snowflake ingestion: Snowpipe/Stream/Tasks or External Tables/Stream/Tasks
For ingesting data from an external storage location into Snowflake when de-duplication is necessary, I have come across two approaches:
Option 1:
- Create a Snowpipe on the storage location (Azure container or S3 bucket) that is triggered automatically by event notifications (Azure Event Grid and Queue Storage, or AWS SQS) and copies data into a staging table in Snowflake
- Create a Stream on this staging table to capture change data
- Periodically run a Task that consumes the Stream and merges (upserts) the data into the destination table based on the primary key (see the sketch after this list)
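For concreteness, here is roughly what I have in mind for Option 1. This is only a sketch: the stage, tables, warehouse, and columns (my_stage, staging_events, events, my_wh, event_id, payload, loaded_at) are all placeholders, and I'm assuming JSON source files keyed by event_id:

```sql
-- Placeholder staging and destination tables; event_id is the primary key.
CREATE TABLE IF NOT EXISTS staging_events (
  event_id  STRING,
  payload   VARIANT,
  loaded_at TIMESTAMP_LTZ DEFAULT CURRENT_TIMESTAMP()
);
CREATE TABLE IF NOT EXISTS events LIKE staging_events;

-- 1. Snowpipe: AUTO_INGEST = TRUE ties the pipe to the cloud event
--    notifications, so new files are loaded as they arrive.
--    loaded_at is omitted from the column list so it takes its default.
CREATE OR REPLACE PIPE events_pipe
  AUTO_INGEST = TRUE
AS
COPY INTO staging_events (event_id, payload)
FROM (
  SELECT $1:event_id::STRING, $1
  FROM @my_stage
)
FILE_FORMAT = (TYPE = 'JSON');

-- 2. Stream: captures the rows the pipe appends to the staging table.
CREATE OR REPLACE STREAM staging_events_stream
  ON TABLE staging_events;

-- 3. Task: runs on a schedule but skips runs when the stream is empty;
--    de-duplicates within the batch first, since MERGE rejects duplicate
--    keys in its source.
CREATE OR REPLACE TASK merge_events_task
  WAREHOUSE = my_wh
  SCHEDULE = '5 MINUTE'
WHEN SYSTEM$STREAM_HAS_DATA('staging_events_stream')
AS
MERGE INTO events AS dst
USING (
  SELECT *
  FROM staging_events_stream
  QUALIFY ROW_NUMBER() OVER (PARTITION BY event_id ORDER BY loaded_at DESC) = 1
) AS src
ON dst.event_id = src.event_id
WHEN MATCHED THEN UPDATE SET dst.payload = src.payload, dst.loaded_at = src.loaded_at
WHEN NOT MATCHED THEN INSERT (event_id, payload, loaded_at)
  VALUES (src.event_id, src.payload, src.loaded_at);

-- Tasks are created suspended.
ALTER TASK merge_events_task RESUME;
```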
Option 2:
- Create an external table with automatic refresh through event notifications (Azure Event Grid and Queue Storage, or AWS SQS)
- Create a Stream on this external table to capture change data
- Periodically run a Task that consumes the Stream and merges (upserts) the data into the destination table based on the primary key (see the sketch after this list)
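And the corresponding sketch for Option 2, with the same placeholder names; note that a stream on an external table has to be created as insert-only:

```sql
-- Placeholder external table over the same stage; AUTO_REFRESH = TRUE keeps
-- its file-level metadata current via the event notifications.
CREATE OR REPLACE EXTERNAL TABLE ext_events (
  event_id  STRING AS (VALUE:event_id::STRING),
  file_name STRING AS (METADATA$FILENAME)
)
LOCATION = @my_stage
AUTO_REFRESH = TRUE
FILE_FORMAT = (TYPE = 'JSON');

-- Streams on external tables only support insert-only mode.
CREATE OR REPLACE STREAM ext_events_stream
  ON EXTERNAL TABLE ext_events
  INSERT_ONLY = TRUE;

-- Same periodic merge as in Option 1, except the rows come from files read
-- at query time rather than from data already loaded into Snowflake.
-- Ordering by file name to pick a winner per key is a placeholder choice.
CREATE OR REPLACE TASK merge_ext_events_task
  WAREHOUSE = my_wh
  SCHEDULE = '5 MINUTE'
WHEN SYSTEM$STREAM_HAS_DATA('ext_events_stream')
AS
MERGE INTO events AS dst
USING (
  SELECT event_id, VALUE AS payload
  FROM ext_events_stream
  QUALIFY ROW_NUMBER() OVER (PARTITION BY event_id ORDER BY file_name DESC) = 1
) AS src
ON dst.event_id = src.event_id
WHEN MATCHED THEN UPDATE SET dst.payload = src.payload
WHEN NOT MATCHED THEN INSERT (event_id, payload)
  VALUES (src.event_id, src.payload);

ALTER TASK merge_ext_events_task RESUME;
```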
I believe that if the MERGE statement weren't necessary to enforce the primary key and remove duplicates, Snowpipe would be the clear winner, because it copies new data directly into a table in a single step (see the sketch below). However, since staging and merging the data is necessary, which option is better?
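(For reference, the one-step load I mean would be just a pipe straight into the destination table, with the same placeholder names as above:)

```sql
-- Hypothetical one-step variant: no staging table, stream, or task;
-- the pipe copies new files directly into the destination table.
CREATE OR REPLACE PIPE events_direct_pipe
  AUTO_INGEST = TRUE
AS
COPY INTO events (event_id, payload)
FROM (
  SELECT $1:event_id::STRING, $1
  FROM @my_stage
)
FILE_FORMAT = (TYPE = 'JSON');
```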
Thank you!