'Building CDC in Snowflake

My company is migrating to snowflake from SQL Server 2017 and am looking to build historical data tables that capture delta changes. In SQL, these would be in stored procedures, where old records would get expired (change to data) and insert the new row with updated data. This design allows dynamic retrieval of historical data at any point in time.

My question is, how would i migrate this design to snowflake? From what I read about procedures, they're more like UDTs or scalar functions (SQL equiv) , but in javascript lang...

Below is brief example of how we are doing CDC for tables in SQL

Would data pipeline cover this? If anyone knows good tutorial site for snowflake 101 (not snowflake offical documentation, its terrible). would be appreciated

thanks

update h 
set h.expiration_date = t.effective_date
from  data_table_A_history h

join data_table_A  as t
on h.account_id = t.account_id

where h.expiration_date is null
and (
             (isnull(t.person_name,'x') <> isnull(h.person_name,'x')) or
        (isnull(t.person_age,0) <> isnull(h.person_age,0)) 


    )
---------------------------------------------------------------------
insert into data_table_A_history (account_id,person_name,person_age)
select 
account_id,person_name,person_age
from 
    data_table_A t
    
    left join data_table_A_history h
    on t.account_id = h.account_id
    and h.expiration_date is null
where 
    h.account_id is null

snowflake-cloud-data-platform

Solution 1:^[1]

Table streams are Snowflake's CDC solution

You can setup multiple streams on a single table and it will track changes to the table from a particular point in time. This point in time is changed once you consume the data in the stream, with the new starting point being from the time you consumed the data. Consumption here is when you either use the data to upsert another table or perhaps insert the data into a log table for example. Simply select statements do not consume the data

A pipeline could be something like this: Snowpipe->staging table->stream on staging table->task with SP->merge/upsert target table

If you wanted to keep a log of the changes then you could setup a 2nd stream on the staging table and consume that by inserting the data into another table

Another trick, if you didn't want to use a 2nd stream is to amend your SP so that before you consume the data, run a select on the stream and then immediately run

INSERT INTO my_table select * from table(result_scan(last_query_id()))

This does not consume the stream and change the offset and leaves the stream data available to be consumed by another DML operation

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution	Source
Solution 1	Dean Flinter

'Building CDC in Snowflake

Solution 1:[1]

Sources

Related Questions

Solution 1:^[1]