Snowflake Procedures: Recursive Calls Run in Parallel
I need to find a way to trigger Snowflake procedure calls in parallel. A good example case: during a table load, three of the initial inserts all go into the same interim table before the main block of SQL starts running. Because these three steps are not dependent on one another, I am looking at running them in parallel. The current code base runs them in sequence. As we migrate to Snowflake we are building a JavaScript procedure framework to run all the migrated queries. Is it possible to trigger Snowflake procedures to recursively run in parallel?
Below is a procedure that simply runs two commands that are independent of one another and could be run in parallel.
create or replace procedure stackOverflowProcedure (inputArray array)
returns varchar
language javascript
strict
as
$$
// Execute each statement in the input array, one after another (sequentially).
for (var i = 0; i < INPUTARRAY.length; i++) {
    snowflake.createStatement({sqlText: INPUTARRAY[i]}).execute();
}
return 'Executed ' + INPUTARRAY.length + ' statement(s).';
$$;
call stackOverflowProcedure(array_construct(
    $$insert into temp_table select * from source_table_1$$,
    $$insert into temp_table select * from source_table_2$$
));
Solution 1:[1]
Yes, I use this all the time to run stored procedures in parallel.
Step 1: If the load involves a large number of files (a warehouse loads up to 8 files in parallel per node, and the node count scales with the warehouse size), create one warehouse for each instance of your load stored procedure you want running in parallel.
Step 2: Create one task per stored procedure or instance of the same one if it's safe to run in parallel. If you're running a load with a large number of files, set up each task to use a different warehouse.
Step 3: Schedule your tasks to run at the same time. They may not start at precisely the same moment, but they should be close.
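The three steps above could be sketched as follows (warehouse and task names are hypothetical, and the cron schedule is just an example):

```sql
-- Step 1: one warehouse per parallel load (names are illustrative).
CREATE WAREHOUSE IF NOT EXISTS load_wh_1 WAREHOUSE_SIZE = 'MEDIUM';
CREATE WAREHOUSE IF NOT EXISTS load_wh_2 WAREHOUSE_SIZE = 'MEDIUM';

-- Step 2: one task per procedure call, each on its own warehouse.
CREATE OR REPLACE TASK load_task_1
  WAREHOUSE = load_wh_1
  SCHEDULE = 'USING CRON 0 2 * * * UTC'   -- Step 3: same schedule on both tasks
AS
  CALL stackOverflowProcedure(array_construct(
    $$insert into temp_table select * from source_table_1$$));

CREATE OR REPLACE TASK load_task_2
  WAREHOUSE = load_wh_2
  SCHEDULE = 'USING CRON 0 2 * * * UTC'
AS
  CALL stackOverflowProcedure(array_construct(
    $$insert into temp_table select * from source_table_2$$));

-- Tasks are created suspended; resume them to activate the schedule.
ALTER TASK load_task_1 RESUME;
ALTER TASK load_task_2 RESUME;
```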
Solution 2:[2]
You have a few options:
Use an external orchestrator
Something with built-in DAG functionality, like dbt.
Union your selects into one single insert
Rather than executing separate inserts into the same table/structure, you could append them together using UNION or UNION ALL.
Note: This is the only option that will allow you to easily know when all the inserts are completed, as the other options run asynchronously (you don't easily know when they're all done).
INSERT INTO temp_table
SELECT * FROM source_table_1
UNION ALL
SELECT * FROM source_table_2;
If your selects are a bit too complicated, and you'd like to clean up the code a bit, you could also use CTEs:
INSERT INTO temp_table
WITH query1 AS (
SELECT ... FROM ...
), query2 AS (
SELECT ... FROM ...
)
SELECT * FROM query1
UNION ALL
SELECT * FROM query2;
Call EXECUTE TASK
Rather than executing your SQL queries directly, you could create a task for each one of them, and call EXECUTE TASK instead. Tasks triggered this way run asynchronously (the command doesn't wait for the task to finish -- or even to start, for that matter), which means they will essentially run in parallel.
Note: you can create tasks with no schedule or triggers, so they can only be triggered manually via EXECUTE TASK.
call stackOverflowProcedure(
array_construct(
'EXECUTE TASK task_1;',
'EXECUTE TASK task_2;'
)
);
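For reference, the two tasks triggered above could be created without any schedule, so they only ever run when invoked manually (the warehouse name is hypothetical):

```sql
-- No SCHEDULE or AFTER clause: these tasks run only via EXECUTE TASK.
CREATE OR REPLACE TASK task_1
  WAREHOUSE = my_wh  -- hypothetical warehouse name
AS
  INSERT INTO temp_table SELECT * FROM source_table_1;

CREATE OR REPLACE TASK task_2
  WAREHOUSE = my_wh
AS
  INSERT INTO temp_table SELECT * FROM source_table_2;
```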
Use a TASK TREE
As @mike-walton mentioned in the comments, you could create a simple task tree; tasks that share the same parent task are executed in parallel.
CREATE TASK parent_task
[ PARAMETERS ]
AS
$$
SELECT 1;
$$;
CREATE TASK child_task_1 AFTER parent_task ...;
CREATE TASK child_task_2 AFTER parent_task ...;
CREATE TASK child_task_3 AFTER parent_task ...;
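One caveat worth noting: tasks are created in a suspended state, and in a task tree the child tasks must be resumed before the root task, for example:

```sql
-- Resume the children first, then the root; a scheduled run of
-- parent_task will then trigger all three children in parallel.
ALTER TASK child_task_1 RESUME;
ALTER TASK child_task_2 RESUME;
ALTER TASK child_task_3 RESUME;
ALTER TASK parent_task RESUME;
```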
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Greg Pavlik |
| Solution 2 | |
