'How can I get snowsql to use multiple clusters for parallel processing? [closed]
I'm having trouble getting snowsql to run multiple queries at once using an autoscaling warehouse that should scale up to 4 clusters.
My goal is to run a group of queries via snowsql and take advantage of autoscaling when one query is not done but another query could begin on another available cluster.
I understand I could use Python to run threads executing queries in parallel, but this seems too complex when Snowflake could have an easier way to manage parallel processing.
This was originally discussed and answered in the dbt slack.
Solution 1:[1]
There's an easy way to run queries asynchronously with the snowsql CLI: Just add an > after the ;.
For example, this short script should take 18 seconds to run:
select current_timestamp(), count(seq4()) from table(generator(timelimit => 3)) v ;>
select current_timestamp(), count(seq4()) from table(generator(timelimit => 3)) v ;>
select current_timestamp(), count(seq4()) from table(generator(timelimit => 3)) v ;>
select current_timestamp(), count(seq4()) from table(generator(timelimit => 3)) v ;>
select current_timestamp(), count(seq4()) from table(generator(timelimit => 3)) v ;>
select current_timestamp(), count(seq4()) from table(generator(timelimit => 3)) v ;>
But as each line ends with a >, snowsql executes them in parallel (all in only 3 seconds).
To examine the results of each query interactively, just ask snowsql for !queries.
In bash/shell:
snowsql -q "select current_timestamp(), count(seq4()) from table(generator(timelimit => 3));>
select current_timestamp(), count(seq4()) from table(generator(timelimit => 3));"
Read more:
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 |
