'Number of Snowflake Clustering Key
I have a question, it may sound naive, i have been playing around clustering keys in snowflake and trying to understand it, just curious to know can we specify more than one clustering key on snowflake table or is it same as primary key one per table?
Solution 1:[1]
Short answer is no (not more then one clustering key) and no (not same as primary key).
Clustering key help to manage micro partitions for a table which in turn help in query performance by means of partition pruning. We can have a clustering key and 'Add' more columns to it or change it by putting new columns to it. Its exact implementation can be referred in documentation.
Why clustering key is not primary key - As primary key means one of a kind and it ensures uniqueness which is not the case for clustering key. Think of clustering key more like a 'PARTITION KEY'.
Below table is clustered on ORDER_ID and there are duplicate entries in order_id.
select * from CHK_CLUSNG_K;
+----+-------+------------+----------+
| ID | NAME | CR_DT | ORDER_ID |
|----+-------+------------+----------|
| 1 | name1 | 2022-03-28 | 1 |
| 1 | name1 | 2022-03-28 | 1 |
| 1 | name1 | 2022-03-28 | 1 |
| 1 | name1 | 2022-03-28 | 1 |
| 1 | name1 | 2022-03-28 | 2 |
+----+-------+------------+----------+
show tables like 'chk_clus%';
+-------------------------------+--------------+---------------+-------------+-------+---------+------------------+------+-------+--------------+----------------+----------------------+-----------------+---------------------+------------------------------+---------------------------+-------------+
| created_on | name | database_name | schema_name | kind | comment | cluster_by | rows | bytes | owner | retention_time | automatic_clustering | change_tracking | search_optimization | search_optimization_progress | search_optimization_bytes | is_external |
|-------------------------------+--------------+---------------+-------------+-------+---------+------------------+------+-------+--------------+----------------+----------------------+-----------------+---------------------+------------------------------+---------------------------+-------------|
| 2022-03-28 04:41:10.486 -0700 | CHK_CLUSNG_K | TEST_DB | PUBLIC | TABLE | | LINEAR(order_id) | 5 | 1536 | ACCOUNTADMIN | 1 | ON | OFF | OFF | NULL | NULL | N |
+-------------------------------+--------------+---------------+-------------+-------+---------+------------------+------+-------+--------------+----------------+----------------------+-----------------+---------------------+------------------------------+---------------------------+-------------+
Solution 2:[2]
As has been mentioned by Nick and Pankaj previously, you can only have one Clustering Key on a table and the order of the columns/expressions in the key are significant in terms of filtering benefits for reads.
However, if your account is using Enterprise Edition, you can also create Materialized View(s) on the table. Materialized View’s can be clustered differently to the base tables order, in order to support queries that require a different sort order. The Materialized View’s are maintained by a background service and guaranteed to give the same query result, as querying the base table. Snowflake will use the Materialized View if the clustering-key provides a better query plan for your query against the base table. There is a cost associated with the back ground service that maintains the Materialized View(s) so worth reading the documentation related to Materialized Views and their application, and this topic specifically on clustering (https://docs.snowflake.com/en/user-guide/views-materialized.html#best-practices-for-clustering-materialized-views-and-their-base-tables)
Search Optimisation, is another enterprise-edition feature for speeding up queries with certain access properties. You can read up more on Search Optimisation here ( https://docs.snowflake.com/en/user-guide/search-optimization-service.html#using-the-search-optimization-service )
Solution 3:[3]
You can only have one clustering key per table, though that key can consist of multiple columns/expressions
Solution 4:[4]
Can you have more than one CLUSTERING on a table: No Can you have more than one column as part of the CLUSTERING KEYS on a table: YES,
CREATE TABLE <name> ... CLUSTER BY ( <expr1> [ , <expr2> ... ] )
so you have have many columns/expressions just like you can in an ORDER BY clause. But just like an ORDER BY clause they are done in presented order.
Primary keys, are a meaningless concept, inside of Snowflake.
The thing to note with Clustering Keys, is they are soft hint, verse a hard partitioning. Which is to say the loser you make them the less churn on your tables is required, but the less sharp will be your pruning.
Fieldy's point about Materialized view is very good, and you don't have to name the Materialized view in your SELECT operation for it to be used, if it's sort is aligned with the query you currently running. And given disk is practically free (compared to compute) it's a rather nice solution.
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Pankaj |
| Solution 2 | Fieldy |
| Solution 3 | NickW |
| Solution 4 | Simeon Pilgrim |
