'Randomize selection from snowflake SQL

I am trying to select 1,000 random rows from a database of 97 million rows. I am using the following code:

SELECT *
FROM "DB"."SCHEMA"."TABLE"
ORDER BY                                         
   RAND()                                      LIMIT 1000

I tried this code and got an error stating "SQL compilation error: Unknown function RAND." Is there a better way to do this in Snowflake? I am worried that this code only works in MySQL.



Solution 1:[1]

Using SAMPLE clause:

Returns a subset of rows sampled randomly from the specified table. The following sampling methods are supported:

  • Sample a fraction of a table, with a specified probability for including a given row. The number of rows returned depends on the size of the table and the requested probability. A seed can be specified to make the sampling deterministic.

  • Sample a fixed, specified number of rows. The exact number of specified rows is returned unless the table contains fewer rows.

SELECT *
FROM "DB"."SCHEMA"."TABLE"
SAMPLE (1000 ROWS);

Solution 2:[2]

In Snowflake the function is RANDOM(), not RAND().

So your original query should be:

SELECT *
FROM "DB"."SCHEMA"."TABLE"
ORDER BY
   RANDOM()
LIMIT 1000

But as Lukasz mentioned, SAMPLE() function is the native way to do it in Snowflake.

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Lukasz Szozda
Solution 2 Eric Lin