'Group by query in snowflake
I have a Snowflake table like the following one:

And I wanted to get for each distinct combination "COMPANY"-"BUSINESS UNIT"-"APPROVER LEVEL", the entry with the most recent data (In case we have more than one entry with most recent date it should return all the entries). Thus, considering the table of the example above, it would return the following:

What is the SQL query I have to write in Snowflake order to obtain this?
Solution 1:[1]
So very similar to Brandon's answer use QUALIFY:
But given you want first ranking items, use RANK (which is the same is DENSE_RANK that Phil mentions)
A CTE for the data (I have used NUMBERS for the valid date, but TEXT would also sort just fine, or converting to DATE also would work the same):
WITH sample_data as (
SELECT * FROM VALUES
('c1', 'but1', 'l1', 20220406, 'Mr. 0', 'id0' ),
('c1', 'but1', 'l1', 20220406, 'Mr. 1', 'id2' ),
('c1', 'but1', 'l1', 20220212, 'Mr. 2', 'id2' ),
('c1', 'but1', 'l1', 20220130, 'Mr. 3', 'id3' ),
('c1', 'but1', 'l2', 20220320, 'Mr. 4', 'id4' ),
('c1', 'but1', 'l2', 20220115, 'Mr. 5', 'id5' ),
('c1', 'but1', 'l2', 20220102, 'Mr. 6', 'id6' )
t(company, business_unit, approver_level, valid_from, approver_name, approver_id)
)
The following SQL
SELECT *
FROM sample_data
QUALIFY rank() over(partition by company, business_unit, approvel_level order by valid_from desc ) = 1;
gives:
| COMPANY | BUSINESS_UNIT | APPROVER_LEVEL | VALID_FROM | APPROVER_NAME | APPROVER_ID |
|---|---|---|---|---|---|
| c1 | but1 | l1 | 20220406 | Mr. 0 | id0 |
| c1 | but1 | l1 | 20220406 | Mr. 1 | id2 |
| c1 | but1 | l2 | 20220320 | Mr. 4 | id4 |
And if your database is not Snowflake (as you have tagged the issue) and does not have QUALIFY here is the way to do this pattern:
WITH smaple_data as (
SELECT * FROM VALUES
('c1', 'but1', 'l1', 20220406, 'Mr. 0', 'id0' ),
('c1', 'but1', 'l1', 20220406, 'Mr. 1', 'id2' ),
('c1', 'but1', 'l1', 20220212, 'Mr. 2', 'id2' ),
('c1', 'but1', 'l1', 20220130, 'Mr. 3', 'id3' ),
('c1', 'but1', 'l2', 20220320, 'Mr. 4', 'id4' ),
('c1', 'but1', 'l2', 20220115, 'Mr. 5', 'id5' ),
('c1', 'but1', 'l2', 20220102, 'Mr. 6', 'id6' )
t(company, business_unit, approver_level,
valid_from, approver_name, approver_id)
)
SELECT company, business_unit,
approver_level, valid_from,
approver_name, approver_id
FROM (
SELECT company, business_unit,
approver_level, valid_from,
approver_name, approver_id,
DENSE_RANK() OVER (PARTITION BY COMPANY, BUSINESS_UNIT, APPROVER_LEVEL
ORDER BY VALID_FROM DESC) as dr
FROM smaple_data
)
WHERE dr = 1
ORDER BY 1,2,3;
Solution 2:[2]
You could utilize Qualify to handle this. To do so, something like this should work
SELECT *
FROM table_a
QUALIFY ROW_NUMBER() OVER (PARTITION BY COMPANY, BUSINESS_UNIT, APPROVER_LEVEL
ORDER BY VALID_FROM DESC) = 1
;
Solution 3:[3]
The main idea is same as suggested by @Brandon Coleman, just need to tweek a bit.
with cte as
(
select max(rn) max_rn from (
select rank() over (partition by company,bu,appr_lvl
order by valid_from) rn
from your_table_name)
)
select *,rank() over (partition by company,bu,appr_lvl
order by valid_from) rn
from your_table_name,cte
qualify rn>=cte.max_rn;
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | |
| Solution 2 | Brandon Coleman |
| Solution 3 | Pankaj |
