'Group by query in snowflake

I have a Snowflake table like the following one: enter image description here

And I wanted to get for each distinct combination "COMPANY"-"BUSINESS UNIT"-"APPROVER LEVEL", the entry with the most recent data (In case we have more than one entry with most recent date it should return all the entries). Thus, considering the table of the example above, it would return the following: enter image description here

What is the SQL query I have to write in Snowflake order to obtain this?



Solution 1:[1]

So very similar to Brandon's answer use QUALIFY:

But given you want first ranking items, use RANK (which is the same is DENSE_RANK that Phil mentions)

A CTE for the data (I have used NUMBERS for the valid date, but TEXT would also sort just fine, or converting to DATE also would work the same):

WITH sample_data as (
    SELECT * FROM VALUES
        ('c1', 'but1', 'l1', 20220406, 'Mr. 0', 'id0' ),
        ('c1', 'but1', 'l1', 20220406, 'Mr. 1', 'id2' ),
        ('c1', 'but1', 'l1', 20220212, 'Mr. 2', 'id2' ),
        ('c1', 'but1', 'l1', 20220130, 'Mr. 3', 'id3' ),
        ('c1', 'but1', 'l2', 20220320, 'Mr. 4', 'id4' ),
        ('c1', 'but1', 'l2', 20220115, 'Mr. 5', 'id5' ),
        ('c1', 'but1', 'l2', 20220102, 'Mr. 6', 'id6' )
    t(company, business_unit, approver_level, valid_from, approver_name, approver_id)
)

The following SQL

SELECT *
FROM sample_data
QUALIFY rank() over(partition by company, business_unit, approvel_level order by valid_from desc ) = 1;

gives:

COMPANY BUSINESS_UNIT APPROVER_LEVEL VALID_FROM APPROVER_NAME APPROVER_ID
c1 but1 l1 20220406 Mr. 0 id0
c1 but1 l1 20220406 Mr. 1 id2
c1 but1 l2 20220320 Mr. 4 id4

And if your database is not Snowflake (as you have tagged the issue) and does not have QUALIFY here is the way to do this pattern:

WITH smaple_data as (
    SELECT * FROM VALUES
        ('c1', 'but1', 'l1', 20220406, 'Mr. 0', 'id0' ),
        ('c1', 'but1', 'l1', 20220406, 'Mr. 1', 'id2' ),
        ('c1', 'but1', 'l1', 20220212, 'Mr. 2', 'id2' ),
        ('c1', 'but1', 'l1', 20220130, 'Mr. 3', 'id3' ),
        ('c1', 'but1', 'l2', 20220320, 'Mr. 4', 'id4' ),
        ('c1', 'but1', 'l2', 20220115, 'Mr. 5', 'id5' ),
        ('c1', 'but1', 'l2', 20220102, 'Mr. 6', 'id6' )
    t(company, business_unit, approver_level, 
      valid_from, approver_name, approver_id)
)
SELECT company, business_unit, 
    approver_level, valid_from, 
    approver_name, approver_id
FROM (
    SELECT company, business_unit, 
        approver_level, valid_from, 
        approver_name, approver_id,
        DENSE_RANK() OVER (PARTITION BY COMPANY, BUSINESS_UNIT, APPROVER_LEVEL
             ORDER BY  VALID_FROM DESC) as dr
    FROM smaple_data
) 
WHERE dr = 1
ORDER BY 1,2,3;

Solution 2:[2]

You could utilize Qualify to handle this. To do so, something like this should work

SELECT * 
  FROM table_a
QUALIFY ROW_NUMBER() OVER (PARTITION BY COMPANY, BUSINESS_UNIT, APPROVER_LEVEL
                  ORDER BY  VALID_FROM DESC) = 1
;

Solution 3:[3]

The main idea is same as suggested by @Brandon Coleman, just need to tweek a bit.

with cte as
(
select max(rn) max_rn from (
select rank() over (partition by company,bu,appr_lvl 
order by valid_from) rn 
from your_table_name)
)
select *,rank() over (partition by company,bu,appr_lvl 
order by valid_from) rn 
from your_table_name,cte 
qualify rn>=cte.max_rn;

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1
Solution 2 Brandon Coleman
Solution 3 Pankaj