'Group by query in snowflake

I have a Snowflake table like the following one:

And I wanted to get for each distinct combination "COMPANY"-"BUSINESS UNIT"-"APPROVER LEVEL", the entry with the most recent data (In case we have more than one entry with most recent date it should return all the entries). Thus, considering the table of the example above, it would return the following:

What is the SQL query I have to write in Snowflake order to obtain this?

Solution 1:^[1]

So very similar to Brandon's answer use QUALIFY:

But given you want first ranking items, use RANK (which is the same is DENSE_RANK that Phil mentions)

A CTE for the data (I have used NUMBERS for the valid date, but TEXT would also sort just fine, or converting to DATE also would work the same):

WITH sample_data as (
    SELECT * FROM VALUES
        ('c1', 'but1', 'l1', 20220406, 'Mr. 0', 'id0' ),
        ('c1', 'but1', 'l1', 20220406, 'Mr. 1', 'id2' ),
        ('c1', 'but1', 'l1', 20220212, 'Mr. 2', 'id2' ),
        ('c1', 'but1', 'l1', 20220130, 'Mr. 3', 'id3' ),
        ('c1', 'but1', 'l2', 20220320, 'Mr. 4', 'id4' ),
        ('c1', 'but1', 'l2', 20220115, 'Mr. 5', 'id5' ),
        ('c1', 'but1', 'l2', 20220102, 'Mr. 6', 'id6' )
    t(company, business_unit, approver_level, valid_from, approver_name, approver_id)
)

The following SQL

SELECT *
FROM sample_data
QUALIFY rank() over(partition by company, business_unit, approvel_level order by valid_from desc ) = 1;

gives:

COMPANY	BUSINESS_UNIT	APPROVER_LEVEL	VALID_FROM	APPROVER_NAME	APPROVER_ID
c1	but1	l1	20220406	Mr. 0	id0
c1	but1	l1	20220406	Mr. 1	id2
c1	but1	l2	20220320	Mr. 4	id4

And if your database is not Snowflake (as you have tagged the issue) and does not have QUALIFY here is the way to do this pattern:

WITH smaple_data as (
    SELECT * FROM VALUES
        ('c1', 'but1', 'l1', 20220406, 'Mr. 0', 'id0' ),
        ('c1', 'but1', 'l1', 20220406, 'Mr. 1', 'id2' ),
        ('c1', 'but1', 'l1', 20220212, 'Mr. 2', 'id2' ),
        ('c1', 'but1', 'l1', 20220130, 'Mr. 3', 'id3' ),
        ('c1', 'but1', 'l2', 20220320, 'Mr. 4', 'id4' ),
        ('c1', 'but1', 'l2', 20220115, 'Mr. 5', 'id5' ),
        ('c1', 'but1', 'l2', 20220102, 'Mr. 6', 'id6' )
    t(company, business_unit, approver_level, 
      valid_from, approver_name, approver_id)
)
SELECT company, business_unit, 
    approver_level, valid_from, 
    approver_name, approver_id
FROM (
    SELECT company, business_unit, 
        approver_level, valid_from, 
        approver_name, approver_id,
        DENSE_RANK() OVER (PARTITION BY COMPANY, BUSINESS_UNIT, APPROVER_LEVEL
             ORDER BY  VALID_FROM DESC) as dr
    FROM smaple_data
) 
WHERE dr = 1
ORDER BY 1,2,3;

Solution 2:^[2]

You could utilize Qualify to handle this. To do so, something like this should work

SELECT * 
  FROM table_a
QUALIFY ROW_NUMBER() OVER (PARTITION BY COMPANY, BUSINESS_UNIT, APPROVER_LEVEL
                  ORDER BY  VALID_FROM DESC) = 1
;

Solution 3:^[3]

The main idea is same as suggested by @Brandon Coleman, just need to tweek a bit.

with cte as
(
select max(rn) max_rn from (
select rank() over (partition by company,bu,appr_lvl 
order by valid_from) rn 
from your_table_name)
)
select *,rank() over (partition by company,bu,appr_lvl 
order by valid_from) rn 
from your_table_name,cte 
qualify rn>=cte.max_rn;

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution	Source
Solution 1
Solution 2	Brandon Coleman
Solution 3	Pankaj

'Group by query in snowflake

Solution 1:[1]

Solution 2:[2]

Solution 3:[3]

Sources

Related Questions

Solution 1:^[1]

Solution 2:^[2]

Solution 3:^[3]