'Using both max and count aggregates in Snowflake

I have Snowflake table data as below.

RUNID OBJECT_NAME LAST_EXECUTION_DATETIME EXECUTION_STATUS
1 HR_JOB 4/19/2022 22:58:48 SUCCESS
2 HR_JOB 4/19/2022 23:30:42 SUCCESS
3 HR_JOB 4/19/2022 23:32:23 SUCCESS
4 HR_JOB 4/19/2022 23:35:38 SUCCESS
4 HR_JOB 4/19/2022 23:35:38 FAILED
5 HR_JOB 4/19/2022 23:37:58 FAILED

My requirement is to get the max of last_execution_datetime for the Execution_status as success.

In the example attached, the MAX date(RunId 5) has a "FAILED" status and cannot be taken into account.

For the Runid 4, it has both Success and failed as status and this also cannot be taken.

in this Scenario, the MAX date taken as Last Execution is the Runid 3, since the status is a Success

I tried the following Snowflake queries and not able to achieve the results.

select max(LAST_EXECUTION_DATETIME)
from
  (
select LAST_EXECUTION_DATETIME, count(*) dt_cnt
from EXECUTION_CONTROL
WHERE OBJECT_NAME = 'HR_JOB' 
group by LAST_EXECUTION_DATETIME
 )
 where dt_cnt = 1;

with cte AS
(
  select *, row_number() over (partition by LAST_EXECUTION_DATETIME order by execution_status desc) as rn 
  from EXECUTION_CONTROL WHERE OBJECT_NAME = 'HR_JOB'
)
select count(*) as cnt, max(LAST_EXECUTION_DATETIME) as mxdt
from cte
group by LAST_EXECUTION_DATETIME


Solution 1:[1]

it seems like "all rows are part of the same batch" but I am going to assume LAST_EXECUTION_DATE is the "batch" and LAST_EXECUTION_DATETIME is the order.

select *
from (
    select * 
    from EXECUTION_CONTROL
    where object_name = 'HR_JOB'
    qualify count(distinct execution_status) 
        over (partition by runid, LAST_EXECUTION_DATE) = 1 
)
where execution_status = 'SUCCESS'
qualify row_number() 
    over (partition by LAST_EXECUTION_DATE order by LAST_EXECUTION_DATETIME desc) = 1;

will remove any runid's that have more than one different status. If two success or two fails rows are possible and invalid remove the DISTINCT

then we remove any row that was not a SUCCESS

then we rank the rows, of the same LAST_EXECUTION_DATE batch by the LAST_EXECUTION_DATETIME to take the last valid one.

thus with your updated data:


with EXECUTION_CONTROL(runid, object_name, last_execution_datetime, last_exection_date, execution_status) as (
select column1, column2, to_timestamp(column3, 'mm/dd/yyyy hh:mi:ss'), to_timestamp(column3, 'mm/dd/yyyy hh:mi:ss')::date, column4  from values
(1,'HR_JOB','4/19/2022 22:58:48','SUCCESS'),
(2,'HR_JOB','4/19/2022 23:30:42','SUCCESS'),
(3,'HR_JOB','4/19/2022 23:32:23','SUCCESS'),
(4,'HR_JOB','4/19/2022 23:35:38','SUCCESS'),
(4,'HR_JOB','4/19/2022 23:35:38','FAILED'),
(5,'HR_JOB','4/19/2022 23:37:58','FAILED')
)
select *
from (
    select * 
    from EXECUTION_CONTROL
    where object_name = 'HR_JOB'
    qualify count(distinct execution_status) 
        over (partition by runid, last_exection_date) = 1 
)
where execution_status = 'SUCCESS'
qualify row_number() 
    over (partition by last_exection_date order by last_execution_datetime desc) = 1;

gives:

RUNID OBJECT_NAME LAST_EXECUTION_DATETIME LAST_EXECTION_DATE EXECUTION_STATUS
3 HR_JOB 2022-04-19 23:32:23.000 2022-04-19 SUCCESS

And if you don't have any batch ID (I was using LAST_EXECTION_DATE sigh type), you can use:

with EXECUTION_CONTROL(runid, object_name, last_execution_datetime, execution_status) as (
select column1, column2, to_timestamp(column3, 'mm/dd/yyyy hh:mi:ss'), column4  from values
(1,'HR_JOB','4/19/2022 22:58:48','SUCCESS'),
(2,'HR_JOB','4/19/2022 23:30:42','SUCCESS'),
(3,'HR_JOB','4/19/2022 23:32:23','SUCCESS'),
(4,'HR_JOB','4/19/2022 23:35:38','SUCCESS'),
(4,'HR_JOB','4/19/2022 23:35:38','FAILED'),
(5,'HR_JOB','4/19/2022 23:37:58','FAILED')
)
select *
from (
    select * 
    from EXECUTION_CONTROL
    where object_name = 'HR_JOB'
    qualify count(distinct execution_status) 
        over (partition by runid) = 1 
)
where execution_status = 'SUCCESS'
qualify row_number() 
    over (order by last_execution_datetime desc) = 1;

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1