'SQL BigQuery - COUNTIF with criteria from current row and partitioned rows

I'm running this line of code:

COUNTIF(
    type = "credit"
    AND 
    DATETIME_DIFF(credit_window_end, start_at_local_true_01, DAY) BETWEEN 0 and 5
    )
    over (partition by case_id order by start_at_local_true_01
    ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING)
as credit_count_per_case_id_in_future_and_within_credit_window,

And I'm getting this

Row case_id start_at_local_true_01 type credit_window_end credit_count_per_case_id_in_future_and_within_credit_window
1 12123 2022-02-01 11:00:00 null 2022-02-06 11:00:00 0
2 12123 2022-02-01 11:15:00 run null 0
3 12123 2022-02-01 11:21:00 jump 2022-02-06 11:21:00 0
4 12123 2022-02-04 11:31:00 run 2022-02-09 11:31:00 0
5 12123 2022-02-05 11:34:00 jump null 0
6 12123 2022-02-08 12:38:00 credit null 0
7 12555 2022-02-01 11:15:00 null null 0

But I want this

Row case_id start_at_local_true_01 type credit_window_end credit_count_per_case_id_in_future_and_within_credit_window
1 12123 2022-02-01 11:00:00 null 2022-02-06 11:00:00 0
2 12123 2022-02-01 11:15:00 run null 0
3 12123 2022-02-01 11:21:00 jump 2022-02-06 11:21:00 0
4 12123 2022-02-04 11:31:00 run 2022-02-09 11:31:00 1
5 12123 2022-02-05 11:34:00 jump null 0
6 12123 2022-02-08 12:38:00 credit null 0
7 12555 2022-02-01 11:15:00 null null 0

The 4th row should be 1 because (from the 6th row) credit = credit AND DATETIMEDIFF(2022-02-08T12:38:00, 2022-02-04 11:31:00, DAY) between 0 and 5

The calculation within the cell would look like this:

COUNTIF(
    run = credit    AND DATETIMEDIFF(2022-02-04 11:31:00, 2022-02-04T11:31:00, DAY ) between 0 and 5
    jump = credit   AND DATETIMEDIFF(2022-02-04 11:31:00, 2022-02-05T11:34:00, DAY ) between 0 and 5
    credit = credit AND DATETIMEDIFF(2022-02-04 11:31:00, 2022-02-08T12:38:00, DAY ) between 0 and 5
)

COUNTIF(
    false and false
    false and false
    true and true
)

COUNTIF(
    0
    0
    1
)

I think I know why, but I don't know how to fix it.

It's because the DATETIME_DIFF function is taking both values from the same row (from each partitioned row). The second element should stay the same (start_at_local_true_01). But I want the first element to be fixed to the CURRENT ROW's credit_window_end (not each partitioned row's credit_window_end).

This is my code so far (including sample table):

with data_table as(
    select * FROM UNNEST(ARRAY<STRUCT<
    case_id INT64, start_at_local_true_01 DATETIME, type STRING, credit_window_end DATETIME>>
    [
         (12123, DATETIME("2022-02-01 11:00:00"), null,     DATETIME("2022-02-06 11:00:00"))
        ,(12123, DATETIME("2022-02-01 11:15:00"), 'run',    null)
        ,(12123, DATETIME("2022-02-01 11:21:00"), 'jump',   DATETIME("2022-02-06 11:21:00"))
        ,(12123, DATETIME("2022-02-04 11:31:00"), 'run',    DATETIME("2022-02-09 11:31:00"))
        ,(12123, DATETIME("2022-02-05 11:34:00"), 'jump',   null)
        ,(12123, DATETIME("2022-02-08 12:38:00"), 'credit', null)
        ,(12555, DATETIME("2022-02-01 11:15:00"), null,     null)
    ]
    )
)

select 
data_table.*,

COUNTIF(
    type = "credit"
    )
    over (partition by case_id order by start_at_local_true_01
    ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING)
as credit_count_per_case_id_in_future,

COUNTIF(
    type = "credit"
    AND 
    DATETIME_DIFF(start_at_local_true_01, credit_window_end, DAY) BETWEEN 0 and 5
    )
    over (partition by case_id order by start_at_local_true_01
    ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING)
as credit_count_per_case_id_in_future_and_within_credit_window,

--does not work. does not even run
-- DATETIME_DIFF(
-- credit_window_end,
-- array_agg(
--         IFNULL(start_at_local_true_01,DATETIME("2000-01-01 00:00:00"))
--         )
--     over (partition by case_id order by start_at_local_true_01 asc
--     ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING)
-- , DAY
-- )
-- as credit_count_per_case_id_in_future_and_within_credit_window_02,

from data_table

Thanks for the help!



Solution 1:[1]

As confirmed by @Phil in the comments, this was solved by changing the window to:

over (partition by case_id order by UNIX_MILLIS(TIMESTAMP(start_at_local_true_01)) RANGE BETWEEN CURRENT ROW AND 432000000 FOLLOWING)

Posting the answer as community wiki for the benefit of the community that might encounter this use case in the future.

Feel free to edit this answer for additional information.

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1