'Cumulative calculation in window functions

I have a question for window functions in Postgresql.

I have a table like the following where:

  • each ID is unique
  • first_seen is the first date the ID sees a message
  • even_time is when the ID converts
  • cohort_week is the first day of the week related to first_seen
  • range is constructed based on days_to_conv: 1D - if days_to_conv is within 1 day, or 30 days or 90 days or 360 days.
ID first_seen event_time days_to_conv cohort_week range
1448 2018-08-20 2018-08-21 1 2018-08-20 1D
1207 2018-08-20 2018-08-21 1 2018-08-20 1D
1528 2018-08-25 NULL NULL 2018-08-20 1D
1985 2018-08-22 2018-08-23 1 2018-08-20 1D
1971 2018-08-23 2018-08-24 1 2018-08-20 1D
1661 2018-08-25 NULL NULL 2018-08-20 1D
1090 2018-08-23 NULL NULL 2018-08-20 1D
1525 2018-08-25 2018-08-27 2 2018-08-20 30D
1973 2018-08-25 2018-09-09 15 2018-08-20 30D
1378 2018-08-20 NULL NULL 2018-08-20 30D
1194 2018-08-22 2018-09-07 16 2018-08-20 30D
1651 2018-08-25 NULL NULL 2018-08-20 30D
1310 2018-08-23 2018-11-17 86 2018-08-20 90D
1375 2018-08-20 2018-11-04 76 2018-08-20 90D
1631 2018-08-25 2018-12-05 102 2018-08-20 360D
1029 2018-08-22 NULL NULL 2018-08-20 360D
1030 2018-08-25 NULL NULL 2018-08-20 360D
1253 2018-08-20 NULL NULL 2018-08-20 360D

I want to create a calculation that creates a rolling cumulative ratio like the following, where:

  • size is the amount of ID in that week
  • converted is the sum of NON NULL entry in that range
  • ratio is the ration between converted / tot_size
  • ... and cum_ratio is the cumulative sum of ratio
week tot_size range converted ratio cum_ratio
2018-08-20 18 1D 4 22.22% 22.22%
2018-08-20 18 30D 3 16.67% 38.89%
2018-08-20 18 90D 2 11.11% 50.00%
2018-08-20 18 360D 1 5.56% 55.56%

I am having a hard time in understanding how to create cum_ratio for each range value.

I assume I have to apply a window function but I don't understand how to construct it.



Solution 1:[1]

The range column type is problematic because it is difficult to sort properly. You need this in the definition of a window function, and you have to use a rather clunky expression for it.

select 
    cohort_week,
    total as tot_size,
    range, 
    converted,
    round(converted::numeric/ total* 100, 2) as ratio,
    round(sum(converted::numeric/ total* 100) over w, 2) as cum_ratio
from (
    select 
        cohort_week, 
        count(*),
        (
            select count(*) 
            from my_table s 
            where s.cohort_week = t.cohort_week
        ) as total,
        range, 
        count(event_time) as converted
    from my_table t
    group by cohort_week, range, total
    ) s
window w as (partition by cohort_week order by left(range, -1)::int)
order by cohort_week, left(range, -1)::int

Db<>fiddle.

Solution 2:[2]

maybe some final rounding or type casting may be done :

  with _temp as (
  select 
  distinct
  cohort_week::date,
  sum(1) over (PARTITION by extract ('week' from first_seen), extract('year' from first_seen)) as tot_size,
  first_value(range) over (PARTITION by range),
  sum(1) FILTER(where days_to_conv is not null) over (PARTITION by range) as converted
  from test
  )
  
  select 
  *,
  converted::float / tot_size as ratio,
  sum(converted::float / tot_size) over ( order by cohort_week rows BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) 
  from _temp order by cohort_week

https://www.db-fiddle.com/f/q3MF5Uanrgme5HFQcP2mK9/0

Solution 3:[3]

On my side, I used the partitionning twice. Once for keeping the tot_size along all the results and the second time to compute the cumulative ratio.

Finally, I get this

select
    cohort_week,
    tot_size,
    range,
    sum(days) as converted,
    ((sum(days)::float/tot_size)*100) ratio,
    sum((sum(days)::float/tot_size)*100) OVER (ORDER by sum(days) DESC) cum_ratio
FROM (
  select
    cohort_week,
    id,
    count(id) OVER (PARTITION by cohort_week) as tot_size,
    range,
    case
        when days_to_conv is null
             then 0
             else 1
        end days
    from my_table
    group by cohort_week, range,id
) T
group by cohort_week, range, tot_size;

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 klin
Solution 2 FatFreddy
Solution 3 Frédéric Praca