'Cumulative calculation in window functions
I have a question for window functions in Postgresql.
I have a table like the following where:
- each ID is unique
first_seenis the first date theIDsees a messageeven_timeis when theIDconvertscohort_weekis the first day of the week related tofirst_seenrangeis constructed based ondays_to_conv: 1D - ifdays_to_convis within 1 day, or 30 days or 90 days or 360 days.
| ID | first_seen | event_time | days_to_conv | cohort_week | range |
|---|---|---|---|---|---|
| 1448 | 2018-08-20 | 2018-08-21 | 1 | 2018-08-20 | 1D |
| 1207 | 2018-08-20 | 2018-08-21 | 1 | 2018-08-20 | 1D |
| 1528 | 2018-08-25 | NULL | NULL | 2018-08-20 | 1D |
| 1985 | 2018-08-22 | 2018-08-23 | 1 | 2018-08-20 | 1D |
| 1971 | 2018-08-23 | 2018-08-24 | 1 | 2018-08-20 | 1D |
| 1661 | 2018-08-25 | NULL | NULL | 2018-08-20 | 1D |
| 1090 | 2018-08-23 | NULL | NULL | 2018-08-20 | 1D |
| 1525 | 2018-08-25 | 2018-08-27 | 2 | 2018-08-20 | 30D |
| 1973 | 2018-08-25 | 2018-09-09 | 15 | 2018-08-20 | 30D |
| 1378 | 2018-08-20 | NULL | NULL | 2018-08-20 | 30D |
| 1194 | 2018-08-22 | 2018-09-07 | 16 | 2018-08-20 | 30D |
| 1651 | 2018-08-25 | NULL | NULL | 2018-08-20 | 30D |
| 1310 | 2018-08-23 | 2018-11-17 | 86 | 2018-08-20 | 90D |
| 1375 | 2018-08-20 | 2018-11-04 | 76 | 2018-08-20 | 90D |
| 1631 | 2018-08-25 | 2018-12-05 | 102 | 2018-08-20 | 360D |
| 1029 | 2018-08-22 | NULL | NULL | 2018-08-20 | 360D |
| 1030 | 2018-08-25 | NULL | NULL | 2018-08-20 | 360D |
| 1253 | 2018-08-20 | NULL | NULL | 2018-08-20 | 360D |
I want to create a calculation that creates a rolling cumulative ratio like the following, where:
sizeis the amount of ID in that weekconvertedis the sum ofNON NULLentry in that rangeratiois the ration betweenconverted/tot_size- ... and
cum_ratiois the cumulative sum ofratio
| week | tot_size | range | converted | ratio | cum_ratio |
|---|---|---|---|---|---|
| 2018-08-20 | 18 | 1D | 4 | 22.22% | 22.22% |
| 2018-08-20 | 18 | 30D | 3 | 16.67% | 38.89% |
| 2018-08-20 | 18 | 90D | 2 | 11.11% | 50.00% |
| 2018-08-20 | 18 | 360D | 1 | 5.56% | 55.56% |
I am having a hard time in understanding how to create cum_ratio for each range value.
I assume I have to apply a window function but I don't understand how to construct it.
Solution 1:[1]
The range column type is problematic because it is difficult to sort properly. You need this in the definition of a window function, and you have to use a rather clunky expression for it.
select
cohort_week,
total as tot_size,
range,
converted,
round(converted::numeric/ total* 100, 2) as ratio,
round(sum(converted::numeric/ total* 100) over w, 2) as cum_ratio
from (
select
cohort_week,
count(*),
(
select count(*)
from my_table s
where s.cohort_week = t.cohort_week
) as total,
range,
count(event_time) as converted
from my_table t
group by cohort_week, range, total
) s
window w as (partition by cohort_week order by left(range, -1)::int)
order by cohort_week, left(range, -1)::int
Solution 2:[2]
maybe some final rounding or type casting may be done :
with _temp as (
select
distinct
cohort_week::date,
sum(1) over (PARTITION by extract ('week' from first_seen), extract('year' from first_seen)) as tot_size,
first_value(range) over (PARTITION by range),
sum(1) FILTER(where days_to_conv is not null) over (PARTITION by range) as converted
from test
)
select
*,
converted::float / tot_size as ratio,
sum(converted::float / tot_size) over ( order by cohort_week rows BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
from _temp order by cohort_week
Solution 3:[3]
On my side, I used the partitionning twice. Once for keeping the tot_size along all the results and the second time to compute the cumulative ratio.
Finally, I get this
select
cohort_week,
tot_size,
range,
sum(days) as converted,
((sum(days)::float/tot_size)*100) ratio,
sum((sum(days)::float/tot_size)*100) OVER (ORDER by sum(days) DESC) cum_ratio
FROM (
select
cohort_week,
id,
count(id) OVER (PARTITION by cohort_week) as tot_size,
range,
case
when days_to_conv is null
then 0
else 1
end days
from my_table
group by cohort_week, range,id
) T
group by cohort_week, range, tot_size;
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | klin |
| Solution 2 | FatFreddy |
| Solution 3 | Frédéric Praca |
