'Rolling average over the previous 15 days
ANSI-SQL/Postgres-sql
I'm a bit confused about the behaviour of aggregate functions with the OVER() clause. I'm trying to do the postgres exercises here: 
https://pgexercises.com/questions/aggregates/rollingavg.html
Here is the table definition:
Question: For each day in August 2012, calculate a rolling average of total revenue over the previous 15 days. Output should contain date and revenue columns, sorted by the date.
and I wrote the following query:
select starttime::date, sum(slots * case when memid=0 then guestcost
                        else membercost end) over (order by starttime::date asc
                                                   range BETWEEN INTERVAL '14 day' 
                                                   PRECEDING AND 
                                                   CURRENT ROW)::decimal/15 revenue
from cd.bookings B inner join cd.facilities F
on B.facid = F.facid
where starttime >= '2012-08-01' and starttime <= '2012-08-31' 
group by starttime::date
order by starttime::date
When I run this, I get the error: ERROR: column "b.slots" must appear in the GROUP BY clause or be used in an aggregate function. When I do add slots into the group by, then I'm asked to add memid, guestcost, membercost, as well.
I don't completely understand why I need to add these to the group by function when they are a part of the avg calculation, especially when I didn't have to add them here: https://pgexercises.com/questions/aggregates/facrev.html My query for this particular question:
Produce a list of facilities along with their total revenue.
works correctly:
select name, sum(slots * case when
                 memid = 0 then guestcost
                 else membercost
                end) revenue
from cd.facilities A inner join cd.bookings B
on A.facid = B.facid
group by A.facid
order by revenue
But as suggested for the first question by the query engine, I add all the asked columns to the group by and I got multiple rows:
starttime   revenue
2012-08-01  83.1333333333333333
2012-08-01  83.1333333333333333
...
2012-08-02  137.7666666666666667
2012-08-02  137.7666666666666667
2012-08-02  137.7666666666666667
...
because I think it's providing results for all the starttime values separately, but the results are generated by date.
I know there is an answer under there in the exercises, but I'm more concerned about where my thinking is wrong and if I can do it using the method I have in my head. The answer mentioned there is using generate_series function or a view. Can range BETWEEN INTERVAL '14 day' PRECEDING AND CURRENT ROW be used to calculate the correct results? If yes, what am I doing wrong?
Update:
select date, revenue, sum(revenue) over (order by date 
                  RANGE between INTERVAL '14 day' PRECEDING AND CURRENT ROW)
                  sum_revenue, sum(revenue) over (order by date 
                  RANGE between INTERVAL '14 day' PRECEDING AND CURRENT ROW)::decimal/15
                  avg_revenue
from
(select distinct starttime::date date, sum(slots* case when memid=0 then guestcost
                  else membercost
               end) over (order by starttime::date) revenue
from cd.bookings B inner join cd.facilities F
on B.facid = F.facid
where starttime >= '2012-07-15' and starttime <= '2012-08-31'
order by starttime::date) subq
Having range by default in inner query ensures one result across the date. Doing a distinct gives one row per date. And then I'm applying the outer query to do the rolling date calculations. This is my progress so far. This doesn't match the results on the exercise link, but produces these results:
date        revenue sum_revenue    avg_revenue
2012-07-15  570     570            38.0000000000000000
2012-07-16  1005.0  1575.0         105.0000000000000000
2012-07-17  1712.0  3287.0         219.1333333333333333
2012-07-18  2379.0  5666.0         377.7333333333333333
2012-07-19  3519.5  9185.5         612.3666666666666667
2012-07-20  4348.5  13534.0        902.2666666666666667
2012-07-21  5532.0  19066.0        1271.0666666666666667
2012-07-22  6556.0  25622.0        1708.1333333333333333
2012-07-23  7366.0  32988.0        2199.2000000000000000
2012-07-24  8741.0  41729.0        2781.9333333333333333
2012-07-25  10212.0 51941.0        3462.7333333333333333
2012-07-26  11137.0 63078.0        4205.2000000000000000
2012-07-27  12566.0 75644.0        5042.9333333333333333
2012-07-28  13756.0 89400.0        5960.0000000000000000
2012-07-29  14750.0 104150.0       6943.3333333333333333
2012-07-30  15981.5 119561.5       7970.7666666666666667
2012-07-31  17317.5 135874.0       9058.2666666666666667
2012-08-01  18614.5 152776.5       10185.1000000000000000
2012-08-02  19674.0 170071.5       11338.100000000000
2012-08-03  20963.0 187515.0       12501.000000000000
2012-08-04  22009.0 205175.5       13678.366666666667
2012-08-05  22946.0 222589.5       14839.300000000000
2012-08-06  24337.0 240370.5       16024.700000000000
2012-08-07  25109.0 258113.5       17207.566666666667
2012-08-08  26330.0 275702.5       18380.166666666667
2012-08-09  27499.0 292989.5       19532.633333333333
2012-08-10  28762.5 310615.0       20707.666666666667
2012-08-11  30215.5 328264.5       21884.300000000000
2012-08-12  31691.0 346199.5       23079.966666666667
2012-08-13  33020.0 364469.5       24297.966666666667
2012-08-14  34693.5 383181.5       25545.433333333333
2012-08-15  36429.0 402293.0       26819.533333333333
2012-08-16  37833.0 421511.5       28100.766666666667
2012-08-17  39541.0 441378.5       29425.233333333333
2012-08-18  41569.0 461984.5       30798.966666666667
2012-08-19  43100.0 483075.5       32205.033333333333
2012-08-20  44352.0 504481.5       33632.100000000000
2012-08-21  46092.0 526236.5       35082.433333333333
2012-08-22  48204.5 549332.0       36622.133333333333
2012-08-23  49839.5 572841.5       38189.433333333333
2012-08-24  51384.0 596726.5       39781.766666666667
2012-08-25  52988.0 620952.0       41396.800000000000
2012-08-26  54683.5 645420.0       43028.000000000000
2012-08-27  56582.5 670311.5       44687.433333333333
2012-08-28  58325.0 695616.5       46374.433333333333
2012-08-29  59963.0 720886.0       48059.066666666667
2012-08-30  61298.0 745755.0       49717.000000000000
when I check in excel, my results are consistent. I also get the same value for the average fields on using the average function: avg(revenue) over (order by date RANGE between INTERVAL '14 day' PRECEDING AND CURRENT ROW) avg_revenue
Solution 1:[1]
It seems to me that your approach assumes that there will be bookings for every day in August - that's not an assumption I would be willing to make. I would start by creating a temp set with every day in August that can be used to join to a daily summary of revenue.
With Recursive AugustDays AS (
     select '2012-07-17'::date as daysend, ('2012-07-17'::date - interval '14 days')::date as daysstart
     UNION ALL
     Select daysend + 1, daysstart + 1
     FROM AugustDays
     WHERE daysend<'2012-08-31`)
Then I would use the August set to summarize my revenue by day - zeros for unbooked days
With DailyRevenue AS (
    Select daysend as day
         , NullIf(sum(slots*(Case When memid=0 then guestcost else membercost end)),0) as revenue
From AugustDays A Left Outer Join (cd.bookings B inner join cd.facilities F
                                    on B.facid = F.facid)
                  On starttime:date=daysend
Group By daysend)
Finally, I would join these two together to produce rolling average...
Select daysend as day, Avg(revenue) as Rolling15Day
From AugustDays Left Outer Join DailyRevenue On day between daysstart and daysend
Where daysend>='2012-08-01'
Group By daysend
Order By daysend
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source | 
|---|---|
| Solution 1 | 
