'"Year-on-year" with window functions
I'm doing year on year analysis by doing joins. I'm joining the same table for every year but since I'm using another tool to build my SQL its not so 'dynamic'. If I could solve this with window functions it would be better. So any suggestion is appreciated :D
The idea is to do this by hour. That is, I want to compare sales lets say from 2022-04-05 hour 8 with sales from 2021-04-05 hour 8 and from 2020-04-05 hour 8.
My data is aggregated by hour:
| Store | Timestamp | Sales |
|---|---|---|
| 1 | 2019-04-05T08:00:00Z | 10000 |
| 1 | 2020-04-05T08:00:00Z | 12000 |
| 1 | 2021-04-05T08:00:00Z | 15000 |
| 1 | 2022-04-05T08:00:00Z | 20000 |
| 2 | 2019-04-05T08:00:00Z | 13000 |
| 2 | 2020-04-05T08:00:00Z | 16000 |
| 2 | 2021-04-05T08:00:00Z | 19000 |
| 2 | 2022-04-05T08:00:00Z | 22000 |
Desired result (order may begin with this year) timestamps are not needed. I added them just to clarify:
| Store | Timestamp_1 | Sales_1 | Timestamp_2 | Sales_2 | Timestamp_3 | Sales_3 |
|---|---|---|---|---|---|---|
| 1 | 2019-04-05T08:00:00Z | 10000 | 2020-04-05T08:00:00Z | 12000 | 2021-04-05T08:00:00Z | 15000 |
| 2 | 2019-04-05T08:00:00Z | 13000 | 2020-04-05T08:00:00Z | 16000 | 2021-04-05T08:00:00Z | 19000 |
Any ideas? Thanks in Advance
Solution 1:[1]
Not really your answer but if you only have one days worth of hours
SELECT
store
,hour(date)
,array_agg(object_construct(date::text, sales)) within group (order by date) as hour_history
FROM data_table
GROUP BY 1,2
ORDER BY 1,2;
gives:
| STORE | HOUR(DATE) | HOUR_HISTORY |
|---|---|---|
| 1 | 8 | [ { "2019-04-05 08:00:00.000": 10000 }, { "2020-04-05 08:00:00.000": 12000 }, { "2021-04-05 08:00:00.000": 15000 }, { "2022-04-05 08:00:00.000": 20000 } ] |
| 2 | 8 | [ { "2019-04-05 08:00:00.000": 13000 }, { "2020-04-05 08:00:00.000": 16000 }, { "2021-04-05 08:00:00.000": 19000 }, { "2022-04-05 08:00:00.000": 22000 } ] |
Thus:
SELECT store
,hour_history[0].date::timestamp as Timestamp_1
,hour_history[0].sales::number as Sales_1
,hour_history[1].date::timestamp as Timestamp_2
,hour_history[1].sales::number as Sales_2
,hour_history[2].date::timestamp as Timestamp_3
,hour_history[2].sales::number as Sales_3
FROM (
SELECT
store
,hour(date)
,array_agg(object_construct('date', date::text, 'sales', sales)) within group (order by date) as hour_history
FROM data_table
GROUP BY 1,2
)
ORDER BY 1;
does give:
| STORE | TIMESTAMP_1 | SALES_1 | TIMESTAMP_2 | SALES_2 | TIMESTAMP_3 | SALES_3 |
|---|---|---|---|---|---|---|
| 1 | 2019-04-05 08:00:00.000 | 10,000 | 2020-04-05 08:00:00.000 | 12,000 | 2021-04-05 08:00:00.000 | 15,000 |
| 2 | 2019-04-05 08:00:00.000 | 13,000 | 2020-04-05 08:00:00.000 | 16,000 | 2021-04-05 08:00:00.000 | 19,000 |
If you have many months, day, hours worth of data this works for the inner loop:
SELECT
store
,month(date)
,day(date)
,hour(date)
,array_agg(object_construct('date', date::text, 'sales', sales)) within group (order by date) as hour_history
FROM data_table
GROUP BY 1,2,3,4
AKA:
WITH data_table AS (
SELECT * FROM VALUES
(1,'2019-04-05T08:00:00Z'::timestamp,10000),
(1,'2020-04-05T08:00:00Z'::timestamp,12000),
(1,'2021-04-05T08:00:00Z'::timestamp,15000),
(1,'2022-04-05T08:00:00Z'::timestamp,20000),
(1,'2019-03-05T08:00:00Z'::timestamp,10001),
(1,'2020-03-05T08:00:00Z'::timestamp,12001),
(1,'2021-03-05T08:00:00Z'::timestamp,15001),
(1,'2022-03-05T08:00:00Z'::timestamp,20001),
(1,'2019-04-04T08:00:00Z'::timestamp,10002),
(1,'2020-04-04T08:00:00Z'::timestamp,12002),
(1,'2021-04-04T08:00:00Z'::timestamp,15002),
(1,'2022-04-04T08:00:00Z'::timestamp,20002),
(2,'2019-04-05T08:00:00Z'::timestamp,13000),
(2,'2020-04-05T08:00:00Z'::timestamp,16000),
(2,'2021-04-05T08:00:00Z'::timestamp,19000),
(2,'2022-04-05T08:00:00Z'::timestamp,22000)
t(store, date, sales)
)
SELECT store
,hour_history[0].date::timestamp as Timestamp_1
,hour_history[0].sales::number as Sales_1
,hour_history[1].date::timestamp as Timestamp_2
,hour_history[1].sales::number as Sales_2
,hour_history[2].date::timestamp as Timestamp_3
,hour_history[2].sales::number as Sales_3
FROM (
SELECT
store
,month(date)
,day(date)
,hour(date)
,array_agg(object_construct('date', date::text, 'sales', sales)) within group (order by date) as hour_history
FROM data_table
GROUP BY 1,2,3,4
)
ORDER BY 1;
gives:
| STORE | TIMESTAMP_1 | SALES_1 | TIMESTAMP_2 | SALES_2 | TIMESTAMP_3 | SALES_3 |
|---|---|---|---|---|---|---|
| 1 | 2019-04-05 08:00:00.000 | 10,000 | 2020-04-05 08:00:00.000 | 12,000 | 2021-04-05 08:00:00.000 | 15,000 |
| 1 | 2019-03-05 08:00:00.000 | 10,001 | 2020-03-05 08:00:00.000 | 12,001 | 2021-03-05 08:00:00.000 | 15,001 |
| 1 | 2019-04-04 08:00:00.000 | 10,002 | 2020-04-04 08:00:00.000 | 12,002 | 2021-04-04 08:00:00.000 | 15,002 |
| 2 | 2019-04-05 08:00:00.000 | 13,000 | 2020-04-05 08:00:00.000 | 16,000 | 2021-04-05 08:00:00.000 | 19,000 |
What you will note is you example has 4 years of data, and you are throwing away to 2022 data.
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 |
