'SQL Query: CREATE a table with rows divided by month/year and COUNT the number of values WHERE '01/month/year' IS BETWEEN two date-columns
this is my first question here.
I have a problem in creating a complex query to group values based on if the first day of month/year falls in between two date columns.
here is an example of the table I have:
| USER_ID | START_DATE | END_DATE |
|---|---|---|
| A | 03/07/2020 | 31/07/2020 |
| A | 05/06/2020 | 03/07/2020 |
| A | 08/05/2020 | 05/06/2020 |
| A | 10/04/2020 | 08/05/2020 |
| B | 13/02/2020 | 12/03/2020 |
| B | 16/01/2020 | 13/02/2020 |
| C | 22/05/2020 | 19/06/2020 |
| C | 24/04/2020 | 22/05/2020 |
| D | 25/09/2020 | 23/10/2020 |
| D | 28/08/2020 | 25/09/2020 |
| D | 31/07/2020 | 28/08/2020 |
| D | 03/07/2020 | 31/07/2020 |
| D | 05/06/2020 | 03/07/2020 |
| E | 25/11/2020 | 23/12/2020 |
| E | 28/10/2020 | 25/11/2020 |
| E | 30/09/2020 | 28/10/2020 |
| F | 14/2/2020 | 13/3/2020 |
| F | 17/1/2020 | 14/2/2020 |
| F | 20/12/2019 | 17/1/2020 |
| F | 22/11/2019 | 20/12/2019 |
| G | 7/11/2020 | 5/12/2020 |
| G | 10/10/2020 | 7/11/2020 |
and I wish to have something like that:
| YEAR | MONTH | COUNT(DISTINCT USER_ID) |
|---|---|---|
| 2019 | 11 | 0 |
| 2019 | 12 | 1 |
| 2020 | 1 | 1 |
| 2020 | 2 | 2 |
| 2020 | 3 | 2 |
| 2020 | 4 | 0 |
| 2020 | 5 | 2 |
| 2020 | 6 | 2 |
| 2020 | 7 | 2 |
| 2020 | 8 | 1 |
| 2020 | 9 | 1 |
| 2020 | 10 | 2 |
| 2020 | 11 | 2 |
| 2020 | 12 | 2 |
For instance, in Feb 2020 user "B" and user "F" had a range of dates that included the date 01/Feb/2020 (the condition is true for:
| USER_ID | START_DATE | END_DATE |
|---|---|---|
| B | 16/01/2020 | 13/02/2020 |
and for:
| USER_ID | START_DATE | END_DATE |
|---|---|---|
| F | 17/1/2020 | 14/2/2020 |
...so the count will be 2.
Do you know any way to do it in SQL (or Ruby)?
Thanks a lot!
Solution 1:[1]
Try this :
WITH m AS
( SELECT generate_series(min(date_trunc('month', start_date)), max(end_date), '1 month') :: date AS month
FROM my_table AS t
)
SELECT to_char(m.month, 'YYYY') AS year
, to_char(m.month, 'MM') AS month
, count(DISTINCT t.user_id) AS "count(distinct user_id)"
FROM my_table AS t
RIGHT JOIN m
ON daterange(t.start_date, t.end_date) @> m.month
GROUP BY m.month
ORDER BY m.month
The first query "m" calculates the list of months that cover the start_date and end_date of my_table.
The second query join my_table with the resulting table "m" in order to select all the users whose interval daterange(start_date, end_date) contains the 1st day of the month (see the manual).
Then the rows are grouped by m.month and the number of distinct user_id per month is calculated with the count(DISTINCT user_id) aggregate function (see the manual).
Finally the RIGHT JOIN clause allows to select the months with no corresponding user_id in my_table (see the manual).
See the test result in dbfiddle.
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Edouard |
