'SQL Query: CREATE a table with rows divided by month/year and COUNT the number of values WHERE '01/month/year' IS BETWEEN two date-columns

this is my first question here.

I have a problem in creating a complex query to group values based on if the first day of month/year falls in between two date columns.

here is an example of the table I have:

USER_ID START_DATE END_DATE
A 03/07/2020 31/07/2020
A 05/06/2020 03/07/2020
A 08/05/2020 05/06/2020
A 10/04/2020 08/05/2020
B 13/02/2020 12/03/2020
B 16/01/2020 13/02/2020
C 22/05/2020 19/06/2020
C 24/04/2020 22/05/2020
D 25/09/2020 23/10/2020
D 28/08/2020 25/09/2020
D 31/07/2020 28/08/2020
D 03/07/2020 31/07/2020
D 05/06/2020 03/07/2020
E 25/11/2020 23/12/2020
E 28/10/2020 25/11/2020
E 30/09/2020 28/10/2020
F 14/2/2020 13/3/2020
F 17/1/2020 14/2/2020
F 20/12/2019 17/1/2020
F 22/11/2019 20/12/2019
G 7/11/2020 5/12/2020
G 10/10/2020 7/11/2020

and I wish to have something like that:

YEAR MONTH COUNT(DISTINCT USER_ID)
2019 11 0
2019 12 1
2020 1 1
2020 2 2
2020 3 2
2020 4 0
2020 5 2
2020 6 2
2020 7 2
2020 8 1
2020 9 1
2020 10 2
2020 11 2
2020 12 2

For instance, in Feb 2020 user "B" and user "F" had a range of dates that included the date 01/Feb/2020 (the condition is true for:

USER_ID START_DATE END_DATE
B 16/01/2020 13/02/2020

and for:

USER_ID START_DATE END_DATE
F 17/1/2020 14/2/2020

...so the count will be 2.

Do you know any way to do it in SQL (or Ruby)?

Thanks a lot!



Solution 1:[1]

Try this :

WITH m AS
( SELECT generate_series(min(date_trunc('month', start_date)), max(end_date), '1 month') :: date AS month
    FROM my_table AS t
)
SELECT to_char(m.month, 'YYYY') AS year
     , to_char(m.month, 'MM') AS month
     , count(DISTINCT t.user_id) AS "count(distinct user_id)"
  FROM my_table AS t
 RIGHT JOIN m
    ON daterange(t.start_date, t.end_date) @> m.month
 GROUP BY m.month
 ORDER BY m.month

The first query "m" calculates the list of months that cover the start_date and end_date of my_table.

The second query join my_table with the resulting table "m" in order to select all the users whose interval daterange(start_date, end_date) contains the 1st day of the month (see the manual).

Then the rows are grouped by m.month and the number of distinct user_id per month is calculated with the count(DISTINCT user_id) aggregate function (see the manual).

Finally the RIGHT JOIN clause allows to select the months with no corresponding user_id in my_table (see the manual).

See the test result in dbfiddle.

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Edouard