'joining a series to dimensional data

I am trying to create a projection for churn over 12 months by segment. My data is only populated to 4 months. At this point I'm not worried about the projections since they are just calculations. I'm trying to structure a table so I can make those projections later on. I need the table to output a series of numbers from 1-12 for each segment where the first 4 series (1-4) populate the existing data and the remaining series (5-12) are null for each segment.

CREATE TABLE series (series integer);
INSERT INTO series (series) VALUES
(1),
(2),
(3),
(4),
(5),
(6),
(7),
(8),
(9),
(10),
(11),
(12);



CREATE TABLE agg_user_data (cohort_age integer, segment text, churn float);
INSERT INTO agg_user_data (cohort_age, segment, churn) VALUES
(1,'NA',1),
(2,'NA',.98),
(3,'NA',.96),
(4,'NA',.93),
(1,'EUROPE',1),
(2,'EUROPE',.97),
(3,'EUROPE',.95),
(4,'EUROPE',.92);

The output I'm trying to create would be something like:

series       | segment    | churn
-------------+------------+-------
1            | NA         | 1
2            | NA         | .98
3            | NA         | .96
4            | NA         | .93
5            | NULL       | NULL
6            | NULL       | NULL
....
1            | EUROPE     | 1
2            | EUROPE     | .97
3            | EUROPE     | .95
4            | EUROPE     | .92
5            | NULL       | NULL
6            | NULL       | NULL
....

I've tried creating a series like this:

with series as (
select
    (row_number() over (order by 1)) as series_age
from
    some_other_table
    order by 1 
limit 12
)

select series.series
       agg_user_data.segment
from series
left join agg_user_data on series.series = agg_user_data.cohort_age
order by 2,1

But it only pulls in series 1-4 for each segment and then 5-12 as stand alone NULL rows but not for each segment.

series       | segment    | churn
-------------+------------+-------
1            | NA         | 1
2            | NA         | .98
3            | NA         | .96
4            | NA         | .93
1            | EUROPE     | 1
2            | EUROPE     | .97
3            | EUROPE     | .95
4            | EUROPE     | .92
5            | NULL       | NULL
6            | NULL       | NULL
7            | NULL       | NULL
8            | NULL       | NULL
9            | NULL       | NULL
10           | NULL       | NULL
11           | NULL       | NULL
12           | NULL       | NULL    

When I filter for just 'NA' it works as I would hope but when I remove the filter to include all segments, then I get output above.



Solution 1:[1]

It's normal behaviour of left join. You joined 12 series in "series" with 4 series in "agg_user_segment" and postgres found 8 "inner join" result records and added 8 not joined records from series. Well it works without errors, but you intended to got all records from series multiplied by distinct segments in agg_user_data. As I understand your question. You can do that by following query:

with
disinct_series as (
  select distinct agg_user_data.segment
    from agg_user_data
),
multiplied_series as (
  select series.series,
         disinct_series.segment
    from series
    cross join disinct_series
)
select multiplied_series.series,
       agg_user_data.segment
from multiplied_series
left join agg_user_data 
       on multiplied_series.series = agg_user_data.cohort_age
      and multiplied_series.segment = agg_user_data.segment
order by 1, 2 

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Andrew