'joining a series to dimensional data
I am trying to create a projection for churn over 12 months by segment. My data is only populated to 4 months. At this point I'm not worried about the projections since they are just calculations. I'm trying to structure a table so I can make those projections later on. I need the table to output a series of numbers from 1-12 for each segment where the first 4 series (1-4) populate the existing data and the remaining series (5-12) are null for each segment.
CREATE TABLE series (series integer);
INSERT INTO series (series) VALUES
(1),
(2),
(3),
(4),
(5),
(6),
(7),
(8),
(9),
(10),
(11),
(12);
CREATE TABLE agg_user_data (cohort_age integer, segment text, churn float);
INSERT INTO agg_user_data (cohort_age, segment, churn) VALUES
(1,'NA',1),
(2,'NA',.98),
(3,'NA',.96),
(4,'NA',.93),
(1,'EUROPE',1),
(2,'EUROPE',.97),
(3,'EUROPE',.95),
(4,'EUROPE',.92);
The output I'm trying to create would be something like:
series | segment | churn
-------------+------------+-------
1 | NA | 1
2 | NA | .98
3 | NA | .96
4 | NA | .93
5 | NULL | NULL
6 | NULL | NULL
....
1 | EUROPE | 1
2 | EUROPE | .97
3 | EUROPE | .95
4 | EUROPE | .92
5 | NULL | NULL
6 | NULL | NULL
....
I've tried creating a series like this:
with series as (
select
(row_number() over (order by 1)) as series_age
from
some_other_table
order by 1
limit 12
)
select series.series
agg_user_data.segment
from series
left join agg_user_data on series.series = agg_user_data.cohort_age
order by 2,1
But it only pulls in series 1-4 for each segment and then 5-12 as stand alone NULL rows but not for each segment.
series | segment | churn
-------------+------------+-------
1 | NA | 1
2 | NA | .98
3 | NA | .96
4 | NA | .93
1 | EUROPE | 1
2 | EUROPE | .97
3 | EUROPE | .95
4 | EUROPE | .92
5 | NULL | NULL
6 | NULL | NULL
7 | NULL | NULL
8 | NULL | NULL
9 | NULL | NULL
10 | NULL | NULL
11 | NULL | NULL
12 | NULL | NULL
When I filter for just 'NA' it works as I would hope but when I remove the filter to include all segments, then I get output above.
Solution 1:[1]
It's normal behaviour of left join. You joined 12 series in "series" with 4 series in "agg_user_segment" and postgres found 8 "inner join" result records and added 8 not joined records from series. Well it works without errors, but you intended to got all records from series multiplied by distinct segments in agg_user_data. As I understand your question. You can do that by following query:
with
disinct_series as (
select distinct agg_user_data.segment
from agg_user_data
),
multiplied_series as (
select series.series,
disinct_series.segment
from series
cross join disinct_series
)
select multiplied_series.series,
agg_user_data.segment
from multiplied_series
left join agg_user_data
on multiplied_series.series = agg_user_data.cohort_age
and multiplied_series.segment = agg_user_data.segment
order by 1, 2
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Andrew |
