'How to reference fields from table created in sub-query's of large JOIN

I am writing a large query with many JOINs (shortened it in example here) and I am trying to reference values form other sub-queries but can't figure out how.

This is my example query:

DROP TABLE IF EXISTS breakdown;
CREATE TEMP TABLE breakdown AS 
SELECT * FROM 
  (
  SELECT COUNT(DISTINCT s_id) AS before, date_trunc('day', time) AS day FROM table_a
  WHERE  date_trunc('sec',earliest) < date_trunc('sec',time) GROUP BY day 
  ) 
JOIN
  (
  SELECT ROUND(before * 100.0 / total, 1) AS Percent_1, day
  FROM breakdown
  GROUP BY day
  ) USING (day)
JOIN 
  (
  SELECT COUNT(DISTINCT s_id) AS equal, date_trunc('day', time) AS day FROM table_a 
  WHERE   date_trunc('sec',earliest) = date_trunc('sec',time) GROUP BY day 
  ) USING (day)
JOIN 
  (
  SELECT COUNT(DISTINCT s_id) AS after, date_trunc('day', time) AS day FROM table_a 
  WHERE   date_trunc('sec',earliest) > date_trunc('sec',time) GROUP BY day 
  ) USING (day)
JOIN 
  (
  SELECT COUNT(DISTINCT s_id) AS total, date_trunc('day', earliest) AS day
  FROM first
  GROUP BY 2
 ) USING (day) 
ORDER BY day;


SELECT * FROM breakdown ORDER BY day;

The last query gives me the total and for each of the previous subqueries I want to get the percentages as well.

I found the code for getting the percentage (second JOIN) but I don't know how to reference the values from the other tables.

E.g. for getting the percentage from the first query I want to use the COUNT of the first query which I renamed before and then divide that by the COUNT of the last query which I renamed total (If there is an easier solution to do this i.e. get the percentage for each of the sub-queries please let me know), But I cant seem to find how to reference them. I tried adding AS x to the end of each subquery and calling by that (x.total) as well as trying to reference via the parent table (breakdown.total) but neither worked.

How can I do this without changing my table too much as it is a long table with a lot of sub-queries.

This is what my table looks like I would like to add percentage for each column This is what my table looks like I would like to add percentage for each column

Using redshift BTW.

Thanks



Solution 1:[1]

I'm a little confused by all that is going on as you drop table breakdown and then in the second subquery of the create table you reference breakdown. I suspect that there are some issues in the provided sample of SQL. Please update if there are issues.

For a number of these subqueries it looks like you are using a subquery where a case statement will do. In Redshift you don't want to scan the same table over and over if you can prevent it. For example if we look at the the 3rd and 4th subqueries you can replace these with one query. Also in these cases I like to use the DECODE() statement rather than CASE since it is more readable in these simple cases.

(
  SELECT COUNT(DISTINCT s_id) AS equal, date_trunc('day', time) AS day 
  FROM table_a 
  WHERE   date_trunc('sec',earliest) = date_trunc('sec',time) 
  GROUP BY day 
) USING (day)
JOIN 
(
  SELECT COUNT(DISTINCT s_id) AS after, date_trunc('day', time) AS day 
  FROM table_a 
  WHERE   date_trunc('sec',earliest) > date_trunc('sec',time) 
  GROUP BY day 
) 

Becomes:

(
  SELECT COUNT(DISTINCT DECODE(date_trunc('sec',earliest) = date_trunc('sec',time), true, s_id, NULL)) AS equal, 
    COUNT(DISTINCT DECODE(date_trunc('sec',earliest) > date_trunc('sec',time), true, s_id, NULL)) AS after,
    date_trunc('day', time) AS day 
  FROM table_a 
  GROUP BY day 
) 

Read each table once (if at all possible) and calculate the desired results. then you will have all your values in one layer of query and can reference these new values. This will be faster (especially on Redshift).

=============================

Expanding based on comment made by poster.

It appears that using DECODE() and referencing derived columns in a single query can produce what you want. I don't have your data so I cannot test this but here is what I'd want to move to:

SELECT 
  COUNT(DISTINCT DECODE(date_trunc('sec',earliest) < date_trunc('sec',time), true, s_id)) AS before,
  ROUND(before * 100.0 / total, 1) AS Percent_1,
  COUNT(DISTINCT DECODE(date_trunc('sec',earliest) = date_trunc('sec',time), true, s_id)) AS equal, 
  COUNT(DISTINCT DECODE(date_trunc('sec',earliest) > date_trunc('sec',time), true, s_id)) AS after,
  COUNT(DISTINCT s_id) AS total
FROM table_a 
GROUP BY date_trunc('day', time);

This should be a complete replacement for the SELECT currently inside your CREATE TEMP TABLE. However, I don't have sample data so this is untested.

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1