'PostgreSQL subquery COUNT fails when the subquery is joined more than once
I have 2 tables:
Table class:
id serial4 PRIMARY KEY
name varchar(64)
code varchar(64)
Table class_event, where I store events related to classes, such as "started" and "ended".
id serial4
class_id int4 NOT NULL // -> FK to the class table
event_type varchar(1) NOT NULL // -> 's' for started, 'e' for ended.
I need a query the amount of times each class has been started and ended. This works:
select
c.code,
c.name,
count(started.id) "started"
from "class" c
left join (select id, class_id, event_type from "class_event" where event_type = 's') started
on started.klass_id = c.id
group by c.code, c.name
order by started desc;
But when I do exactly the same to get the amount of ended classes it shows incorrect amounts:
select
c.code,
c.name,
count(started.id) "started",
count(ended.id) "ended"
from "class" c
left join (select id, class_id, event_type from "class_event" where event_type = 's') started
on started.klass_id = c.id
left join (select id, class_id, event_type from "class_event" where event_type = 'e') ended
on ended.klass_id = c.id
group by c.code, c.name
order by started desc;
Also, the query takes significantly more time to execute. Is there anything I'm missing?
Solution 1:[1]
Is there anything I'm missing?
Yes, multiple joins multiply rows. It's exactly the same problem as discussed here:
While you query the whole table it's typically cleaner and faster to aggregate first and join later. See:
This also avoids the original problem on principle, even for multiple joins - which we don't need.
SELECT *
FROM class c
LEFT JOIN (
SELECT class_id AS id
, count(*) FILTER (WHERE event_type = 's') AS started
, count(*) FILTER (WHERE event_type = 'e') AS ended
FROM class_event
GROUP BY 1
) e USING (id)
ORDER BY e.started DESC NULLS LAST;
NULLS LAST because it's conceivable that some of the classes have no related rows in table class_event (yet), and the resulting NULL values surely shouldn't sort on top. See:
About the aggregate FILTER clause:
- Aggregate columns with additional (distinct) filters
- For absolute performance, is SUM faster or COUNT?
Aside:
For just a hand full of allowed values, I would consider the data type "char" instead of varchar(1) for event_type. See:
Solution 2:[2]
You can try to use condition aggregate function
select
c.code,
c.name,
count(CASE WHEN event_type = 's' THEN ended.id END) "started",
count(CASE WHEN event_type = 'e' THEN ended.id END) "ended"
from "class" c
left join "class_event" started
on started.class_id = c.id
group by c.code, c.name
order by started desc;
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | |
| Solution 2 | D-Shih |
