Count distinct by boolean value
Is there a better (prettier, more idiomatic, or more performant) way to do the following?
Objective: count the distinct values of one column, split by the value of another, boolean column.
Sample data:
| id | metadata_streaming_date | cols_exist |
| --- | ----------------------- | ---------- |
| 1 | 2022-02-20 | true |
| 1 | 2022-02-20 | true |
| 2 | 2022-02-20 | true |
| 2 | 2022-02-20 | true |
| 3 | 2022-02-20 | false |
| 1 | 2022-02-19 | true |
| 2 | 2022-02-19 | false |
| 3 | 2022-02-19 | false |
| 4 | 2022-02-19 | false |
| 4 | 2022-02-19 | false |
The expected result counts distinct id values per metadata_streaming_date, split into wanted (ids with at least one row where cols_exist = false) and overall (all ids on that date).
Expected result:
| metadata_streaming_date | wanted | overall |
| ----------------------- | -------| --------|
| 2022-02-20 | 1 | 3 |
| 2022-02-19 | 3 | 4 |
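To pin down exactly what wanted and overall mean, here is a minimal plain-Python sketch that recomputes the expected table from the sample rows. The function name `count_distinct_by_flag` is hypothetical; it collects a set of ids per date and a second set of ids that appear with cols_exist = false. `is False` is used so that a missing (None) flag is not counted, mirroring how SQL's `= false` skips NULLs.

```python
from collections import defaultdict

# Sample rows from the question: (id, metadata_streaming_date, cols_exist)
rows = [
    (1, "2022-02-20", True),
    (1, "2022-02-20", True),
    (2, "2022-02-20", True),
    (2, "2022-02-20", True),
    (3, "2022-02-20", False),
    (1, "2022-02-19", True),
    (2, "2022-02-19", False),
    (3, "2022-02-19", False),
    (4, "2022-02-19", False),
    (4, "2022-02-19", False),
]


def count_distinct_by_flag(rows):
    """Return {date: (wanted, overall)}.

    wanted  = distinct ids with at least one cols_exist == False row that date
    overall = all distinct ids seen that date
    """
    overall = defaultdict(set)
    wanted = defaultdict(set)
    for id_, date, cols_exist in rows:
        overall[date].add(id_)
        if cols_exist is False:  # like SQL "= false": None/NULL is not counted
            wanted[date].add(id_)
    return {date: (len(wanted[date]), len(ids)) for date, ids in overall.items()}


result = count_distinct_by_flag(rows)
for date in sorted(result, reverse=True):
    print(date, *result[date])
# 2022-02-20 1 3
# 2022-02-19 3 4
```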
I can achieve it with two subqueries, inner-joining them on metadata_streaming_date:
```sql
select
    t1.metadata_streaming_date,
    overall,
    wanted,
    wanted / overall as perc
from
    (
        select
            metadata_streaming_date,
            count(distinct id) as overall
        from
            non_needed_fields_view
        where
            metadata_streaming_date >= '2022-02-19'
        group by
            metadata_streaming_date
    ) as t1
    inner join (
        select
            metadata_streaming_date,
            count(distinct id) as wanted
        from
            non_needed_fields_view
        where
            cols_exist is false
            and metadata_streaming_date >= '2022-02-19'
        group by
            metadata_streaming_date
    ) as t2 on t1.metadata_streaming_date = t2.metadata_streaming_date
```
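The question does not name its database engine, so the sketch below assumes SQLite (through Python's built-in `sqlite3` module) purely to run the two-subquery join against the sample data. Several details are assumptions for illustration: a table stands in for the view, cols_exist is stored as 0/1 and `is false` is written as `= 0`, `* 1.0` is added because SQLite (like PostgreSQL) performs integer division on two integers, and the 2022-02-21 rows are hypothetical. That extra date has no cols_exist = false rows, which shows that the inner join silently drops any date whose wanted count would be 0.

```python
import sqlite3

# Hypothetical stand-in for the question's data; cols_exist: 1 = true, 0 = false.
ROWS = [
    (1, "2022-02-20", 1), (1, "2022-02-20", 1), (2, "2022-02-20", 1),
    (2, "2022-02-20", 1), (3, "2022-02-20", 0),
    (1, "2022-02-19", 1), (2, "2022-02-19", 0), (3, "2022-02-19", 0),
    (4, "2022-02-19", 0), (4, "2022-02-19", 0),
    # Hypothetical extra date where no row has cols_exist = false.
    (1, "2022-02-21", 1), (5, "2022-02-21", 1),
]

JOIN_QUERY = """
SELECT t1.metadata_streaming_date, overall, wanted,
       wanted * 1.0 / overall AS perc  -- * 1.0 avoids integer division
FROM (SELECT metadata_streaming_date, COUNT(DISTINCT id) AS overall
      FROM non_needed_fields_view
      WHERE metadata_streaming_date >= '2022-02-19'
      GROUP BY metadata_streaming_date) AS t1
INNER JOIN (SELECT metadata_streaming_date, COUNT(DISTINCT id) AS wanted
            FROM non_needed_fields_view
            WHERE cols_exist = 0  -- stands in for "cols_exist is false"
              AND metadata_streaming_date >= '2022-02-19'
            GROUP BY metadata_streaming_date) AS t2
  ON t1.metadata_streaming_date = t2.metadata_streaming_date
ORDER BY t1.metadata_streaming_date
"""


def run_join_query():
    """Load ROWS into an in-memory table and run the two-subquery join."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE non_needed_fields_view "
        "(id INTEGER, metadata_streaming_date TEXT, cols_exist INTEGER)"
    )
    conn.executemany("INSERT INTO non_needed_fields_view VALUES (?, ?, ?)", ROWS)
    rows = conn.execute(JOIN_QUERY).fetchall()
    conn.close()
    return rows


rows = run_join_query()
for row in rows:
    print(row)
# ('2022-02-19', 4, 3, 0.75)
# ('2022-02-20', 3, 1, 0.3333333333333333)
```

Note that 2022-02-21 is missing from the output: t2 has no row for that date, so the inner join discards it. A left join plus `COALESCE(wanted, 0)` would keep it, at the cost of still scanning the table twice.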
Solution 1:[1]
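You can use conditional aggregation with DISTINCT and put the filter logic in a CASE WHEN expression. The CASE returns id only for rows where cols_exist = false and NULL otherwise; COUNT(DISTINCT ...) ignores NULLs, so both counts come from one grouped query instead of two subqueries and a join.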
```sql
SELECT metadata_streaming_date,
       COUNT(DISTINCT CASE WHEN cols_exist = false THEN id END) wanted,
       COUNT(DISTINCT id) overall
FROM non_needed_fields_view
WHERE metadata_streaming_date >= '2022-02-19'
GROUP BY metadata_streaming_date
```
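As a quick check, the sketch below runs this conditional-aggregation query on SQLite via Python's `sqlite3` module; SQLite is an assumption, since the question does not say which engine it uses. As in any SQLite stand-in, cols_exist is stored as 0/1 and `= false` becomes `= 0`. The perc column is an addition (the answer omits it) and uses `* 1.0` to avoid integer division. The hypothetical 2022-02-21 rows contain no cols_exist = false rows, and unlike the inner-join version, this query still returns that date with wanted = 0.

```python
import sqlite3

# Hypothetical stand-in for the question's data; cols_exist: 1 = true, 0 = false.
ROWS = [
    (1, "2022-02-20", 1), (1, "2022-02-20", 1), (2, "2022-02-20", 1),
    (2, "2022-02-20", 1), (3, "2022-02-20", 0),
    (1, "2022-02-19", 1), (2, "2022-02-19", 0), (3, "2022-02-19", 0),
    (4, "2022-02-19", 0), (4, "2022-02-19", 0),
    # Hypothetical extra date where no row has cols_exist = false.
    (1, "2022-02-21", 1), (5, "2022-02-21", 1),
]

CONDITIONAL_QUERY = """
SELECT metadata_streaming_date,
       -- CASE yields id only for cols_exist = 0 rows, NULL otherwise;
       -- COUNT(DISTINCT ...) skips the NULLs.
       COUNT(DISTINCT CASE WHEN cols_exist = 0 THEN id END) AS wanted,
       COUNT(DISTINCT id) AS overall,
       COUNT(DISTINCT CASE WHEN cols_exist = 0 THEN id END) * 1.0
           / COUNT(DISTINCT id) AS perc
FROM non_needed_fields_view
WHERE metadata_streaming_date >= '2022-02-19'
GROUP BY metadata_streaming_date
ORDER BY metadata_streaming_date
"""


def run_conditional_query():
    """Load ROWS into an in-memory table and run the single grouped query."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE non_needed_fields_view "
        "(id INTEGER, metadata_streaming_date TEXT, cols_exist INTEGER)"
    )
    conn.executemany("INSERT INTO non_needed_fields_view VALUES (?, ?, ?)", ROWS)
    rows = conn.execute(CONDITIONAL_QUERY).fetchall()
    conn.close()
    return rows


rows = run_conditional_query()
for row in rows:
    print(row)
# ('2022-02-19', 3, 4, 0.75)
# ('2022-02-20', 1, 3, 0.3333333333333333)
# ('2022-02-21', 0, 2, 0.0)
```

If the engine is PostgreSQL, the wanted column can equivalently be written with the aggregate FILTER clause: `COUNT(DISTINCT id) FILTER (WHERE cols_exist = false)`.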
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | D-Shih |
