'Rank columns by their count of values
I have a table with a bunch of boolean columns. I'd like to rank these columns by the count of true values each one has.
I found a way to count the number of true values in a column using:
SELECT count(CASE WHEN col1 THEN 1 ELSE null END) as col1,
count(CASE WHEN col2 THEN 1 ELSE null END) as col2
....
FROM my_table;
but this approach has two problems:
- I have to manually type the names of the columns
- I have to then transpose the result and order by value
Is there a way to do the whole operation one query?
Solution 1:[1]
This is not actually a crosstab job (or "pivot" in other RDBMS), but the reverse operation, "unpivot" if you will. One elegant technique is a VALUES expression in a LATERAL join.
The basic query can look like this, which takes care of:
- I have to then transpose the result and order by value
SELECT c.col, c.ct
FROM (
SELECT count(col1 OR NULL) AS col1
, count(col2 OR NULL) AS col2
-- etc.
FROM tbl
) t
, LATERAL (
VALUES
('col1', col1)
, ('col2', col2)
-- etc.
) c(col, ct)
ORDER BY 2;
That was the simple part. Your other request is harder:
- I have to manually type the names of the columns
This function takes your table name and retrieves meta data from the system catalog pg_attribute. It's a dynamic implementation of the above query, safe against SQL injection:
CREATE OR REPLACE FUNCTION f_true_ct(_tbl regclass)
RETURNS TABLE (col text, ct bigint)
LANGUAGE plpgsql AS
$func$
BEGIN
RETURN QUERY EXECUTE (
SELECT format('
SELECT c.col, c.ct
FROM (SELECT %s FROM tbl) t
, LATERAL (VALUES %s) c(col, ct)
ORDER BY 2 DESC'
, string_agg (format('count(%1$I OR NULL) AS %1$I', attname), ', ')
, string_agg (format('(%1$L, %1$I)', attname), ', ')
)
FROM pg_attribute
WHERE attrelid = _tbl -- valid, visible, legal table name
AND attnum >= 1 -- exclude tableoid & friends
AND NOT attisdropped -- exclude dropped columns
AND atttypid = 'bool'::regtype -- only character types
);
END
$func$;
Call:
SELECT * FROM f_true_ct('tbl'); -- table name optionally schema-qualified
Result:
col | ct
------+---
col1 | 3
col3 | 2
col2 | 1
Works for any table to rank all boolean columns by their count of true values.
To understand the function parameter, read this:
Related answers with more explanation:
Solution 2:[2]
If I understand correctly, you can do this with a giant union all:
select c.*
from ((select 'col1' as which, sum(case when col1 then 1 else 0 end) as cnt from t
) union all
(select 'col2' as which, sum(case when col2 then 1 else 0 end) as cnt from t
) union all
. . .
) c
order by cnt desc;
Although you still need to type the results, this does sidestep the transpositions.
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | |
| Solution 2 | Gordon Linoff |
