'Get count of each value from each column in the table
As the title says, I have a table from which I need to count each row from each column.
I have an SQL query, which selects needed rows based on WHERE and BETWEEN keywords:
SELECT * FROM (
SELECT pg, count(*) FROM analytics WHERE pid='STEzHcB1rALV' AND created BETWEEN '2022-01-01' AND '2022-01-28' GROUP BY pg
UNION ALL
SELECT dv, count(*) FROM analytics WHERE pid='STEzHcB1rALV' AND created BETWEEN '2022-01-01' AND '2022-01-28' GROUP BY dv
UNION ALL
SELECT br, count(*) FROM analytics WHERE pid='STEzHcB1rALV' AND created BETWEEN '2022-01-01' AND '2022-01-28' GROUP BY br
UNION ALL
SELECT os, count(*) FROM analytics WHERE pid='STEzHcB1rALV' AND created BETWEEN '2022-01-01' AND '2022-01-28' GROUP BY os
UNION ALL
SELECT lc, count(*) FROM analytics WHERE pid='STEzHcB1rALV' AND created BETWEEN '2022-01-01' AND '2022-01-28' GROUP BY lc
UNION ALL
SELECT ref, count(*) FROM analytics WHERE pid='STEzHcB1rALV' AND created BETWEEN '2022-01-01' AND '2022-01-28' GROUP BY ref
UNION ALL
SELECT so, count(*) FROM analytics WHERE pid='STEzHcB1rALV' AND created BETWEEN '2022-01-01' AND '2022-01-28' GROUP BY so
UNION ALL
SELECT me, count(*) FROM analytics WHERE pid='STEzHcB1rALV' AND created BETWEEN '2022-01-01' AND '2022-01-28' GROUP BY me
UNION ALL
SELECT ca, count(*) FROM analytics WHERE pid='STEzHcB1rALV' AND created BETWEEN '2022-01-01' AND '2022-01-28' GROUP BY ca
UNION ALL
SELECT cc, count(*) FROM analytics WHERE pid='STEzHcB1rALV' AND created BETWEEN '2022-01-01' AND '2022-01-28' GROUP BY cc
);
The problem is that for each column it seems to select partition of table (based on WHERE and BETWEEN) each time, which looks a bit unoptimised.
I've tried to optimise it by creating the next query:
SELECT * FROM (
SELECT pg,dv,br,os,lc,ref,so,me,ca,cc FROM analytics WHERE pid='STEzHcB1rALV' AND created BETWEEN '2022-01-01' AND '2022-01-28' INTO table
UNION ALL
SELECT count(*) FROM table GROUP BY pg
UNION ALL
SELECT count(*) FROM table GROUP BY dv
UNION ALL
SELECT count(*) FROM table GROUP BY br
UNION ALL
SELECT count(*) FROM table GROUP BY os
UNION ALL
SELECT count(*) FROM table GROUP BY lc
UNION ALL
SELECT count(*) FROM table GROUP BY ref
UNION ALL
SELECT count(*) FROM table GROUP BY so
UNION ALL
SELECT count(*) FROM table GROUP BY me
UNION ALL
SELECT count(*) FROM table GROUP BY ca
UNION ALL
SELECT count(*) FROM table GROUP BY cc
);
but it returns an error: Code: 60. DB::Exception: Received from localhost:9000. DB::Exception: Table analytics.table doesn't exist. (UNKNOWN_TABLE).
The current database I use is Clickhouse.
Is there a way how I can optimise this query?
Why does the DBMS returns this table doesn't exist error?
Solution 1:[1]
You're using count() with group by and this looks like you want to count unique values from each of the columns.
Instead can you uniq (or uniqExact for the strict precision)
SELECT
uniq(pg),
uniq(dv),
uniq(br),
uniq(os),
uniq(lc),
uniq(ref),
uniq(so),
uniq(me),
uniq(ca),
uniq(cc),
FROM analytics
WHERE pid='STEzHcB1rALV' AND created BETWEEN '2022-01-01' AND '2022-01-28'
This will give results in one row.
Solution 2:[2]
Consider using sumMap aggregated function:
SELECT
pid,
v1_result,
v2_result
FROM
(
SELECT
pid,
sumMap([v1], [1]) AS _value1_to_count,
arrayMap((value, count) -> (value, count), _value1_to_count.1, _value1_to_count.2) AS v1_result,
sumMap([v2], [1]) AS _value2_to_count,
arrayMap((value, count) -> (value, count), _value2_to_count.1, _value2_to_count.2) AS v2_result
FROM
(
SELECT
number % 5 AS pid,
number % 2 AS v1,
number % 3 AS v2
FROM numbers(256)
)
WHERE pid = 4
GROUP BY pid
)
/*
(0,26) means that value '0' appeared 26 times in the column 'v1_result'.
??pid???v1_result?????????v2_result???????????????
? 4 ? [(0,26),(1,25)] ? [(0,17),(1,17),(2,17)] ?
??????????????????????????????????????????????????
*/
Consider using uniqExact aggregated function:
SELECT
pid,
uniqExact(v1),
uniqExact(v2)
FROM
(
SELECT
number % 5 AS pid,
number % 2 AS v1,
number % 3 AS v2
FROM numbers(256)
)
WHERE pid = 4
GROUP BY pid
/*
??pid???uniqExact(v1)???uniqExact(v2)??
? 4 ? 2 ? 3 ?
???????????????????????????????????????
*/
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Andrei Koch |
| Solution 2 |
