'Get count of each value from each column in the table

As the title says, I have a table from which I need to count each row from each column. I have an SQL query, which selects needed rows based on WHERE and BETWEEN keywords:

SELECT * FROM (
  SELECT pg, count(*) FROM analytics WHERE pid='STEzHcB1rALV' AND created BETWEEN '2022-01-01' AND '2022-01-28' GROUP BY pg
  UNION ALL
  SELECT dv, count(*) FROM analytics WHERE pid='STEzHcB1rALV' AND created BETWEEN '2022-01-01' AND '2022-01-28' GROUP BY dv
  UNION ALL
  SELECT br, count(*) FROM analytics WHERE pid='STEzHcB1rALV' AND created BETWEEN '2022-01-01' AND '2022-01-28' GROUP BY br
  UNION ALL
  SELECT os, count(*) FROM analytics WHERE pid='STEzHcB1rALV' AND created BETWEEN '2022-01-01' AND '2022-01-28' GROUP BY os
  UNION ALL
  SELECT lc, count(*) FROM analytics WHERE pid='STEzHcB1rALV' AND created BETWEEN '2022-01-01' AND '2022-01-28' GROUP BY lc 
  UNION ALL
  SELECT ref, count(*) FROM analytics WHERE pid='STEzHcB1rALV' AND created BETWEEN '2022-01-01' AND '2022-01-28' GROUP BY ref 
  UNION ALL
  SELECT so, count(*) FROM analytics WHERE pid='STEzHcB1rALV' AND created BETWEEN '2022-01-01' AND '2022-01-28' GROUP BY so
  UNION ALL
  SELECT me, count(*) FROM analytics WHERE pid='STEzHcB1rALV' AND created BETWEEN '2022-01-01' AND '2022-01-28' GROUP BY me 
  UNION ALL
  SELECT ca, count(*) FROM analytics WHERE pid='STEzHcB1rALV' AND created BETWEEN '2022-01-01' AND '2022-01-28' GROUP BY ca
  UNION ALL
  SELECT cc, count(*) FROM analytics WHERE pid='STEzHcB1rALV' AND created BETWEEN '2022-01-01' AND '2022-01-28' GROUP BY cc
);

The problem is that for each column it seems to select partition of table (based on WHERE and BETWEEN) each time, which looks a bit unoptimised.

I've tried to optimise it by creating the next query:

SELECT * FROM (
  SELECT pg,dv,br,os,lc,ref,so,me,ca,cc FROM analytics WHERE pid='STEzHcB1rALV' AND created BETWEEN '2022-01-01' AND '2022-01-28' INTO table
  UNION ALL
  SELECT count(*) FROM table GROUP BY pg
  UNION ALL
  SELECT count(*) FROM table GROUP BY dv
  UNION ALL
  SELECT count(*) FROM table GROUP BY br
  UNION ALL
  SELECT count(*) FROM table GROUP BY os
  UNION ALL
  SELECT count(*) FROM table GROUP BY lc 
  UNION ALL
  SELECT count(*) FROM table GROUP BY ref 
  UNION ALL
  SELECT count(*) FROM table GROUP BY so
  UNION ALL
  SELECT count(*) FROM table GROUP BY me 
  UNION ALL
  SELECT count(*) FROM table GROUP BY ca
  UNION ALL
  SELECT count(*) FROM table GROUP BY cc
);

but it returns an error: Code: 60. DB::Exception: Received from localhost:9000. DB::Exception: Table analytics.table doesn't exist. (UNKNOWN_TABLE).

The current database I use is Clickhouse.

Is there a way how I can optimise this query? Why does the DBMS returns this table doesn't exist error?



Solution 1:[1]

You're using count() with group by and this looks like you want to count unique values from each of the columns. Instead can you uniq (or uniqExact for the strict precision)

SELECT 
    uniq(pg),
    uniq(dv),
    uniq(br),
    uniq(os),
    uniq(lc),
    uniq(ref),
    uniq(so),
    uniq(me),
    uniq(ca),
    uniq(cc),
FROM analytics
WHERE pid='STEzHcB1rALV' AND created BETWEEN '2022-01-01' AND '2022-01-28' 

This will give results in one row.

Solution 2:[2]

Consider using sumMap aggregated function:

SELECT
    pid,
    v1_result,
    v2_result
FROM 
(
    SELECT
        pid,
        sumMap([v1], [1]) AS _value1_to_count,
        arrayMap((value, count) -> (value, count), _value1_to_count.1, _value1_to_count.2) AS v1_result,
        sumMap([v2], [1]) AS _value2_to_count,
        arrayMap((value, count) -> (value, count), _value2_to_count.1, _value2_to_count.2) AS v2_result
    FROM 
    (
        SELECT
            number % 5 AS pid,
            number % 2 AS v1,
            number % 3 AS v2
        FROM numbers(256)
    )
    WHERE pid = 4
    GROUP BY pid
)

/*

(0,26) means that value '0' appeared 26 times in the column 'v1_result'.

??pid???v1_result?????????v2_result???????????????
?   4 ? [(0,26),(1,25)] ? [(0,17),(1,17),(2,17)] ?
??????????????????????????????????????????????????
*/

Consider using uniqExact aggregated function:

SELECT
    pid,
    uniqExact(v1),
    uniqExact(v2)
FROM
(
    SELECT
        number % 5 AS pid,
        number % 2 AS v1,
        number % 3 AS v2
    FROM numbers(256)
)
WHERE pid = 4
GROUP BY pid

/*
??pid???uniqExact(v1)???uniqExact(v2)??
?   4 ?             2 ?             3 ?
???????????????????????????????????????
*/

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Andrei Koch
Solution 2