'SQL UNION ALL but with lots of columns on BigQuery?
Above image is a screenshot of my table just as a quick initial reference.
The focal point are the multiple mech columns (mech1, mech2, mech3, and mech4).
Board games in this tables have multiple attributes called mechanisms so I've separated them into 4 different columns.
So I've learned how to combine columns vertically via UNION ALL so that I can query the count of all unique game mechanisms in my table.
However, it got me wondering if there's a shorter and more efficient way to achieve what I've done:
WITH mechanism_info AS
(
WITH
mechanism_col_combined AS
(
SELECT mech1 AS all_mech_columns_combined
FROM `ckda-portfolio-2022.bg_collection.base`
UNION ALL
## There's no IS NOT NULL condition defined for column 'mech1' since there's at least one mechanism noted for a game.
SELECT mech2
FROM `ckda-portfolio-2022.bg_collection.base`
WHERE mech2 IS NOT NULL
UNION ALL
SELECT mech3
FROM `ckda-portfolio-2022.bg_collection.base`
WHERE mech3 IS NOT NULL
UNION ALL
SELECT mech4
FROM `ckda-portfolio-2022.bg_collection.base`
WHERE mech4 IS NOT NULL
)
## Temporary table with all mechanism column in the collection combined.
SELECT DISTINCT(all_mech_columns_combined) AS unique_mechanisms, COUNT(*) AS count
FROM mechanism_col_combined
GROUP BY all_mech_columns_combined
ORDER BY all_mech_columns_combined
)
SELECT *
FROM mechanism_info
By querying this temp. table, SQL returns the information that I've anticipated as below:
unique_mechanisms | count
Acting | 1
Action Points | 3
Action Queue | 1
Action Retrieval | 1
Area Movement | 1
Auction/Bidding | 5
Bag Building | 1
Betting & Bluffing| 2
Bingo | 1
Bluffing | 7
Now, I want to shorten my code and I know there has to be a way to shorten the repetitive process of combining columns with UNION ALL.
And if there's any other tips or methods on how to shorten my query, please let me know!
Thank you.
Solution 1:[1]
You can convert the multiple columns [mech1, mech2, ...] into a column of array mech_arr and then using UNNEST to convert the column to have scalar value in each row.
For example:
WITH table1 AS (
SELECT 'AA' AS mech1, 'BB' AS mech2, 'CC' AS mech3,
UNION ALL SELECT 'AA' AS mech1, 'CC' AS mech2, 'EE' AS mech3
),
table2 AS (SELECT [mech1, mech2, mech3] AS mech_arr FROM table1)
SELECT mech, COUNT(*) AS mech_counts
FROM table2, UNNEST(mech_arr) AS mech
GROUP BY mech
Output
mech mech_counts
AA 2
BB 1
CC 2
EE 1
Solution 2:[2]
You could send join into the table, but the performance would not improve and the query would be just as long.
You can simplify as follows:
SELECT
mech_column,
count(*) "number"
FROM (
SELECT mech1 AS mech_column
FROM `ckda-portfolio-2022.bg_collection.base`
UNION ALL
SELECT mech2
FROM `ckda-portfolio-2022.bg_collection.base`
UNION ALL
SELECT mech3
FROM `ckda-portfolio-2022.bg_collection.base`
UNION ALL
SELECT mech4
FROM `ckda-portfolio-2022.bg_collection.base`
) m
WHERE mech_column IS NOT NULL
GROUP BY mech_column
ORDER BY mech_column;
Solution 3:[3]
Didn't find a smoother way to query but I did find a way to remove the process of adding WHERE column IS NOT NULL for each and every columns that was used to vertically aggregate them into a single column:
mechanism_info AS
(
WITH
mechanism_col_combined AS
(
SELECT mech1 AS mech_columns
FROM `ckda-portfolio-2022.bg_collection.base`
UNION ALL
SELECT mech2
FROM `ckda-portfolio-2022.bg_collection.base`
UNION ALL
SELECT mech3
FROM `ckda-portfolio-2022.bg_collection.base`
UNION ALL
SELECT mech4
FROM `ckda-portfolio-2022.bg_collection.base`
## Removed all WHERE clause from the above columns
and added it below instead.
)
## Temporary table with all mechanism columns in the collection combined.
SELECT DISTINCT(mech_columns) AS mechanisms, COUNT(*) AS count
FROM mechanism_col_combined
WHERE mech_columns IS NOT NULL ## <--- Added here!
GROUP BY mech_columns
ORDER BY mech_columns
)
SELECT * FROM mechanism_info
Since mechanism_info is a nested temp. table, I can just add WHERE mech_columns IS NOT NULL clause and condition to the initial temp. table's setting.
I'm still looking to reduce this query down to something more efficient. It's unfortunate that UNION ALL can't select multiple columns with a single call :(
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | dikesh |
| Solution 2 | |
| Solution 3 | LordPuggo |

