'Partitioned tables have impact on join with BigQuery?

With BigQuery if i have two partitioned tables and join these two tables with the primary key.

Does it have any impact on the performance in the join part? if the field was not partitioned it will preform differently?



Solution 1:[1]

Yes, BigQuery optimizes their execution query plan using “join” , but it depends on the structure of your query. You need to use some rules for the query to get the best performance with partitioned tables using “join”. You can see this example:

I created two partitioned tables with the field “event_timestamp”. I will join these two tables.

CREATE TABLE `table1` ( event_timestamp TIMESTAMP, id INT64 )
PARTITION BY DATE(event_timestamp);
 
CREATE TABLE `table2` ( event_timestamp TIMESTAMP, id INT64 )
PARTITION BY DATE(event_timestamp);

I created a view. I joined table1 and table2 in this case.

CREATE VIEW `view1` AS
  SELECT t1.event_timestamp,t1.id
  FROM `table1` t1
  INNER JOIN `table2` t2
  USING (id)
  WHERE TIMESTAMP_TRUNC(t1.event_timestamp, quarter) = TIMESTAMP_TRUNC(t2.event_timestamp, quarter)

If I execute this query.

SELECT * 
FROM view1
WHERE event_timestamp BETWEEN '2020-01-01' AND '2020-04-01'

In this case, the view only is pruning the partitions in the first table.

Yo can see this example:

SELECT t1.event_timestamp,t1.id
    FROM table1 t1
    INNER JOIN table2 t2
    USING (id)
    WHERE t1.event_timestamp BETWEEN '2022-01-01' AND '2022-04-01'
    AND t2.event_timestamp BETWEEN '2022-01-01' AND '2022-04-01'

The query pruned the partitions of both tables.

If you want to join two partitioned tables with the column you are using as partition in both tables. Will scan all the table. The dynamic pruning does not work through subqueries or joins. There is a public feature request, you can follow in this link. There is no ETA to resolve this.

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1