BigQuery: Propagate partition filter on alias of _PARTITIONTIME

I'm having an issue with partitioned tables and the propagation of the partition filter.

WITH base_table AS (
-- Consumes about 3 MB
SELECT id, COUNT(*) FROM `project.dataset.base_table` GROUP BY id
)

,table1 AS (
-- With WHERE partition_date = "2022-04-26", consumes 600.5 MB
SELECT partition_date, id, COUNT(*) AS amount1 FROM `project.dataset.view1` GROUP BY partition_date, id
)

,table2 AS (
-- With WHERE partition_date = "2022-04-26", consumes 33.5 MB
-- Without partition filter, consumes 15 GB
SELECT partition_date, id, COUNT(*) AS amount2 FROM `project.dataset.view2` GROUP BY partition_date, id
)


SELECT
bt.id, t1.amount1, t2.amount2
FROM base_table bt
LEFT JOIN table1 t1 ON
    bt.id = t1.id
LEFT JOIN table2 t2 ON
    bt.id = t2.id AND
    t1.partition_date = t2.partition_date
WHERE bt.id IS NOT NULL AND t1.partition_date = "2022-04-26"

This query consumes about 15.1 GB. But if I run it with the following filter added:

AND t2.partition_date = "2022-04-26"

Then the query consumes about 636 MB.

So what I gather from this is that the partition filter is not being propagated through the join.

Note: The views look something like this:

SELECT   *, DATE(_PARTITIONTIME) AS PARTITION_DATE FROM `project.dataset.table1` WHERE DATE(_PARTITIONTIME) >= "2021-01-01"

For security reasons, I have no access directly to the tables.

Is there anything I can do to avoid writing the partition filter multiple times? (The original query has 15+ partitioned tables.)

Solution 1:^[1]

I don't think there is anything you can do to avoid writing the partition filter multiple times. Unless the partition is specified before the join, all partitions need to be scanned to find the rows that match the join condition.

You can define the partition filter once, as a value in its own CTE, to make it easier to change the date filter:

WITH partition_filter AS (
  -- DATE literal, so it can be compared with the DATE column partition_date
  SELECT DATE '2022-04-26' AS start_date
)
,base_table AS (
-- Consumes about 3 MB
SELECT id, COUNT(*) FROM `project.dataset.base_table` GROUP BY id
)

,table1 AS (
-- With partition_date = "2022-04-26", consumes 600.5 MB
SELECT partition_date, id, COUNT(*) AS amount1
FROM `project.dataset.view1`, partition_filter
WHERE partition_date = partition_filter.start_date
GROUP BY partition_date, id
)

,table2 AS (
-- With partition_date = "2022-04-26", consumes 33.5 MB
-- Without partition filter, consumes 15 GB
SELECT partition_date, id, COUNT(*) AS amount2
FROM `project.dataset.view2`, partition_filter
WHERE partition_date = partition_filter.start_date
GROUP BY partition_date, id
)


SELECT
bt.id, t1.amount1, t2.amount2
FROM base_table bt
LEFT JOIN table1 t1 ON
    bt.id = t1.id
LEFT JOIN table2 t2 ON
    bt.id = t2.id AND
    t1.partition_date = t2.partition_date
WHERE bt.id IS NOT NULL
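
One caveat with the CTE approach: BigQuery's documentation says partition pruning needs a constant filter expression, so a date that arrives through a join with partition_filter may not reduce the bytes scanned. A possible alternative, sketched below, is to declare the date once as a scripting variable. This assumes the query is allowed to run as a multi-statement script and that the variable is treated like a query parameter for pruning. The name target_date is made up for this sketch, and the bytes processed are worth checking with a dry run before relying on it.

```sql
-- Sketch only: target_date is a hypothetical variable name.
-- Assumes the views expose partition_date as DATE(_PARTITIONTIME), as in the question.
DECLARE target_date DATE DEFAULT DATE '2022-04-26';

WITH base_table AS (
  SELECT id, COUNT(*) AS row_count
  FROM `project.dataset.base_table`
  GROUP BY id
),
table1 AS (
  -- Filter inside the CTE so the view can be pruned before the join
  SELECT partition_date, id, COUNT(*) AS amount1
  FROM `project.dataset.view1`
  WHERE partition_date = target_date
  GROUP BY partition_date, id
),
table2 AS (
  SELECT partition_date, id, COUNT(*) AS amount2
  FROM `project.dataset.view2`
  WHERE partition_date = target_date
  GROUP BY partition_date, id
)
SELECT bt.id, t1.amount1, t2.amount2
FROM base_table AS bt
LEFT JOIN table1 AS t1
  ON bt.id = t1.id
LEFT JOIN table2 AS t2
  ON bt.id = t2.id
  AND t1.partition_date = t2.partition_date
WHERE bt.id IS NOT NULL;
```

The filter still appears once per view, but the date itself lives in a single place.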

Solution 2:^[2]

I think this could work

for f in *read1.fastq.gz; do echo "$f"; zcat "$f" | wc -l; done > read_count.txt

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution	Source
Solution 1	Damião Martins
Solution 2	Lino_ares
