'Update BigQuery view with indirect schema changes
When updating a view, indirect schema changes do not seem to be picked up.
Steps to reproduce
- Create
view1with a fieldfield1(e.g.SELECT 1 AS field1) - Create
view2selecting all fields fromview1 - Update
view1to also includefield2(e.g.SELECT 1 AS field1, 2 AS field2) - Update
view2with the same query as before (due to documented limitation)
Desired outcome
- Schema for
view1andview2includingfield1andfield2 - View updates should be atomic
Actual outcome
- Schema for
view1correctly updated (includingfield1andfield2) - Schema for
view2is only includingfield1 - Selecting from
view2does actually returnfield1andfield2
I could delete view2 and re-create it again but that wouldn't be atomic and there would be a time where the view is not available which is undesired.
I also tried to update the schema property of view2 but that is refused with Cannot add fields (field: field2):
google.api_core.exceptions.BadRequest: 400 PATCH https://www.googleapis.com/bigquery/v2/projects/<project-id>/datasets/dataset1/tables/view2: Provided Schema does not match Table <project-id>:dataset1.view2. Cannot add fields (field: field2)
Question
Is there any way to update the view atomically while also updating the schema which was changed indirectly (a table/view the view selects from).
Note: of course my view2 would add additional fields and I could currently determine its schema by creating a new temporary view.
Note: the schema is important because tools such as the Data Studio's BigQuery connector are inspecting the schema.
Code to reproduce the steps
# Python 3.6+
import google.api_core.exceptions
from google.cloud import bigquery
def delete_table_if_exists(client: bigquery.Client, table: bigquery.Table):
try:
client.delete_table(table)
except google.api_core.exceptions.NotFound:
pass
def full_table_id(table: bigquery.Table) -> str:
# Note: the documentation says it should be separated by a dot but uses a colon
return table.full_table_id.replace(':', '.')
def view_test():
client = bigquery.Client()
dataset_ref = client.dataset('dataset1')
try:
client.create_dataset(dataset_ref)
except google.api_core.exceptions.Conflict:
pass
view1 = bigquery.Table(dataset_ref.table('view1'))
view2 = bigquery.Table(dataset_ref.table('view2'))
delete_table_if_exists(client, view1)
delete_table_if_exists(client, view2)
view1.view_query = 'SELECT 1 AS field1'
view1 = client.create_table(view1)
view2.view_query = f'SELECT * FROM `{full_table_id(view1)}`'
client.create_table(view2)
view1.view_query = 'SELECT 1 AS field1, 2 AS field2'
client.update_table(view1, ['view_query'])
client.update_table(view2, ['view_query'])
print('view2 schema:', client.get_table(view2).schema)
# trying to update the schema fails with 'Cannot add fields (field: field2)'
view2.schema = client.get_table(view1).schema
client.update_table(view2, ['schema'])
if __name__ == '__main__':
view_test()
Bash example doing the same
#!/bin/bash
set -e
project_id=$(gcloud config list --format 'value(core.project)' 2>/dev/null)
bq mk -f dataset1
bq rm -f dataset1.view1
bq rm -f dataset1.view2
bq mk --use_legacy_sql=false --view 'SELECT 1 AS field1' dataset1.view1
bq mk --use_legacy_sql=false --view 'SELECT * FROM `'$project_id'.dataset1.view1`' dataset1.view2
bq update --use_legacy_sql=false --view 'SELECT 1 AS field1, 2 AS field2' dataset1.view1
bq update --use_legacy_sql=false --view 'SELECT * FROM `'$project_id'.dataset1.view1`' dataset1.view2
bq show dataset1.view2
Update: Code with accepted answer
Python code
def get_create_or_replace_view_query(view: bigquery.Table) -> str:
return f'CREATE OR REPLACE VIEW {view.dataset_id}.{view.table_id} AS {view.view_query}'
def view_test():
# ...
query_job = client.query(get_create_or_replace_view_query(view2))
query_job.result()
print('view2 schema:', client.get_table(view2).schema)
Bash magic
bq query --use_legacy_sql=false 'CREATE OR REPLACE VIEW dataset1.view2 AS SELECT * FROM `'$project_id'.dataset1.view1`'
Solution 1:[1]
You should use a CREATE OR REPLACE VIEW statement; see the related documentation. BigQuery provides ACID semantics for all queries that perform table modifications, and CREATE OR REPLACE VIEW is no exception, so this replaces the definition and schema of the view atomically.
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Elliott Brossard |
