How should the BigQueryUpdateTableSchemaOperator in Airflow 2 be called to set descriptions of nested fields?

I have a table in BigQuery that contains subscription data. It has a number of regular columns for things like subscription type, start date, and subscription status. It also has a few structs, for instance an event struct with fields like ts and name.

In Airflow 1 we used a BigQueryOperator that we had extended with a lot of custom code, both to run the jobs and to keep the column descriptions up to date with annotations in the code.

Now that we are moving to Airflow 2, we want to replace what we did with BigQueryOperator with the more specific operators that are now available. The operator that seems most promising for updating the column descriptions is the BigQueryUpdateTableSchemaOperator, which we have had some success with for tables without nested fields.

However, for nested fields we get errors like

google.api_core.exceptions.BadRequest: 400 PATCH https://bigquery.googleapis.com/bigquery/v2/projects/[REDACTED]/tables/subscriptions_1?prettyPrint=false: Field event.ts missing type

when trying to run code like

yield BigQueryUpdateTableSchemaOperator(
    task_id=self.task_id("update_schema", context),
    dag=context.dag,
    dataset_id=self.dataset,
    table_id=self.table,
    schema_fields_updates=[
        {
            "name": name,
            "description": description
        } for name, description in _recursive_items(self.columns)
    ]
)

where _recursive_items iterates over a dictionary we use to manage columns and their descriptions, held in an object that manages tables and their dependencies. Its exact shape is not important for the question, as I am confident I can update the function and the object representation if I only knew which format the operator expects for the schema_fields_updates parameter.
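For context, here is a minimal sketch of the kind of helper involved. The column layout and the name build_schema_updates are hypothetical, not our real code: each column maps either to a description string or to a dict with an optional "description" and nested "fields". The helper emits the nested form, which is one of the two formats tried below, and whether the operator accepts it is exactly what is in question.

```python
from typing import Any, Dict, List

# Hypothetical layout: a column maps to a description string, or to a dict
# with an optional "description" and nested "fields" for STRUCT columns.
Columns = Dict[str, Any]


def build_schema_updates(columns: Columns) -> List[Dict[str, Any]]:
    """Turn the nested column mapping into a list of nested update dicts."""
    updates: List[Dict[str, Any]] = []
    for name, spec in columns.items():
        if isinstance(spec, str):
            # Plain column: the value is the description itself.
            updates.append({"name": name, "description": spec})
            continue
        entry: Dict[str, Any] = {"name": name}
        if "description" in spec:
            entry["description"] = spec["description"]
        if "fields" in spec:
            # STRUCT column: recurse into its sub-fields.
            entry["fields"] = build_schema_updates(spec["fields"])
        updates.append(entry)
    return updates


columns = {
    "subscription_type": "Type of subscription",
    "event": {
        "description": "Latest event",
        "fields": {"ts": "Event timestamp", "name": "Event name"},
    },
}
updates = build_schema_updates(columns)
```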

So far, I have tried supplying it with e.g.

{"name": "event.ts", "description": "Some New Description"}

and

{"name": "event", "fields": [
    {"name": "ts", "description": "Some New Description"}
]}

but neither has worked.
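To make concrete what I would expect the nested form to do: the "missing type" error suggests the patch sent to BigQuery contains event.ts without its existing type, so the update would have to be merged into the current schema rather than replace it. The sketch below illustrates such a merge in plain Python. It is an assumption about the desired behaviour, not the provider's actual implementation, and merge_schema is a hypothetical name.

```python
from copy import deepcopy
from typing import Any, Dict, List

Field = Dict[str, Any]


def merge_schema(current: List[Field], updates: List[Field]) -> List[Field]:
    """Merge partial field updates into an existing schema, keeping untouched keys."""
    merged = {field["name"]: deepcopy(field) for field in current}
    for update in deepcopy(updates):
        name = update["name"]
        if name not in merged:
            # Unknown field: take the update as the full definition.
            merged[name] = update
            continue
        if "fields" in update:
            # RECORD field: merge sub-fields recursively so their types survive.
            update["fields"] = merge_schema(
                merged[name].get("fields", []), update["fields"]
            )
        merged[name].update(update)
    return list(merged.values())


current_schema = [
    {"name": "status", "type": "STRING"},
    {
        "name": "event",
        "type": "RECORD",
        "fields": [
            {"name": "ts", "type": "TIMESTAMP"},
            {"name": "name", "type": "STRING"},
        ],
    },
]
nested_update = [
    {"name": "event", "fields": [{"name": "ts", "description": "Some New Description"}]}
]
new_schema = merge_schema(current_schema, nested_update)
```

With a merge like this, event.ts keeps its TIMESTAMP type and only gains the description, which is the outcome I am after.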

The code works fine for tables that have no annotations on nested fields, and it does not complain about a missing type in those cases. The documentation only has examples without nested fields: https://airflow.apache.org/docs/apache-airflow-providers-google/stable/operators/cloud/bigquery.html#howto-operator-bigqueryupdatetableschemaoperator

Further investigation has led me to suspect that the changes I make to schema_fields_updates are being overwritten by a version of it that comes from the operator's template_fields. If anyone has advice on how to make sure that value is updated when I update the code, that would also be appreciated.



Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow
