'Retrieving BigQuery validation errors when loading JSONL data via the Python API

How can I retrieve more information relating to the validation errors when loading a JSONL file into BigQuery? (The question is not about solving the issue)

Example code:

from google.cloud.bigquery import (
    LoadJobConfig,
    QueryJobConfig,
    Client,
    SourceFormat,
    WriteDisposition
)

# variables depending on the environment
filename = '...'
gcp_project_id = '...'
dataset_name = '...'
table_name = '...'
schema = [ ... ]

# loading data
client = Client(project=project_id)
dataset_ref = client.dataset(dataset_name)
table_ref = dataset_ref.table(table_name)
job_config = LoadJobConfig()
job_config.source_format = SourceFormat.NEWLINE_DELIMITED_JSON
job_config.write_disposition = WriteDisposition.WRITE_APPEND
job_config.schema = schema
LOGGER.info('loading from %s', filename)
with open(filename, "rb") as source_file:
    job = client.load_table_from_file(
        source_file, destination=table_ref, job_config=job_config
    )

    # Waits for table cloud_data_store to complete
    job.result()

Here I am using bigquery-schema-generator to generate a schema (as BigQuery otherwise only looks at the first 100 rows).

Running that might error with the following error message (google.api_core.exceptions.BadRequest):

400 Error while reading data, error message: JSON table encountered too many errors, giving up. Rows: 1; errors: 1. Please look into the errors[] collection for more details.

Looking at the errors property basically doesn't provide any new information:

[{'reason': 'invalid',
  'message': 'Error while reading data, error message: JSON table encountered too many errors, giving up. Rows: 1; errors: 1. Please look into the errors[] collection for more details.'}]

I also looked at __dict__ of the exception but that hasn't revealed any further information.

Trying to load the table using the bq command line (in this case without explicit schema) results in a much more helpful message:

BigQuery error in load operation: Error processing job '...': Provided Schema does not match Table <table name>. Field <field name> has changed type from TIMESTAMP to DATE

My question now is how would I be able to retrieve such helpful message from the Python API?

Solution based on accepted answer

Here is a copy and past workaround that one could add in order to show more information by default. (There may be downsides to it)

import google.cloud.exceptions
import google.cloud.bigquery.job


def get_improved_bad_request_exception(
    job: google.cloud.bigquery.job.LoadJob
) -> google.cloud.exceptions.BadRequest:
    errors = job.errors
    result = google.cloud.exceptions.BadRequest(
        '; '.join([error['message'] for error in errors]),
        errors=errors
    )
    result._job = job
    return result


def wait_for_load_job(
    job: google.cloud.bigquery.job.LoadJob
):
    try:
        job.result()
    except google.cloud.exceptions.BadRequest as exc:
        raise get_improved_bad_request_exception(job) from exc

Then calling wait_for_load_job(job) instead of job.result() directly, will result in a more useful exception (the error message and errors property).



Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source