Spark Read BigQuery External Table

Trying to read an external table from BigQuery, but getting an error. My setup:

    SCALA_VERSION="2.12"
    SPARK_VERSION="3.1.2"
    com.google.cloud.bigdataoss:gcs-connector:hadoop3-2.2.0,
    com.google.cloud.spark:spark-bigquery-with-dependencies_2.12:0.24.2

    table = 'data-lake.dataset.member'
    df = spark.read.format('bigquery').load(table)
    df.printSchema()

Result:

    root
     |-- createdAtmetadata: date (nullable = true)
     |-- eventName: string (nullable = true)
     |-- producerName: string (nullable = true)

But when I query the table:

    df.createOrReplaceTempView("member")
    spark.sql("select * from member limit 100").show()

I get this error:

    INVALID_ARGUMENT: request failed: Only external tables with connections can be read with the Storage API.
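One way to see this coming is to check the table's type before choosing a read path. The helper below is a hypothetical sketch (not from the original answers): it maps the `table_type` values reported by the `google-cloud-bigquery` client library to a read strategy.

```python
# Hypothetical helper: pick a read path based on the BigQuery table type.
# table_type values come from google.cloud.bigquery Table.table_type
# ("TABLE", "VIEW", "EXTERNAL", ...).

def choose_read_strategy(table_type):
    """Return 'direct' when the Storage API can read the table,
    'query' when it must be materialized through a SQL query first."""
    if table_type == "TABLE":
        return "direct"   # plain tables work with spark.read...load(table)
    if table_type in ("VIEW", "EXTERNAL"):
        return "query"    # needs viewsEnabled + query-based read
    raise ValueError("unsupported table type: " + table_type)

# Usage sketch (requires google-cloud-bigquery and credentials):
# from google.cloud import bigquery
# t = bigquery.Client().get_table("data-lake.dataset.member")
# strategy = choose_read_strategy(t.table_type)
```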



Solution 1:[1]

Since the connector cannot read external tables directly, I switched to reading via a query instead, and it worked:

    def read_query_bigquery(project, query):
        df = spark.read.format('bigquery') \
            .option('parentProject', project) \
            .option('query', query) \
            .option('viewsEnabled', 'true') \
            .load()
        return df

    project = 'data-lake'
    query = 'select * from `data-lake.dataset.member`'
    spark.conf.set("materializationDataset", 'dataset')
    df = read_query_bigquery(project, query)
    df.show()
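If you read several tables this way, the connector options can be bundled into a plain dict and passed to the reader with `.options(**opts)`. This helper is a sketch of that pattern, not part of the original answer; the option names mirror the spark-bigquery connector's documented options.

```python
# Sketch: collect the query-read options used above into a single dict.
# The helper itself is hypothetical; the option names are the connector's.

def query_read_options(project, query):
    return {
        "parentProject": project,   # project billed for the query
        "query": query,             # SQL statement to materialize and read
        "viewsEnabled": "true",     # required for query/view-based reads
    }

# Usage sketch:
# opts = query_read_options('data-lake',
#                           'select * from `data-lake.dataset.member`')
# df = spark.read.format('bigquery').options(**opts).load()
```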

Solution 2:[2]

The BigQuery connector uses the BigQuery Storage API to read the data. At the moment this API does not support external tables, thus the connector doesn't support them either.

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution 1: Pedro Rodrigues
Solution 2: David Rabinowitz