Spark Read BigQuery External Table
Trying to read an external table from BigQuery, but getting an error.
SCALA_VERSION="2.12"
SPARK_VERSION="3.1.2"
com.google.cloud.bigdataoss:gcs-connector:hadoop3-2.2.0,
com.google.cloud.spark:spark-bigquery-with-dependencies_2.12:0.24.2
table = 'data-lake.dataset.member'
df = spark.read.format('bigquery').load(table)
df.printSchema()
Result:
root
|-- createdAtmetadata: date (nullable = true)
|-- eventName: string (nullable = true)
|-- producerName: string (nullable = true)
But when I register the table as a view and query it:
df.createOrReplaceTempView("member")
spark.sql("select * from member limit 100").show()
I get this error message:
INVALID_ARGUMENT: request failed: Only external tables with connections can be read with the Storage API.
Solution 1:[1]
Since external tables cannot be read directly by Spark, I tried reading via a query instead, and it worked:
def read_query_bigquery(project, query):
    df = spark.read.format('bigquery') \
        .option('parentProject', '{project}'.format(project=project)) \
        .option('query', query) \
        .option('viewsEnabled', 'true') \
        .load()
    return df

project = 'data-lake'
# Backticks are required because the project ID contains a hyphen
query = 'select * from `data-lake.dataset.member`'
spark.conf.set("materializationDataset", 'dataset')
df = read_query_bigquery(project, query)
df.show()
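If you read several queries this way, the repeated .option() calls can be collected into a plain dict and passed via .options(**opts). This is a minimal sketch; the helper name and its parameters are illustrative, not part of the connector's API:

```python
def bigquery_query_options(parent_project, materialization_dataset):
    """Build the reader options the spark-bigquery connector expects
    when loading the result of a SQL query instead of a table."""
    return {
        "parentProject": parent_project,
        # Required for query/view reads: the connector materializes
        # the query result into a temporary table first.
        "viewsEnabled": "true",
        # Dataset where that temporary table is created.
        "materializationDataset": materialization_dataset,
    }

opts = bigquery_query_options("data-lake", "dataset")
# Usage (assuming an active SparkSession with the connector on the classpath):
# df = spark.read.format("bigquery").options(**opts) \
#          .option("query", "select * from `data-lake.dataset.member`") \
#          .load()
```

Note that "materializationDataset" can be supplied either as a reader option, as here, or globally via spark.conf.set, as in Solution 1; both reach the connector.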
Solution 2:[2]
The BigQuery connector uses the BigQuery Storage API to read the data. At the moment this API does not support external tables, thus the connector doesn't support them either.
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Pedro Rodrigues |
| Solution 2 | David Rabinowitz |
