BigQueryOperator in Spark - can't write array struct to BigQuery table
In BigQuery, I have a column called actions whose type is RECORD in REPEATED mode. In Spark, I have a schema defined as:
import org.apache.spark.sql.types._

val action: StructType = (new StructType)
  .add("id", StringType)
  .add("name", StringType)
  .add("last", StringType)

val actionsList = new ArrayType(action, true)

val finalStruct: StructType = (new StructType)
  .add("record", StringType)
  .add("d", StringType)
  .add("actions", actionsList)
This is how my schema is defined; I then simply read the data in and write it to BigQuery:
val df = spark.read.schema(finalStruct).json(rdd)
df.createOrReplaceTempView("myData")
val finalDf = spark.sql("SELECT record AS my_rec, d AS inc_date, actions FROM myData")
finalDf.write.mode("append").format("bigquery")...save()
However, when I attempt to write the DataFrame, I get this error:
BigQuery error was provided Schema does not match Table <table_name_here>.
Cannot add fields (field: actions.list)
What's the proper way to define this schema? My incoming data is JSON, like:
{
  "recordName": "name_here",
  "date": "2020-01-01",
  "actions": [
    {
      "id": "1",
      "name": "aaa",
      "last": "bbb"
    },
    {
      "id": "2",
      "name": "qqq",
      "last": "www"
    }
  ]
}
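As an aside: the sample JSON's keys (recordName, date) don't match the schema's field names (record, d), and spark.read.json matches fields by name, so those two columns would read back as null. A schema matching the sample, reusing the same actionsList, would be:

val finalStruct: StructType = (new StructType)
  .add("recordName", StringType)
  .add("date", StringType)
  .add("actions", actionsList)

(The SELECT would then reference recordName and date instead of record and d.)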
Solution 1:[1]
It's a known issue when the connector is used with its default settings, where Parquet is used as the intermediate format (see a similar bug report).
Changing the intermediate format to ORC solves the issue:
spark.conf.set("spark.datasource.bigquery.intermediateFormat", "orc")
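The setting can also be scoped to a single write instead of the whole session, since the connector accepts intermediateFormat as a write option. A minimal sketch, where the GCS bucket and table names are placeholders:

finalDf.write
  .mode("append")
  .format("bigquery")
  // Write the intermediate files as ORC instead of the default Parquet
  .option("intermediateFormat", "orc")
  // Bucket for the intermediate files; "some-bucket" is a placeholder
  .option("temporaryGcsBucket", "some-bucket")
  .save("dataset.table_name") // placeholder table reference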
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Mariusz |
