Combine two dataframes with structs in PySpark
I have two dataframes (A and B) with the following schema
root
|-- AUTHOR_ID: integer (nullable = false)
|-- NAME: string (nullable = true)
|-- Books: array (nullable = false)
| |-- element: struct (containsNull = false)
| | |-- BOOK_ID: integer (nullable = false)
| | |-- Chapters: array (nullable = true)
| | | |-- element: struct (containsNull = true)
| | | | |-- NAME: string (nullable = true)
| | | | |-- NUMBER_PAGES: integer (nullable = true)
What is the best and cleanest way to combine the two dataframes so that each row's data becomes a struct field in a new column, giving a result like this:
+---------+------+------+
|AUTHOR_ID|   A  |   B  |
+---------+------+------+
|        1|  {}  |  {}  |   <- keep the nested structs in the new columns
+---------+------+------+
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow