Combine two dataframes with structs in PySpark
I have two dataframes (A and B) with the following schema
root
|-- AUTHOR_ID: integer (nullable = false)
|-- NAME: string (nullable = true)
|-- Books: array (nullable = false)
| |-- element: struct (containsNull = false)
| | |-- BOOK_ID: integer (nullable = false)
| | |-- Chapters: array (nullable = true)
| | | |-- element: struct (containsNull = true)
| | | | |-- NAME: string (nullable = true)
| | | | |-- NUMBER_PAGES: integer (nullable = true)
What is the best and cleanest way to combine the two dataframes so that each row's data becomes a struct field in a new column, giving a result like this:
+---------+------+------+
|AUTHOR_ID|   A  |   B  |
+---------+------+------+
|        1|  {}  |  {}  |   <- keep the nested structs in the new columns
+---------+------+------+
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow