PySpark - filter dataframe based on nested structs

Let's suppose we have a dataframe with the following schema:

root
 |-- AUTHOR_ID: integer (nullable = false)
 |-- NAME: string (nullable = true)
 |-- Books: array (nullable = false)
 |    |-- element: struct (containsNull = false)
 |    |    |-- BOOK_ID: integer (nullable = false)
 |    |    |-- Chapters: array (nullable = true) 
 |    |    |    |-- element: struct (containsNull = true)
 |    |    |    |    |-- NAME: string (nullable = true)
 |    |    |    |    |-- NUMBER_PAGES: integer (nullable = true)
  • How can I find the authors that have books containing a chapter with NUMBER_PAGES < 100?

Thanks



Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
