In SparkSQL, how can I select a subset of columns from a nested struct and keep it as a nested struct in the result using a SQL statement?

I can do the following statement in SparkSQL:

result_df = spark.sql("""select
    one_field,
    field_with_struct
  from purchases""")

The resulting data frame will have the full struct in field_with_struct:

one_field field_with_struct
123 {name1,val1,val2,f2,f4}
555 {name2,val3,val4,f6,f7}
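
For a reproducible example, a table like this can be created as follows (the struct field names value1, value2, field3, field4 are guesses for illustration; the real schema of purchases may differ):

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Sample rows matching the output above; the struct field names are assumed
purchases = spark.createDataFrame(
    [
        (123, ("name1", "val1", "val2", "f2", "f4")),
        (555, ("name2", "val3", "val4", "f6", "f7")),
    ],
    "one_field INT, field_with_struct STRUCT<name: STRING, value1: STRING, value2: STRING, field3: STRING, field4: STRING>",
)
purchases.createOrReplaceTempView("purchases")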

I want to select only a few fields from field_with_struct, but keep them in a struct in the resulting data frame. If something like this were possible (this is not real code):

result_df = spark.sql("""select
    one_field,
    struct(
      field_with_struct.name,
      field_with_struct.value2
    ) as my_subset
  from purchases""")

To get this:

one_field my_subset
123 {name1,val2}
555 {name2,val4}

Is there any way of doing this with SQL (not with the fluent API)?
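
For comparison, this is roughly what I would write with the fluent API, which is exactly what I am trying to avoid (a sketch using pyspark.sql.functions.struct; column names match the example above):

from pyspark.sql import functions as F

# Fluent-API equivalent of the subset I want -- not what I'm asking for
result_df = (
    spark.table("purchases")
    .select(
        "one_field",
        F.struct(
            F.col("field_with_struct.name"),
            F.col("field_with_struct.value2"),
        ).alias("my_subset"),
    )
)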



Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow
