In Spark SQL, how can I select a subset of columns from a nested struct and keep it as a nested struct in the result, using a SQL statement?
I can run the following statement in Spark SQL:
result_df = spark.sql("""select
one_field,
field_with_struct
from purchases""")
The resulting data frame will have the full struct in field_with_struct:
| one_field | field_with_struct |
|---|---|
| 123 | {name1,val1,val2,f2,f4} |
| 555 | {name2,val3,val4,f6,f7} |
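For reference, here is a minimal setup sketch that reproduces the table above. The full schema of purchases isn't shown in the question, so the struct field names other than name and value2 (here value1, f2 and f4) are assumptions inferred from the sample rows:

```python
from pyspark.sql import Row, SparkSession

spark = SparkSession.builder.getOrCreate()

# Hypothetical data: field names besides `name` and `value2` are guesses
# based on the sample output in the question.
purchases = spark.createDataFrame([
    Row(one_field=123,
        field_with_struct=Row(name="name1", value1="val1", value2="val2",
                              f2="f2", f4="f4")),
    Row(one_field=555,
        field_with_struct=Row(name="name2", value1="val3", value2="val4",
                              f2="f6", f4="f7")),
])
purchases.createOrReplaceTempView("purchases")

# Same query as above: the whole struct comes back in `field_with_struct`.
result_df = spark.sql("""select
    one_field,
    field_with_struct
from purchases""")
result_df.show(truncate=False)
```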
I want to select only a few fields from field_with_struct, but keep them wrapped in a struct in the resulting data frame. Something like this would be ideal (this is not real code):
result_df = spark.sql("""select
one_field,
struct(
field_with_struct.name,
field_with_struct.value2
) as my_subset
from purchases""")
To get this:
| one_field | my_subset |
|---|---|
| 123 | {name1,val2} |
| 555 | {name2,val4} |
Is there any way of doing this with SQL (not with the fluent API)?
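As a hedged sketch (not necessarily the accepted answer from the original thread): struct() is a built-in Spark SQL function and can be called from a plain SQL statement, so the "not real code" above is essentially valid as written. Continuing from the assumed setup sketch earlier:

```python
# struct() is a built-in Spark SQL function, so it works inside a plain SQL
# statement; this assumes the `purchases` view registered in the setup sketch.
result_df = spark.sql("""select
    one_field,
    struct(
        field_with_struct.name,
        field_with_struct.value2
    ) as my_subset
from purchases""")
result_df.show(truncate=False)
```

If explicit field names are wanted in the result, Spark SQL also provides named_struct('name', field_with_struct.name, 'value2', field_with_struct.value2), which builds the same struct but lets you rename its fields.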