Spark: Select specific index of value of type Any

I have a DataFrame in which one of the columns is of type WrappedArray(JSON). Each JSON element has the format [String, String]. I have successfully accessed the inside of the array, and now I have a column of type Any in which each value holds a [String, String] pair. The problem is that I only want the first of those two strings, but if I try something like column(0), an error is raised because Any has no index. How can I access this value?

My code for now is:

import org.apache.spark.sql.functions.from_json
import org.apache.spark.sql.types.{StringType, StructField, StructType}

// Schema describing the [String, String] JSON elements
val schema = StructType(Seq(
    StructField("productId", StringType, true),
    StructField("name", StringType, true)
))

// Parse the first element of the array column as JSON
df.withColumn("column", from_json($"column"(0), schema))

And the schema of my df:

root
 |-- customerId: string (nullable = true)
 |-- column: struct (nullable = true)
 |    |-- productId: string (nullable = true)
 |    |-- name: string (nullable = true)
 |-- date: date (nullable = true)


Solution 1:[1]

I managed to solve the problem myself. The answer was quite simple: instead of creating a column of type struct containing two values, I created a MapType column holding the same values.

My final code:

import org.apache.spark.sql.functions.from_json
import org.apache.spark.sql.types.{MapType, StringType}

df.withColumn("column", from_json($"column"(0), MapType(StringType, StringType)))

And then, to access the keys and values of the new column:

.select("column.productId", "column.name")
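Putting the pieces together, here is a minimal self-contained sketch of the approach. The column and field names come from the question above, but the sample JSON strings and the SparkSession setup are hypothetical, added only so the snippet runs on its own:

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions.from_json
    import org.apache.spark.sql.types.{MapType, StringType}

    val spark = SparkSession.builder().master("local[*]").appName("map-json").getOrCreate()
    import spark.implicits._

    // Hypothetical sample data: an array column whose first element is a JSON string
    val df = Seq(
      ("c1", Seq("""{"productId": "p1", "name": "widget"}"""))
    ).toDF("customerId", "column")

    // Parse the first array element as a Map[String, String] instead of a struct
    val parsed = df.withColumn("column",
      from_json($"column"(0), MapType(StringType, StringType)))

    // Map values can be reached with the same dot syntax used for struct fields
    parsed.select($"customerId", $"column.productId", $"column.name").show()

Because from_json with a MapType schema yields a map keyed by the JSON field names, column.productId here is just a lookup of the key "productId" in that map.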

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 leo_val