Spark: Select specific index of value of type Any
I have a DataFrame in which one of the columns is of type WrappedArray(JSON). Each JSON element has the format [String, String]. I have successfully accessed the inside of the array, and I now have a column of type Any in which each value holds a [String, String]. I only want the first of these two strings, but if I try something like column(0), it raises an error because Any cannot be indexed. How can I access this value?
My code so far is:
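import org.apache.spark.sql.functions.from_json
import org.apache.spark.sql.types.{StringType, StructField, StructType}
// Schema of each JSON element: two nullable string fields
val schema = StructType(Seq(
  StructField("productId", StringType, true),
  StructField("name", StringType, true)
))
// Parse the first element of the array column into a struct
df.withColumn("column", from_json($"column"(0), schema))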
And the schema of my df:
root
 |-- customerId: string (nullable = true)
 |-- column: struct (nullable = true)
 |    |-- productId: string (nullable = true)
 |    |-- name: string (nullable = true)
 |-- date: date (nullable = true)
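For reference, a struct column like the one in this schema is addressed by field name rather than by position in Spark's Column API. The following is only a minimal sketch under assumptions: the sample row, the local SparkSession, and the object name `StructFieldSketch` are hypothetical and not part of the original question.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.from_json
import org.apache.spark.sql.types.{StringType, StructField, StructType}

object StructFieldSketch {
  def main(args: Array[String]): Unit = {
    // Local session for illustration only (assumption).
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("struct-field-sketch")
      .getOrCreate()
    import spark.implicits._

    val schema = StructType(Seq(
      StructField("productId", StringType, nullable = true),
      StructField("name", StringType, nullable = true)
    ))

    // Hypothetical sample data: "column" holds an array of JSON strings.
    val df = Seq(
      ("c1", Seq("""{"productId":"p1","name":"Widget"}"""))
    ).toDF("customerId", "column")

    // Parse the first array element into a struct, as in the question.
    val parsed = df.withColumn("column", from_json($"column"(0), schema))

    // Select a single struct field by its name using a dotted path.
    parsed.select($"customerId", $"column.productId").show()

    spark.stop()
  }
}
```

`$"column".getField("productId")` is an equivalent way to express the same field access.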
Solution 1:[1]
I managed to solve the problem myself. The answer was quite obvious: instead of parsing the JSON into a struct column with two fields, I parsed it into a MapType column holding the same keys and values.
My final code:
import org.apache.spark.sql.types.{MapType, StringType}
df.withColumn("column", from_json($"column"(0), MapType(StringType, StringType)))
And then, for accessing the keys and values of the new column:
.select("column.productId", "column.name")
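The two steps above can be put together in a small self-contained sketch. It assumes a Spark version in which `from_json` accepts a `MapType` schema; the sample row, the local SparkSession, and the object name `MapFromJsonSketch` are hypothetical and only serve the illustration.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, from_json}
import org.apache.spark.sql.types.{MapType, StringType}

object MapFromJsonSketch {
  def main(args: Array[String]): Unit = {
    // Local session for illustration only (assumption).
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("map-from-json-sketch")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical sample data: "column" holds an array of JSON strings.
    val df = Seq(
      ("c1", Seq("""{"productId":"p1","name":"Widget"}"""))
    ).toDF("customerId", "column")

    // Parse the first array element into a map from string keys to string values.
    val parsed = df.withColumn(
      "column",
      from_json($"column"(0), MapType(StringType, StringType))
    )

    // Look up the map entries by key and give the results plain column names.
    parsed
      .select(
        col("column").getItem("productId").as("productId"),
        col("column").getItem("name").as("name")
      )
      .show()

    spark.stop()
  }
}
```

Here `getItem("productId")` is the explicit form of the `"column.productId"` path used in the answer; both look up a key in the map column.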
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | leo_val |
