How to solve error "scala.collection.mutable.WrappedArray$ofRef cannot be cast to [Ljava.lang.String" when unwrapping a map in a UDF
I get the error above when I apply my UDF, which is defined as follows:
```scala
import org.apache.spark.sql.functions.col
import org.apache.spark.sql.functions.typedLit
import org.apache.spark.sql.functions.udf

def method_name(map: Map[String, Array[String]]): String = {
  var col_a: Array[String] = map("a") // the ClassCastException is thrown here
  var col_b: Array[String] = map("b")
  ...
  return "Test_string"
}

// excel is a DataFrame with columns "a" and "b"
val col_a = excel.select("a").rdd.map(r => r(0).asInstanceOf[String]).collect()
val col_b = excel.select("b").rdd.map(r => r(0).asInstanceOf[String]).collect()

var new_map: Map[String, Array[String]] = List("a" -> col_a).toMap
new_map += ("b" -> col_b)

val method_name_udf = udf(method_name _)

// resultTable is an existing DataFrame held in a var
resultTable = resultTable.withColumn("new_map", typedLit(new_map))
resultTable = resultTable.withColumn("new_col", method_name_udf(col("new_map")))
```
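The cast failure can be reproduced without Spark. The sketch below assumes what Spark's Scala UDFs commonly do: a value of Spark type ArrayType reaches the UDF as a Seq (typically WrappedArray$ofRef on Scala 2.12), not as an Array[String]. Because the map's value type is erased at runtime, the mismatch only surfaces when map("a") is assigned to an Array[String]. The names ErasureDemo and reproduces are hypothetical.

```scala
// Hypothetical, Spark-free reproduction of the ClassCastException.
object ErasureDemo {
  /** Returns true if reading map("a") as Array[String] throws a ClassCastException. */
  def reproduces(): Boolean = {
    // Stand-in for what Spark passes to the UDF: the array value arrives as a Seq
    // (WrappedArray on Scala 2.12, ArraySeq on 2.13), not as a JVM String[].
    val fromSpark: Map[String, Any] = Map("a" -> Array("x", "y").toSeq)

    // Type erasure makes this cast unchecked, so it succeeds silently...
    val asDeclared = fromSpark.asInstanceOf[Map[String, Array[String]]]

    // ...and the real cast to String[] only happens when the value is read.
    try {
      val colA: Array[String] = asDeclared("a")
      println(colA.mkString(",")) // not reached
      false
    } catch {
      case _: ClassCastException => true
    }
  }

  def main(args: Array[String]): Unit =
    println(s"ClassCastException on lookup: ${reproduces()}")
}
```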
- I use "rdd.map(r => r(0).asInstanceOf[String]).collect()" to get a column of the DataFrame as an Array of Strings.
- I build new_map as a Map[String, Array[String]] with the keys "a" and "b".
- I call withColumn on my resultTable with typedLit, which appends new_map as a literal to every row in a new column "new_map".
- Finally, I apply the UDF in a new column that refers to the new map.
- In the UDF I just want to get the Array of Strings via map("a"). This is where the error occurs.
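A common fix, sketched here under the assumption of a Spark 2.4/3.x build on Scala 2.12, is to declare the UDF parameter as Map[String, Seq[String]], since Spark hands ArrayType values to Scala UDFs as a Seq, and to call .toArray inside the UDF only if an array is really needed. SeqUdfFix, methodName, the sample data and the method body are hypothetical placeholders; the original body is elided in the question.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, typedLit, udf}

// Hypothetical sketch: the UDF accepts Seq[String] values instead of Array[String].
object SeqUdfFix {
  def methodName(map: Map[String, Seq[String]]): String = {
    val colA: Seq[String] = map("a") // Seq matches what Spark passes, so no cast fails
    val colB: Seq[String] = map("b")
    // Convert explicitly only where an Array is genuinely required.
    val colAArray: Array[String] = colA.toArray
    s"${colAArray.length},${colB.length}" // placeholder result
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[1]").appName("seq-udf").getOrCreate()
    import spark.implicits._

    // Hypothetical sample data standing in for the values collected from excel.
    val newMap: Map[String, Array[String]] = Map("a" -> Array("x", "y"), "b" -> Array("z"))
    val methodNameUdf = udf(methodName _)

    val resultTable = Seq(1, 2).toDF("id")
      .withColumn("new_map", typedLit(newMap))
      .withColumn("new_col", methodNameUdf(col("new_map")))

    resultTable.select("new_col").as[String].collect().foreach(println)
    spark.stop()
  }
}
```

Alternatively, since new_map is built on the driver and is identical for every row, the UDF could close over it directly instead of receiving it through a typedLit column; a captured Map[String, Array[String]] is serialized as an ordinary JVM object, so its values stay arrays.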
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
