How to extract values from a key-value map?
I have a column of map type whose keys and values vary from row to row. I am trying to extract the value into a new column.
Input:
+---------------+
|symbols        |
+---------------+
|[3pea -> 3PEA] |
|[barello -> BA]|
|[]             |
|[]             |
+---------------+
Expected output:
+-------+
|symbols|
+-------+
|3PEA   |
|BA     |
|       |
|       |
+-------+
Here is what I have tried so far, using a UDF:
def map_value = udf((inputMap: Map[String, String]) => {
  inputMap.map(x => x._2)
})
It fails with:
java.lang.UnsupportedOperationException: Schema for type scala.collection.immutable.Iterable[String] is not supported
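The exception comes from the UDF's return type: inputMap.map(x => x._2) produces an Iterable[String], which Spark cannot turn into a column schema. Below is a minimal sketch of one possible fix, assuming each map holds at most one entry and that only the first value is wanted. The names MapValueUdf and firstValue are illustrative, not from the question, and the local SparkSession is only there to make the example self-contained.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, udf}

object MapValueUdf {
  // Hypothetical helper: first value of the map, or None for an empty or null map.
  // Option[String] is a return type Spark can derive a (nullable string) schema for.
  def firstValue(m: Map[String, String]): Option[String] =
    Option(m).flatMap(_.values.headOption)

  def main(args: Array[String]): Unit = {
    // Assumption: a local session, used only for this illustration.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("map-value-udf")
      .getOrCreate()
    import spark.implicits._

    // Sample data modelled on the question's input, including an empty map.
    val df = Seq(
      Map("3pea" -> "3PEA"),
      Map("barello" -> "BA"),
      Map.empty[String, String]
    ).toDF("symbols")

    // Wrap the plain function as a UDF and apply it to the map column.
    val mapValue = udf(firstValue _)
    df.select(mapValue(col("symbols")).as("symbols")).show(false)

    spark.stop()
  }
}
```

With this approach an empty map yields null rather than an empty string. If a blank is preferred, the result could be wrapped in coalesce(..., lit("")).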
Solution 1:[1]
import org.apache.spark.sql.functions._
import spark.implicits._
// Simulated data (my guess at your input): each row is an array holding one "key -> value" string.
val m = Seq(Array("A -> abc"), Array("B -> 0.11856755943424617"), Array("C -> kqcams"))
val df = m.toDF("map_data")
df.show
// Join the array into a single string, split it on "-> ", and keep the part after the arrow.
val df2 = df.select(split(concat_ws("", $"map_data"), "-> ").getItem(1).as("map_val"))
df2.show(false)
results in:
+--------------------+
|            map_data|
+--------------------+
|          [A -> abc]|
|[B -> 0.118567559...|
|       [C -> kqcams]|
+--------------------+
+-------------------+
|map_val            |
+-------------------+
|abc                |
|0.11856755943424617|
|kqcams             |
+-------------------+
Solution 2:[2]
As of the Spark Scala API v2.3, the SQL API v2.3, or the PySpark API v2.4, you can use the Spark SQL function map_values.
The following is in PySpark; the Scala version would be very similar.
Setup (assuming a working SparkSession named spark):
from pyspark.sql import functions as F
df = (
    spark.read.json(spark.sparkContext.parallelize(["""[
        {"key": ["3pea"],    "value": ["3PEA"] },
        {"key": ["barello"], "value": ["BA"]   }
    ]"""]))
    .select(F.map_from_arrays(F.col("key"), F.col("value")).alias("symbols") )
)
df.printSchema()
df.show()
root
 |-- symbols: map (nullable = true)
 |    |-- key: string
 |    |-- value: string (valueContainsNull = true)
+---------------+
|        symbols|
+---------------+
| [3pea -> 3PEA]|
|[barello -> BA]|
+---------------+
df.select((F.map_values(F.col("symbols"))[0]).alias("map_vals")).show()
+--------+
|map_vals|
+--------+
|    3PEA|
|      BA|
+--------+
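For readers working in Scala, as in the question, the same idea can be sketched roughly as follows. This is an illustration under assumptions, not part of the original answer: the object name MapValuesExample and the helper firstValues are hypothetical, and the data is built directly from Scala maps instead of JSON.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, map_values}

object MapValuesExample {
  // Hypothetical helper: take the first element of the array returned by
  // map_values on the "symbols" column and expose it as "map_vals".
  def firstValues(df: DataFrame): DataFrame =
    df.select(map_values(col("symbols")).getItem(0).as("map_vals"))

  def main(args: Array[String]): Unit = {
    // Assumption: a local session, used only for this illustration.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("map-values-example")
      .getOrCreate()
    import spark.implicits._

    // Same two rows as the PySpark setup above.
    val df = Seq(
      Map("3pea" -> "3PEA"),
      Map("barello" -> "BA")
    ).toDF("symbols")

    firstValues(df).show(false)

    spark.stop()
  }
}
```

For the empty maps in the question, index 0 does not exist. With ANSI mode disabled this gives null, while with ANSI mode enabled an out-of-range index may raise an error instead, so it is worth checking the behaviour on your Spark version.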
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source | 
|---|---|
| Solution 1 | thebluephantom | 
| Solution 2 | Clay | 
