DataFrame column parsing in Scala

I have a problem parsing a DataFrame column.

val result = df_app_clickstream.withColumn(
  "attributes",
  explode(expr(raw"transform(attributes, x -> str_to_map(regexp_replace(x, '{\\}',''), ' '))"))
).select(
  col("userId"),
  col("attributes").getField("campaign_id").alias("app_campaign_id"),
  col("attributes").getField("channel_id").alias("app_channel_id")
)
result.show()

My input looks like this:

-------------------------------------------------------------------------------
| userId                               | attributes                            |           
-------------------------------------------------------------------------------
| f6e8252f-b5cc-48a4-b348-29d89ee4fa9e |{'campaign_id':082,'channel_id':'Chnl'}|    
-------------------------------------------------------------------------------

and I need to get output like this:

--------------------------------------------------------------------
| userId                               | campaign_id |   channel_id|
--------------------------------------------------------------------
| f6e8252f-b5cc-48a4-b348-29d89ee4fa9e | 082         |   Facebook  |
--------------------------------------------------------------------

but I get an error.

Solution 1:[1]

You can try the solution below.

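import org.apache.spark.sql.functions._
import spark.implicits._ // for toDF; assumes a SparkSession named `spark`, as in spark-shell

val data = Seq(("f6e8252f-b5cc-48a4-b348-29d89ee4fa9e", """{'campaign_id':082, 'channel_id':'Chnl'}""")).toDF("user_id", "attributes")

// Strip the quotes and braces, then split the string into "key:value" pieces on the comma
val out_df = data.withColumn("splitted_col", split(regexp_replace(col("attributes"), "'|\\}|\\{", ""), ","))
  // element_at is 1-based; the (1) on the inner split is 0-based and picks the value after ':'
  .withColumn("campaign_id", split(element_at(col("splitted_col"), 1), ":")(1))
  .withColumn("channel_id", split(element_at(col("splitted_col"), 2), ":")(1))

out_df.show(truncate = false)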

+------------------------------------+----------------------------------------+-----------------------------------+-----------+----------+
|user_id                             |attributes                              |splitted_col                       |campaign_id|channel_id|
+------------------------------------+----------------------------------------+-----------------------------------+-----------+----------+
|f6e8252f-b5cc-48a4-b348-29d89ee4fa9e|{'campaign_id':082, 'channel_id':'Chnl'}|[campaign_id:082,  channel_id:Chnl]|082        |Chnl      |
+------------------------------------+----------------------------------------+-----------------------------------+-----------+----------+
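
An alternative, closer to the str_to_map attempt in the question, is to clean the string once and let Spark build a map from it. This is only a sketch under some assumptions: attributes is a plain string column (not an array), the keys and values contain no spaces, commas or colons, and the object AttributesToColumns and helper parseAttributes are hypothetical names introduced here for illustration.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, expr, regexp_replace}

object AttributesToColumns {

  // Hypothetical helper: expects a string column "attributes" such as
  // {'campaign_id':082,'channel_id':'Chnl'} and a "userId" column.
  def parseAttributes(df: DataFrame): DataFrame =
    df
      // Remove braces, single quotes and spaces, leaving "campaign_id:082,channel_id:Chnl"
      .withColumn("cleaned", regexp_replace(col("attributes"), "[\\{\\}' ]", ""))
      // Pairs are separated by ',' and each key is separated from its value by ':'
      .withColumn("attr_map", expr("str_to_map(cleaned, ',', ':')"))
      .select(
        col("userId"),
        // Map values are strings, so the leading zero in 082 is kept
        col("attr_map").getItem("campaign_id").alias("app_campaign_id"),
        col("attr_map").getItem("channel_id").alias("app_channel_id")
      )

  def main(args: Array[String]): Unit = {
    // Local session for trying the sketch outside spark-shell
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("attributes-to-columns")
      .getOrCreate()
    import spark.implicits._

    val df = Seq(
      ("f6e8252f-b5cc-48a4-b348-29d89ee4fa9e", "{'campaign_id':082,'channel_id':'Chnl'}")
    ).toDF("userId", "attributes")

    parseAttributes(df).show(truncate = false)
    spark.stop()
  }
}
```

With the sample row from the question this should give 082 and Chnl in app_campaign_id and app_channel_id. If the values can contain spaces, the space would have to be dropped from the character class and the keys trimmed some other way.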

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Pradeep yadav