'Extract value from complex array of map type to string

I have a dataframe like below.

No    comp_value
1     [[ -> 10]]
2     [[ -> 35]]

The schema type of column - value is.

comp_value: array (nullable = true)
      element: map(containsNull = true)
              key: string
              value: long (valueContainsNull = true)

I would like to convert the comp_value from complex type to string using PySpark. Is there a way to achieve this?

Expected output:

No    comp_value
1     10
2     35


Solution 1:[1]

from pyspark.sql import functions as F

df = spark.createDataFrame(
    [(1,' [[ -> 10]]'),
     (2, '[[ -> 35]]')],
    ['No', 'v'])


df.show()

replace the corner brackets, remove trailing spaces, split by space to get a list and get the elements you want by slicing the list

new = df.withColumn('comp_value', split(trim(regexp_replace('v','\[|\]','')),'\s')[1])

new.show()

+---+-----------+----------+
| No|          v|comp_value|
+---+-----------+----------+
|  1| [[ -> 10]]|        10|
|  2| [[ -> 35]]|        35|
+---+-----------+----------+

Solution 2:[2]

I will assume your data looks like this:

from pyspark.sql import functions as F

df = spark.createDataFrame(
    [(1, 10),
     (2, 35)],
    ['No', 'v'])
df = df.select('No', F.array(F.create_map(F.lit(''), 'v')).alias('comp_value'))

df.show()
# +---+----------+
# |No |comp_value|
# +---+----------+
# |1  |[{ -> 10}]|
# |2  |[{ -> 35}]|
# +---+----------+

You can extract values inside array by referencing to them using index number (in this case [0]). And extracting values from maps is done by referencing keys (in this case ['']).

df2 = df.select('No', F.col('comp_value')[0][''].cast('string').alias('comp_value'))

df2.show()
# +---+----------+
# |No |comp_value|
# +---+----------+
# |  1|        10|
# |  2|        35|
# +---+----------+

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 wwnde
Solution 2 ZygD