'Extract value from complex array of map type to string
I have a dataframe like below.
No comp_value
1 [[ -> 10]]
2 [[ -> 35]]
The schema type of column - value is.
comp_value: array (nullable = true)
element: map(containsNull = true)
key: string
value: long (valueContainsNull = true)
I would like to convert the comp_value from complex type to string using PySpark. Is there a way to achieve this?
Expected output:
No comp_value
1 10
2 35
Solution 1:[1]
from pyspark.sql import functions as F
df = spark.createDataFrame(
[(1,' [[ -> 10]]'),
(2, '[[ -> 35]]')],
['No', 'v'])
df.show()
replace the corner brackets, remove trailing spaces, split by space to get a list and get the elements you want by slicing the list
new = df.withColumn('comp_value', split(trim(regexp_replace('v','\[|\]','')),'\s')[1])
new.show()
+---+-----------+----------+
| No| v|comp_value|
+---+-----------+----------+
| 1| [[ -> 10]]| 10|
| 2| [[ -> 35]]| 35|
+---+-----------+----------+
Solution 2:[2]
I will assume your data looks like this:
from pyspark.sql import functions as F
df = spark.createDataFrame(
[(1, 10),
(2, 35)],
['No', 'v'])
df = df.select('No', F.array(F.create_map(F.lit(''), 'v')).alias('comp_value'))
df.show()
# +---+----------+
# |No |comp_value|
# +---+----------+
# |1 |[{ -> 10}]|
# |2 |[{ -> 35}]|
# +---+----------+
You can extract values inside array by referencing to them using index number (in this case [0]). And extracting values from maps is done by referencing keys (in this case ['']).
df2 = df.select('No', F.col('comp_value')[0][''].cast('string').alias('comp_value'))
df2.show()
# +---+----------+
# |No |comp_value|
# +---+----------+
# | 1| 10|
# | 2| 35|
# +---+----------+
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | wwnde |
| Solution 2 | ZygD |
