Convert a string of comma-separated values to an array of structs in PySpark

I have a PySpark dataframe with a single string column whose values look like this:
'00639,43701,00007,00632,43701,00007'

I need to convert the above string into an array of structs using withColumn, to have this:

[{"network_id":"00639","network_bic":"43701","network_seqr":"00007"},{"network_id":"00632","network_bic":"43701","network_seqr":"00007"}]

How can I achieve this using PySpark dataframes?



Solution 1:[1]

First, split the string into an array. Then pick each element of that array with element_at, give it a name with alias, and wrap every group of three elements in a struct.

from pyspark.sql import functions as F
df = spark.createDataFrame([('00639,43701,00007,00632,43701,00007',)], ['col_str'])

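# Split the comma-separated string into an array of strings.
col_split = F.split('col_str', ',')
# element_at is 1-based: positions 1-3 form the first struct, 4-6 the second.
df = df.withColumn('array_of_struct', F.array(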
    F.struct(
        F.element_at(col_split, 1).alias('network_id'),
        F.element_at(col_split, 2).alias('network_bic'),
        F.element_at(col_split, 3).alias('network_seqr'),
    ),
    F.struct(
        F.element_at(col_split, 4).alias('network_id'),
        F.element_at(col_split, 5).alias('network_bic'),
        F.element_at(col_split, 6).alias('network_seqr'),
    )
))

df.show(truncate=False)
# +-----------------------------------+----------------------------------------------+
# |col_str                            |array_of_struct                               |
# +-----------------------------------+----------------------------------------------+
# |00639,43701,00007,00632,43701,00007|[{00639, 43701, 00007}, {00632, 43701, 00007}]|
# +-----------------------------------+----------------------------------------------+

df.printSchema()
# root
#  |-- col_str: string (nullable = true)
#  |-- array_of_struct: array (nullable = false)
#  |    |-- element: struct (containsNull = false)
#  |    |    |-- network_id: string (nullable = true)
#  |    |    |-- network_bic: string (nullable = true)
#  |    |    |-- network_seqr: string (nullable = true)
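
The code above hard-codes positions 1 to 6, so it only fits strings with exactly two groups. As a rough sketch of how the same grouping could be generalized to any number of groups, the plain-Python helper below splits the string and cuts it into chunks of three. The name to_structs and its fields parameter are hypothetical and not part of the original answer.

```python
def to_structs(value, fields=("network_id", "network_bic", "network_seqr")):
    """Group comma-separated values into dicts of len(fields) items each.

    Hypothetical helper for illustration; not part of the original answer.
    """
    parts = value.split(",")
    width = len(fields)
    # Refuse input that cannot be divided into complete groups.
    if len(parts) % width != 0:
        raise ValueError(
            f"expected a multiple of {width} values, got {len(parts)}"
        )
    # Walk the list in steps of `width` and pair each chunk with the field names.
    return [
        dict(zip(fields, parts[i:i + width]))
        for i in range(0, len(parts), width)
    ]


result = to_structs('00639,43701,00007,00632,43701,00007')
print(result)
```

A function like this could probably be wrapped with F.udf and given a return type of ArrayType(StructType([...])) with three string fields, so that it can be used in withColumn. That is an assumption about how you would wire it up, and a Python UDF is usually slower than the built-in functions used in Solution 1.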

Solution 2:[2]

There is no exact string function available, but you can use CONCAT as follows:

SELECT CONCAT('T.P.', ' ', 'Bar') as author;

+---------------------+
| author              |
+---------------------+
| T.P. Bar            |
+---------------------+


Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution      Source
Solution 1    ZygD
Solution 2    rtenha