'pyspark trim and split
I am trying to create an ArrayType from an StringType but I am unable to do a trim and split at the same time.
Create DataFrame
data = [
(
"1",
"Example 1",
"element1 | element two | element (three) ",
),
]
schema = StructType(
[
StructField("id", StringType(), True),
StructField("name", StringType(), True),
StructField("value", StringType(), True),
]
)
df = spark.createDataFrame(data, schema)
What I try to do
df = (df.withColumn("value_array",
f.trim(f.split(f.col("value"), "\|")
)
)
)
The error that I get
AnalysisException: cannot resolve 'trim(split(`value`, '\\|', -1))' due to data type mismatch: argument 1 requires string type, however, 'split(`value`, '\\|', -1)' is of array<string> type.;
Code that works but is not complete
df = (df.withColumn("value_arry",
f.split(f.col("value"), "\|")
)
)
The result I get
"[e1| e 2 | e3 ]" #These values contains spaces in
+---+---------+---------------+-------------------+
| id| name| value| value_arry|
+---+---------+---------------+-------------------+
| 1|Example 1|e1| e 2 | e3 3 |[e1, e 2 , e3 3 ]|
| 2|Example 2| e1 | e2 | [e1 , e2 ]|
+---+---------+---------------+-------------------+
The result I would like to get would have each element without spaces
Spark Version 3.2.1
Solution 1:[1]
You can use transform function after split:
from pyspark.sql import functions as F
df = df.withColumn("value_arry", F.expr("transform(split(value, '[|]'), x -> trim(x))"))
df.show(truncate=False)
#+---+---------+-----------------------------------------+----------------------------------------+
#|id |name |value |value_arry |
#+---+---------+-----------------------------------------+----------------------------------------+
#|1 |Example 1|element1 | element two | element (three) |[element1, element two, element (three)]|
#+---+---------+-----------------------------------------+----------------------------------------+
Or split using regex \s+\|\s+:
df = df.withColumn("value_arry", F.split("value", r"\s+\|\s+"))
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | blackbishop |
