Add double quotes at the start and end of each string in a column in PySpark
Hello guys, I'm using PySpark 2.3.
I have a DataFrame with a string column named "code_lei". I want to add double quotes at the start and end of each string in the column, without deleting or changing the blank spaces between the strings of the column.
Input DataFrame:
+----+--------------+------+------------+-------+--------------------+
|vide| integer|double|xx_dt_arrete|vide_de| code_lei|
+----+--------------+------+------------+-------+--------------------+
|null|10000000000000| 1.1| 2021-06-30| null| code_lei et chorba|
|null|10000000000000| 1.1| 2021-06-30| null| null|
|null|10000000000000| 1.1| 2021-06-30| null| code_lei et chorba |
|null|10000000000000| 1.1| 2021-06-30| null| code_lei ee |
|null| 2| 2.2| null| null| code_lei|
|null| 2| 2.2| null| null| code_lei|
|null| 2| 2.2| null| null| code_lei|
|null| 2| 2.2| null| null| code_lei|
+----+--------------+------+------------+-------+--------------------+
Output DataFrame:
+----+--------------+------+------------+-------+--------------------------+
|vide|       integer|double|xx_dt_arrete|vide_de|                  code_lei|
+----+--------------+------+------------+-------+--------------------------+
|null|10000000000000|   1.1|  2021-06-30|   null|  "code_lei" "et" "chorba"|
|null|10000000000000|   1.1|  2021-06-30|   null|                      null|
|null|10000000000000|   1.1|  2021-06-30|   null| "code_lei" "et" "chorba" |
|null|10000000000000|   1.1|  2021-06-30|   null|          "code_lei" "ee" |
|null|             2|   2.2|        null|   null|                "code_lei"|
|null|             2|   2.2|        null|   null|                "code_lei"|
|null|             2|   2.2|        null|   null|                "code_lei"|
|null|             2|   2.2|        null|   null|                "code_lei"|
+----+--------------+------+------------+-------+--------------------------+
Solution 1:[1]
You can use the lit and concat functions for this purpose. Note that this wraps each value in a single pair of quotes, and null values stay null, since concat returns null when any argument is null.
import pyspark.sql.functions as F

df.withColumn("code_lei", F.concat(F.lit('"'), F.col('code_lei'), F.lit('"'))).show()
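As a plain-Python sketch (not Spark itself) of what this expression does to each value: the quotes wrap the entire string, blanks inside it are untouched, and None propagates the way a null does through concat. The helper name wrap_whole is made up for illustration.

```python
def wrap_whole(value):
    # Mirrors F.concat(F.lit('"'), F.col('code_lei'), F.lit('"')):
    # the quotes wrap the whole value, interior blanks are untouched,
    # and None stays None (concat returns null for null input).
    return None if value is None else '"' + value + '"'

print(wrap_whole("code_lei et chorba"))  # "code_lei et chorba"
print(wrap_whole(None))                  # None
```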
Solution 2:[2]
Assuming you have a DataFrame like this:
df = spark.createDataFrame([
    ("hello there",),
    ("hello world",),
], ['text'])
+-----------+
| text|
+-----------+
|hello there|
|hello world|
+-----------+
You can then apply a chain of transformations like this:
from pyspark.sql import functions as F
(df
.withColumn('splitted', F.split('text', ' '))
.withColumn('joined', F.array_join(F.col('splitted'), '" "'))
.withColumn('wrapped', F.concat(F.lit('"'), F.col('joined'), F.lit('"')))
.show()
)
+-----------+--------------+-------------+---------------+
| text| splitted| joined| wrapped|
+-----------+--------------+-------------+---------------+
|hello there|[hello, there]|hello" "there|"hello" "there"|
|hello world|[hello, world]|hello" "world|"hello" "world"|
+-----------+--------------+-------------+---------------+
Or you can combine them all into a single expression like this:
from pyspark.sql import functions as F
(df
.withColumn('wrapped', F.concat(F.lit('"'), F.array_join(F.split('text', ' '), '" "'), F.lit('"')))
.show()
)
+-----------+---------------+
| text| wrapped|
+-----------+---------------+
|hello there|"hello" "there"|
|hello world|"hello" "world"|
+-----------+---------------+
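The whole chain boils down to: split on single spaces, join the pieces with `" "` (quote, space, quote), and wrap the result in one outer pair of quotes. A plain-Python mirror of that logic (a sketch, not Spark itself; the helper name wrap_each_word is made up) behaves the same way. Note that consecutive blanks survive as empty quoted tokens, because splitting on a single space keeps the empty strings between them:

```python
def wrap_each_word(value):
    # Mirrors the Spark chain: F.split('text', ' '), then
    # F.array_join(..., '" "'), then F.concat with outer quotes.
    if value is None:
        return None
    return '"' + '" "'.join(value.split(' ')) + '"'

print(wrap_each_word("hello there"))  # "hello" "there"
print(wrap_each_word("a  b"))         # "a" "" "b"
```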
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Nidhi |
| Solution 2 | pltc |
