How to split comma-separated values in a CSV column into separate rows of a new column using PySpark

I have a log file in CSV format in which one column contains a list of file paths separated by commas. I want to split those file paths into separate rows using PySpark (or Excel). The original data looks like this:

+----------+----------------------------------------------------------------------------+
|time      |message                                                                     |
+----------+----------------------------------------------------------------------------+
|4-19 20:00|[info] Delete object in ['03-26/abc/123.jpg', '03-26/abc/456.jpg']          |
+----------+----------------------------------------------------------------------------+
|4-19 21:00|[info] Delete object in ['03-27/def/789.jpg', '03-27/def/012.jpg']          |
+----------+----------------------------------------------------------------------------+

I'd like it to be converted to this:

+-----------------+
|path             |
+-----------------+
|03-26/abc/123.jpg|
+-----------------+
|03-26/abc/456.jpg|
+-----------------+
|03-27/def/789.jpg|
+-----------------+
|03-27/def/012.jpg|
+-----------------+


Solution 1:[1]

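Extract the bracketed list of paths from message with a regular expression, parse it as an array of strings, and explode the array into one row per path. Add .select('paths') before .show() if you only need the path column.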

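from pyspark.sql import functions as F

# 1. regexp_extract pulls the bracketed list, e.g. ['a.jpg', 'b.jpg'], out of message.
# 2. from_json parses that text as array<string> (Spark's JSON reader accepts single quotes by default).
# 3. explode emits one row per array element.
(df
    .withColumn('paths', F.explode(F.from_json(F.regexp_extract('message', r"\['[^\]]+]", 0), 'array<string>')))
    .show(10, False)
)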

+----------+------------------------------------------------------------------+-----------------+
|time      |message                                                           |paths            |
+----------+------------------------------------------------------------------+-----------------+
|4-19 20:00|[info] Delete object in ['03-26/abc/123.jpg', '03-26/abc/456.jpg']|03-26/abc/123.jpg|
|4-19 20:00|[info] Delete object in ['03-26/abc/123.jpg', '03-26/abc/456.jpg']|03-26/abc/456.jpg|
|4-19 21:00|[info] Delete object in ['03-27/def/789.jpg', '03-27/def/012.jpg']|03-27/def/789.jpg|
|4-19 21:00|[info] Delete object in ['03-27/def/789.jpg', '03-27/def/012.jpg']|03-27/def/012.jpg|
+----------+------------------------------------------------------------------+-----------------+
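To check the extract-then-parse logic without a Spark session, the same two steps can be sketched in plain Python. This is only an illustration, not part of the original answer: the helper name extract_paths and the sample messages list are assumptions made for the example, and ast.literal_eval stands in for Spark's from_json.

```python
import ast
import re

# Same idea as the pattern in the Spark answer: "['" followed by everything up to the closing "]".
LIST_PATTERN = re.compile(r"\['[^\]]+\]")

def extract_paths(message):
    """Return the file paths embedded in one log message, or [] if none are found.

    extract_paths is a hypothetical helper used only for this sketch.
    """
    match = LIST_PATTERN.search(message)
    if match is None:
        return []
    # The matched text is a Python-style list literal, so literal_eval can parse it safely.
    return ast.literal_eval(match.group(0))

# Sample rows copied from the question's message column.
messages = [
    "[info] Delete object in ['03-26/abc/123.jpg', '03-26/abc/456.jpg']",
    "[info] Delete object in ['03-27/def/789.jpg', '03-27/def/012.jpg']",
]

# Flatten to one path per entry, mirroring what F.explode does per row.
paths = [path for message in messages for path in extract_paths(message)]
print(paths)
```

Running the pattern on a few sample messages this way is a quick test of the regular expression before applying it to the full DataFrame.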

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 pltc