Explode Array Element into a unique column
I'm new to PySpark and am trying to solve an ETL step.
I have the schema below. I would like to take the variable field that sits inside the info array and turn it into a column. When I do this with explode, I get duplicate rows (one per array element), because the array holds elements at positions [0], [1], and [2].
My goal is a single new string column per id that contains the variable value of every element in the array, separated by commas.
root
 |-- id: string (nullable = true)
 |-- info: array (nullable = true)
 |    |-- element: struct (containsNull = false)
 |    |    |-- variable: string (nullable = true)
Desired output:
| id | new column |
|---|---|
| 123435e5x-9a9z | A, B, D |
| 555585a4Z-0B1Y | A |
Thank you for the help
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
