PySpark generic function to convert Array&lt;String&gt; columns to strings
I have to use PySpark (which I'm not really familiar with) to process a dataframe that contains columns whose values are arrays of strings. These columns, and only these, have to be converted to plain strings.
I would like to define a generic function that takes a dataframe as its argument instead of specifying the column names.
Here is the piece of code I've created:
    from pyspark.sql.functions import col, concat_ws
    from pyspark.sql.types import ArrayType, StringType

    df = data_input
    for name in df.columns:
        dtype = data_input.schema[name].dataType
        # Join only array<string> columns; leave every other column as-is
        if isinstance(dtype, ArrayType) and isinstance(dtype.elementType, StringType):
            df = df.withColumn(name, concat_ws(",", col(name)))
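For reference, here is the kind of minimal input I test the snippet against (the column names and sample rows are just made up for illustration):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Hypothetical input: one array<string> column among ordinary columns
data_input = spark.createDataFrame(
    [(1, ["a", "b"], "x"), (2, ["c"], "y")],
    "id INT, tags ARRAY<STRING>, label STRING",
)
# After the loop above, `tags` becomes "a,b" and "c"; `id` and `label` are untouched
```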
Is there another, more optimized way to do it?
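One alternative I've seen suggested is to build all the column expressions up front and apply them in a single `select` instead of chaining `withColumn` calls, which keeps the query plan to a single projection. A minimal sketch (the helper name `arrays_to_strings` and the default separator are my own choices, not from any library):

```python
from pyspark.sql import DataFrame
from pyspark.sql.functions import col, concat_ws
from pyspark.sql.types import ArrayType, StringType

def arrays_to_strings(df: DataFrame, sep: str = ",") -> DataFrame:
    """Join every array<string> column into a single string column."""
    exprs = [
        concat_ws(sep, col(f.name)).alias(f.name)
        if isinstance(f.dataType, ArrayType)
        and isinstance(f.dataType.elementType, StringType)
        else col(f.name)
        for f in df.schema.fields
    ]
    return df.select(*exprs)

df = arrays_to_strings(data_input)
```

Would this be considered more idiomatic or better-performing than the loop, or does Spark optimize both to the same plan?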
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
