PySpark generic function to convert Array<String> columns to strings

I have to use PySpark (which I'm not really familiar with) to convert a DataFrame containing columns holding arrays of strings. These columns, and only these, have to be converted to strings.

I would like to define a generic function that takes a DataFrame as argument instead of specifying the column names.

Here is the piece of code I've created:

from pyspark.sql.functions import col, concat_ws
from pyspark.sql.types import ArrayType, StringType

df = data_input
for name in data_input.columns:
    # Join array<string> columns into comma-separated strings; leave other columns untouched
    df = df.withColumn(name, concat_ws(",", col(name))
        if isinstance(data_input.schema[name].dataType, ArrayType)
        and isinstance(data_input.schema[name].dataType.elementType, StringType)
        else col(name))

Is there a more optimized way to do it?



Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow