How to preserve a list type when calling a UDF in PySpark?

I have a PySpark UDF that returns a list of weeks. `yw_list` contains year-week strings such as 202001, 202002, ..., 202048.

    def Week_generator(week, no_of_weeks):
        # Return the `no_of_weeks` weeks ending at `week`, inclusive.
        end_index = yw_list.index(week)
        start_index = end_index - no_of_weeks + 1
        return yw_list[start_index:end_index + 1]

    spark.udf.register("Week_generator", Week_generator)

When I call this UDF from Spark SQL, the result is stored as a string instead of a list, so I cannot iterate over the values.

    spark.sql("""select Week_generator('some week column', 4) as col1 from xyz""")

Output schema: `col1: string`

Any idea or suggestion on how to resolve this ?



Solution 1:[1]

As pointed out by Suresh, I had omitted the return data type when registering the UDF.

    from pyspark.sql.types import ArrayType, StringType

    spark.udf.register("Week_generator", Week_generator, ArrayType(StringType()))

This solved my issue.

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution 1: Akaza