How to preserve a list type when calling a UDF in PySpark?
I have a PySpark UDF that returns a list of weeks. yw_list contains year-week strings such as 202001, 202002, ..., 202048.
    def Week_generator(week, no_of_weeks):
        # Return the window of no_of_weeks weeks ending at (and including) the given week
        end_index = yw_list.index(week)
        start_index = end_index - no_of_weeks + 1
        return yw_list[start_index:end_index + 1]

    spark.udf.register("Week_generator", Week_generator)
When I call this UDF from Spark SQL, the result is stored as a string rather than a list, so I cannot iterate over the values.
    spark.sql(""" select Week_generator('some week column', 4) as col1 from xyz""")

Output schema: col1: String
Any idea or suggestion on how to resolve this?
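The lookup logic itself works fine in plain Python; the sketch below reproduces it outside Spark to show that the function really does return a list. The yw_list here is a hypothetical stand-in, since the original list is not shown in the question:

```python
# Hypothetical year-week list; the poster's real yw_list comes from their data.
yw_list = [f"2020{w:02d}" for w in range(1, 49)]  # '202001' ... '202048'

def Week_generator(week, no_of_weeks):
    # Slice out the window of no_of_weeks weeks ending at the given week.
    end_index = yw_list.index(week)
    start_index = end_index - no_of_weeks + 1
    return yw_list[start_index:end_index + 1]

print(Week_generator("202004", 4))  # a plain Python list of four week strings
```

So the list is lost only on the Spark side, when the UDF's result is converted to the column type the registration declared.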
Solution 1
As Suresh pointed out, I omitted the return data type when registering the UDF. Declaring it explicitly (along with the required imports) preserves the list:

    from pyspark.sql.types import ArrayType, StringType

    spark.udf.register("Week_generator", Week_generator, ArrayType(StringType()))

This solved my issue.
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Akaza |
