PySpark UDF Output Schema

I'm trying to define the output schema of a PySpark UDF. The Python code in the UDF returns something like this:

 [ [[1],[.5,.6,.7],"A","B"], [[2],[.1,.3,.9],"A","C"],... ]

I have the following return schema code:

schema_return = st.StructType([
    st.StructField('result', st.StructType([
        st.StructField('rank', st.ArrayType(st.FloatType()), True),
        st.StructField('embedding', st.ArrayType(st.FloatType()), True),
        st.StructField('name', st.StringType(), True),
        st.StructField('value', st.StringType(), True),
    ])),
])

However, this gives me the following error, which, according to other posts on here, corresponds to a wrong output type:

net.razorvine.pickle.PickleException: expected zero arguments for construction of ClassDict (for numpy.core.multiarray._reconstruct)

Can someone help me write this schema correctly?
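
One possible direction, offered as a sketch rather than a confirmed fix: the sample output is a list of items, so the return type probably needs to be an array of structs rather than a single struct wrapped in a 'result' field. The `numpy.core.multiarray._reconstruct` part of the error usually suggests that the UDF is handing numpy objects back to Spark, so the values are converted to plain Python types here as well. The helper names `build_return_schema` and `to_rows` are hypothetical and not part of the original code, and the sketch assumes every item has exactly the four elements shown in the sample.

```python
def build_return_schema():
    # Imported lazily so that to_rows() below can be used and tested without Spark.
    import pyspark.sql.types as st

    item = st.StructType([
        st.StructField('rank', st.ArrayType(st.FloatType()), True),
        st.StructField('embedding', st.ArrayType(st.FloatType()), True),
        st.StructField('name', st.StringType(), True),
        st.StructField('value', st.StringType(), True),
    ])
    # The UDF returns a list of such items, so the outer type is an array.
    return st.ArrayType(item)


def to_rows(results):
    """Shape each [rank, embedding, name, value] item as a tuple in field order."""
    rows = []
    for rank, embedding, name, value in results:
        rows.append((
            # float() turns ints and numpy scalars into Python floats,
            # which is what a FloatType field expects from a Python UDF.
            [float(r) for r in rank],
            [float(e) for e in embedding],
            str(name),
            str(value),
        ))
    return rows
```

Under these assumptions, the existing UDF body would return `to_rows(...)` of its current result, and the UDF would be registered with `build_return_schema()` as its return type, for example through `pyspark.sql.functions.udf`.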



Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow
