PySpark UDF Output Schema
I'm trying to format the output of a PySpark UDF. The Python code in the UDF returns something like this:
[ [[1],[.5,.6,.7],"A","B"], [[2],[.1,.3,.9],"A","C"],... ]
I have the following return schema code:
schema_return = st.StructType([
    st.StructField('result', st.StructType([
        st.StructField('rank', st.ArrayType(st.FloatType()), True),
        st.StructField('embedding', st.ArrayType(st.FloatType()), True),
        st.StructField('name', st.StringType(), True),
        st.StructField('value', st.StringType(), True)
    ])),
])
However, this gives me the following error, which, according to other posts on here, corresponds to a wrong output type:
net.razorvine.pickle.PickleException: expected zero arguments for construction of ClassDict (for numpy.core.multiarray._reconstruct)
Can someone help me write this schema correctly?
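One possible direction, offered as a minimal sketch rather than a confirmed fix: a PickleException that mentions numpy.core.multiarray._reconstruct typically appears when a UDF returns NumPy objects instead of built-in Python types. Also, since the UDF returns a list of records, the sketch assumes the return type should be an ArrayType of StructType rather than a single nested struct. The helper names to_plain, to_row and build_schema are hypothetical, and rank and embedding are assumed to be flat sequences of numbers.

```python
def to_plain(value):
    """Recursively convert NumPy-style values to built-in Python types."""
    # numpy.ndarray and NumPy scalars both expose tolist(), which
    # returns plain Python lists and scalars.
    if hasattr(value, "tolist"):
        return value.tolist()
    if isinstance(value, (list, tuple)):
        return [to_plain(v) for v in value]
    return value


def to_row(item):
    """Turn one [rank, embedding, name, value] record into a plain tuple."""
    rank, embedding, name, value = item
    return (
        # FloatType fields expect Python floats, so ints such as [1] are cast.
        [float(x) for x in to_plain(rank)],
        [float(x) for x in to_plain(embedding)],
        str(name),
        str(value),
    )


def build_schema():
    """Return type for a UDF that yields a list of such records."""
    # Imported here so the helpers above can be used without Spark installed.
    import pyspark.sql.types as st

    return st.ArrayType(st.StructType([
        st.StructField('rank', st.ArrayType(st.FloatType()), True),
        st.StructField('embedding', st.ArrayType(st.FloatType()), True),
        st.StructField('name', st.StringType(), True),
        st.StructField('value', st.StringType(), True),
    ]))
```

With this in place, the UDF body would return [to_row(r) for r in results] and be registered with build_schema() as its return type, for example via pyspark.sql.functions.udf(fn, build_schema()).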
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
