Pandas on Spark throws an AssertionError when adding a list of dictionaries as a column
I am using pandas on Spark (Spark version 3.2.1) to scale my pandas DataFrame. I am using it like this:
import pyspark.pandas as ps
sdf = ps.read_csv(r"C:\Users\Downloads\test1.csv",sep=',')
I have a list of dictionaries like this:
tid = [
    {'ID_ID_dm001_ID_ID': '1', 'ID_ID_vs001_ID_ID': '1'},
    {'ID_ID_dm001_ID_ID': '2', 'ID_ID_vs001_ID_ID': '2'},
    {'ID_ID_dm001_ID_ID': '3', 'ID_ID_vs001_ID_ID': '3'},
    {'ID_ID_dm001_ID_ID': '4', 'ID_ID_vs001_ID_ID': '4'},
    {'ID_ID_dm001_ID_ID': '5', 'ID_ID_vs001_ID_ID': '5'},
]
I want to add this list as a new column to my DataFrame sdf as follows:
sdf['TRACEID'] = tid
However, this throws an AssertionError. The same assignment works for a plain pandas DataFrame but not for a pandas-on-Spark DataFrame.
Is there any workaround here? Thanks in advance.
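
One workaround that seems consistent with how pandas-on-Spark handles column assignment (a sketch, not verified on every Spark version): the right-hand side of sdf[...] = ... generally needs to be a scalar or another pandas-on-Spark Series, which is why a plain Python list trips an internal assertion. Wrapping the list in ps.Series and enabling the compute.ops_on_diff_frames option lets the two differently-sourced objects be combined; serializing each dict to a JSON string (an illustrative choice, as is the tid_ser name) avoids type-inference problems with raw dict elements. Alignment happens by index, so this assumes sdf has the default sequential index and that tid is in row order.

import json
import pyspark.pandas as ps

# Without this option, combining objects that come from different
# underlying Spark DataFrames raises an error in pandas-on-Spark.
ps.set_option("compute.ops_on_diff_frames", True)

# Serialize each dict to a JSON string so Spark infers a plain string
# column instead of trying to infer a type for dict objects.
tid_ser = ps.Series([json.dumps(d) for d in tid])
sdf['TRACEID'] = tid_ser

If the data fits in driver memory, a simpler fallback is to round-trip through plain pandas: pdf = sdf.to_pandas(); pdf['TRACEID'] = tid; sdf = ps.from_pandas(pdf).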
Source: Stack Overflow, licensed under CC BY-SA 3.0.