Pandas on Spark throws an AssertionError when adding a list of dictionaries as a column
I am using pandas on Spark (Spark version 3.2.1) to scale my pandas DataFrame. I am using it like this:
import pyspark.pandas as ps
sdf = ps.read_csv(r"C:\Users\Downloads\test1.csv",sep=',')
I have a list of dictionaries like this:
tid = [
    {'ID_ID_dm001_ID_ID': '1', 'ID_ID_vs001_ID_ID': '1'},
    {'ID_ID_dm001_ID_ID': '2', 'ID_ID_vs001_ID_ID': '2'},
    {'ID_ID_dm001_ID_ID': '3', 'ID_ID_vs001_ID_ID': '3'},
    {'ID_ID_dm001_ID_ID': '4', 'ID_ID_vs001_ID_ID': '4'},
    {'ID_ID_dm001_ID_ID': '5', 'ID_ID_vs001_ID_ID': '5'},
]
I want to add this list as a new column to my DataFrame sdf as follows:
sdf['TRACEID'] = tid
However, this throws an AssertionError. The same assignment works for a plain pandas DataFrame but not for a pandas-on-Spark DataFrame.
Is there any workaround here? Thanks in advance.
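
One workaround that seems consistent with how pandas-on-Spark handles column assignment (a sketch, not verified on every Spark version): the right-hand side of sdf[...] = ... generally needs to be a scalar or another pandas-on-Spark Series, which is why a plain Python list trips an internal assertion. Wrapping the list in ps.Series and enabling the compute.ops_on_diff_frames option lets the two differently-sourced objects be combined; serializing each dict to a JSON string (an illustrative choice, as is the tid_ser name) avoids type-inference problems with raw dict elements. Alignment happens by index, so this assumes sdf has the default sequential index and that tid is in row order.

import json
import pyspark.pandas as ps

# Without this option, combining objects that come from different
# underlying Spark DataFrames raises an error in pandas-on-Spark.
ps.set_option("compute.ops_on_diff_frames", True)

# Serialize each dict to a JSON string so Spark infers a plain string
# column instead of trying to infer a type for dict objects.
tid_ser = ps.Series([json.dumps(d) for d in tid])
sdf['TRACEID'] = tid_ser

If the data fits in driver memory, a simpler fallback is to round-trip through plain pandas: pdf = sdf.to_pandas(); pdf['TRACEID'] = tid; sdf = ps.from_pandas(pdf).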
Source: Stack Overflow, licensed under CC BY-SA 3.0.