Error when adding a new list-typed column using pyspark.sql.functions.array
I get an error when I try to add a new array-typed column using `pyspark.sql.functions.array`:
```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()

column = ['id', 'fname', 'age', 'avg_wh']
data = [('1', 'user_1', '40', 8.5),
        ('2', 'user_2', '6', 1.5),
        ('3', 'user_3', '4', 5.5),
        ('10', 'user_10', '4', 2.5)]
df = spark.createDataFrame(data, column)

# This line raises the AnalysisException
df = df.withColumn("lsitColumn", F.array(["1", "2", "3"]))
df.show()
```
The error:
```
    raise_from(converted)
  File "<string>", line 3, in raise_from
pyspark.sql.utils.AnalysisException: cannot resolve '1' given input columns: [age, avg_wh, fname, id];;
'Project [id#0, fname#1, age#2, avg_wh#3, array('1, '2, '3) AS lsitColumn#8]
+- LogicalRDD [id#0, fname#1, age#2, avg_wh#3], false
```
Could you please help me understand the root cause of this error? I managed to create the column using a UDF, but I don't understand why this basic approach fails.
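The UDF:
```python
from pyspark.sql import functions as F
from pyspark.sql.types import ArrayType, StringType

# The UDF ignores its input and returns the same constant list for every row
extract = F.udf(lambda x: ["1", "2", "3"], ArrayType(StringType()))
percentielDF = df.withColumn("lsitColumn", extract("id"))
```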
I expected to get a new DataFrame with an array-typed column, but I get an error instead.
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
