Error when adding a new list-typed column using pyspark.sql.functions.array

I get an error when I try to add a new array-typed column using pyspark.sql.functions.array:

    from pyspark.sql import functions as F

    column = ['id', 'fname', 'age', 'avg_wh']
    data = [('1', 'user_1', '40', 8.5),
            ('2', 'user_2', '6', 1.5),
            ('3', 'user_3', '4', 5.5),
            ('10', 'user_10', '4', 2.5)]

    df = spark.createDataFrame(data, column)
    # This line raises AnalysisException: cannot resolve '1' given input columns
    df = df.withColumn("lsitColumn", F.array(["1", "2", "3"]))
    df.show()

The error:

    raise_from(converted)
      File "<string>", line 3, in raise_from
    pyspark.sql.utils.AnalysisException: cannot resolve '1' given input columns: [age, avg_wh, fname, id];;
    'Project [id#0, fname#1, age#2, avg_wh#3, array('1, '2, '3) AS lsitColumn#8]
    +- LogicalRDD [id#0, fname#1, age#2, avg_wh#3], false

Could you please help me understand the root cause of this error? I managed to create the column using a UDF, but I don't understand why this basic approach fails.

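The UDF:

    from pyspark.sql.types import ArrayType, StringType
    # Ignores its input and returns the same list of strings for every row
    extract = F.udf(lambda x: ["1", "2", "3"], ArrayType(StringType()))
    percentielDF = df.withColumn("lsitColumn", extract("id"))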

I expected to get a new DataFrame with a list-typed column, but got this error instead.



Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow
