Not able to add a Spark Scala column with tuple data

Below is the data which needs to be populated into a DataFrame:

    val columnNames = Array("ID", "Name", "Age")
    val d1 = Array("QWER", "TOM", "28")
    val d2 = Array("SPSRT", "BENJI", "45")
    val d1zip = columnNames.zip(d1)
    val d2zip1 = columnNames.zip(d2)
    val data = Array(d1zip, d2zip1)

    import org.apache.spark.sql.DataFrame
    import org.apache.spark.sql.functions._
    var df = generateDF1(columnNames, data)

Function to generate the DataFrame:

    def generateDF1(colnames: Array[String], data: Array[Array[(String, String)]]): DataFrame = {
      import spark.implicits._
      var initDF = spark.emptyDataFrame
      for (itemlist <- data) {
        for (item <- itemlist) {
          initDF = initDF.withColumn(item._1, lit(item._2))
        }
      }
      initDF
    }

I am not able to see any data being added to the DataFrame.



Solution 1:[1]

If d1 and d2 were actual tuples (and not Arrays, as in your code), you could:

    val columnNames = Array("ID", "Name", "Age")
    val d1 = ("QWER", "TOM", "28")
    val d2 = ("SPSRT", "BENJI", "45")
    val data = Seq(d1, d2).toDF(columnNames: _*)

What you are doing is adding the same columns over and over, each time setting every value in that column to a single constant.

withColumn adds a column to all existing rows of a DataFrame. Since spark.emptyDataFrame has zero rows, there are no rows to extend, so the result stays empty no matter how many columns you add.
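If you want to keep your Array-of-(column, value)-pairs structure rather than switching to tuples, one possible sketch is to build Rows explicitly and pass them to createDataFrame with a schema. The helper name generateDF2 is hypothetical, and this assumes an existing SparkSession named spark and that every inner array lists the same columns in the same order:

    import org.apache.spark.sql.{DataFrame, Row, SparkSession}
    import org.apache.spark.sql.types.{StringType, StructField, StructType}

    // Hypothetical helper: builds a DataFrame from zipped (column, value) pairs.
    // Assumes each inner array has one (column, value) pair per column, in order.
    def generateDF2(spark: SparkSession, colnames: Array[String],
                    data: Array[Array[(String, String)]]): DataFrame = {
      // All columns are treated as strings, matching the input data.
      val schema = StructType(colnames.map(c => StructField(c, StringType, nullable = true)))
      // One Row per inner array, taking just the values from each pair.
      val rows = data.map(pairs => Row(pairs.map(_._2): _*))
      spark.createDataFrame(spark.sparkContext.parallelize(rows), schema)
    }

Unlike the withColumn loop, this creates one row per record, so the data actually appears when you call df.show().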

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow
