Not able to add a Spark Scala column and populate tuple data

Below is the data which needs to be populated into a dataframe:
val columnNames = Array("ID", "Name", "Age")
val d1 = Array("QWER", "TOM", "28")
val d2 = Array("SPSRT", "BENJI", "45")
val d1zip = columnNames.zip(d1)
val d2zip1 = columnNames.zip(d2)
val data = Array(d1zip, d2zip1)
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions._
var df = generateDF1(columnNames, data)
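As a sanity check on the setup above (runnable without Spark), `zip` pairs the two arrays elementwise into `(columnName, value)` tuples, so `d1zip` is an `Array[(String, String)]`:

```scala
val columnNames = Array("ID", "Name", "Age")
val d1 = Array("QWER", "TOM", "28")

// zip pairs the arrays elementwise: first name with first value, and so on.
val d1zip: Array[(String, String)] = columnNames.zip(d1)
// d1zip is Array(("ID", "QWER"), ("Name", "TOM"), ("Age", "28"))
```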
Function to generate the dataframe
def generateDF1(colnames: Array[String], data: Array[Array[(String, String)]]): DataFrame = {
  import spark.implicits._
  var initDF = spark.emptyDataFrame
  for (itemlist <- data) {
    for (item <- itemlist) {
      initDF = initDF.withColumn(item._1, lit(item._2))
    }
  }
  initDF
}
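One way to actually build the dataframe from this zipped structure (a sketch, not the asker's original code) is to strip the column names back out of each record and hand Spark plain rows. The reshaping step itself needs no Spark, so it is shown runnable below; the Spark calls are indicated only in comments and assume a `spark` session is in scope:

```scala
val columnNames = Array("ID", "Name", "Age")
val data: Array[Array[(String, String)]] = Array(
  columnNames.zip(Array("QWER", "TOM", "28")),
  columnNames.zip(Array("SPSRT", "BENJI", "45"))
)

// Drop the column names from each (name, value) pair, keeping only the values:
// one inner Array[String] per row.
val rows: Array[Array[String]] = data.map(record => record.map(_._2))

// With Spark in scope, these rows become a DataFrame, e.g.:
//   import spark.implicits._
//   val df = rows.map { case Array(id, name, age) => (id, name, age) }
//                .toSeq.toDF(columnNames: _*)
```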
I am not able to see any data being added to the dataframe.
Solution 1:[1]
If d1 and d2 were actual tuples (and not arrays, as in your code) you could:
val columnNames = Array("ID", "Name", "Age")
val d1 = ("QWER", "TOM", "28")
val d2 = ("SPSRT", "BENJI", "45")
val data = Seq(d1, d2).toDF(columnNames: _*)
What you are doing is adding the same columns multiple times, each time setting all values to a constant. withColumn adds a column to every row of a dataframe, but since your dataframe is empty (it has zero rows), nothing is actually added.
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Stack Overflow |
