Scala DataFrame.columns - Order of columns

How can I get the column names in the same order as they appear in the DataFrame?

val df = Seq(("1", "2", "a"), ("3", "4", "b"), ("5", "6", "d")).toDF("col1", "col2", "col3")

df.printSchema()
df: org.apache.spark.sql.DataFrame = [col1: string, col2: string ... 1 more field]
root
 |-- col1: string (nullable = true)
 |-- col2: string (nullable = true)
 |-- col3: string (nullable = true)


df.columns
Array[String] = Array(col1, col2, col3)

dataset.columns returns the column names, and they appear to be in the same order as in the DataFrame, but I could not find any documentation guaranteeing that this will always be the case. Can we rely on this method to get the columns in the same order as they are in the dataset?



Solution 1:[1]

Can we rely on this method to get the columns in the same order as they are in the dataset?

Yes.


The definition of columns in Dataset is:

  /**
   * Returns all column names as an array.
   *
   * @group basic
   * @since 1.6.0
   */
  def columns: Array[String] = schema.fields.map(_.name)

This reads schema, which is of type StructType and holds its fields in an array:

case class StructType(fields: Array[StructField])

Since columns relies on Array, a class from the standard library, and uses its fundamental map method, which preserves element order, we can indeed rely on the ordering: the names come back in the same order as the fields in the schema.
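As a quick sanity check of the reasoning above, here is a minimal sketch that compares df.columns with the schema's own field names. The object name ColumnOrderCheck, the local master setting, and the app name are assumptions made for illustration and are not part of the original question; it also assumes Spark SQL is on the classpath.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical helper object, named here only for illustration.
object ColumnOrderCheck {
  def main(args: Array[String]): Unit = {
    // Local session for demonstration; master and app name are assumptions.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("column-order-check")
      .getOrCreate()
    import spark.implicits._

    val df = Seq(("1", "2", "a"), ("3", "4", "b")).toDF("col1", "col2", "col3")

    // columns is derived from the schema, so both views agree element by element.
    assert(df.columns.sameElements(df.schema.fieldNames))
    assert(df.columns.sameElements(Array("col1", "col2", "col3")))

    // A projection produces a new schema in the order the columns were requested.
    val reordered = df.select("col3", "col1")
    assert(reordered.columns.sameElements(Array("col3", "col1")))

    spark.stop()
  }
}
```

Note that the order you get is the order of the schema of that particular DataFrame, so a select, join, or withColumn that changes the schema also changes what columns returns.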

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 tjheslin1