'Scala: How can I replace value in Dataframes using scala

For example I want to replace all numbers equal to 0.2 in a column to 0. How can I do that in Scala? Thanks

Edit:

|year| make|model| comment            |blank|
|2012|Tesla| S   | No comment         |     | 
|1997| Ford| E350|Go get one now th...|     | 
|2015|Chevy| Volt| null               | null| 

This is my Dataframe I'm trying to change Tesla in make column to S



Solution 1:[1]

Spark 1.6.2, Java code (sorry), this will change every instance of Tesla to S for the entire dataframe without passing through an RDD:

dataframe.withColumn("make", when(col("make").equalTo("Tesla"), "S")
                             .otherwise(col("make") 
                    );

Edited to add @marshall245 "otherwise" to ensure non-Tesla columns aren't converted to NULL.

Solution 2:[2]

Building off of the solution from @Azeroth2b. If you want to replace only a couple of items and leave the rest unchanged. Do the following. Without using the otherwise(...) method, the remainder of the column becomes null.

import org.apache.spark.sql.functions._

val newsdf =
  sdf.withColumn(
    "make",
    when(col("make") === "Tesla", "S").otherwise(col("make"))
  );

Old DataFrame

+-----+-----+ 
| make|model| 
+-----+-----+ 
|Tesla|    S| 
| Ford| E350| 
|Chevy| Volt| 
+-----+-----+ 

New Datarame

+-----+-----+
| make|model|
+-----+-----+
|    S|    S|
| Ford| E350|
|Chevy| Volt|
+-----+-----+

Solution 3:[3]

This can be achieved in dataframes with user defined functions (udf).

import org.apache.spark.sql.functions._
val sqlcont = new org.apache.spark.sql.SQLContext(sc)
val df1 = sqlcont.jsonRDD(sc.parallelize(Array(
      """{"year":2012, "make": "Tesla", "model": "S", "comment": "No Comment", "blank": ""}""",
      """{"year":1997, "make": "Ford", "model": "E350", "comment": "Get one", "blank": ""}""",
      """{"year":2015, "make": "Chevy", "model": "Volt", "comment": "", "blank": ""}"""
    )))

val makeSIfTesla = udf {(make: String) => 
  if(make == "Tesla") "S" else make
}
df1.withColumn("make", makeSIfTesla(df1("make"))).show

Solution 4:[4]

df2.na.replace("Name",Map("John" -> "Akshay","Cindy" -> "Jayita")).show()

replace in class DataFrameNaFunctions of type [T](col: String, replacement: Map[T,T])org.apache.spark.sql.DataFrame

For running this function you must have active spark object and dataframe with headers ON.

Solution 5:[5]

import org.apache.spark.sql.functions._

val base_optin_email = spark.read.option("header","true").option("delimiter",",").schema(schema_base_optin).csv(file_optin_email).where("CPF IS NOT NULL").
        withColumn("CARD_KEY",  lit(translate( translate(col("cpf"), ".", ""),"-","")))

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 ChrisOdney
Solution 2 Xavier Guihot
Solution 3 Al M
Solution 4 Akshay Pandya
Solution 5 Juan Diego Lozano