Spark SQL change date format

I'm trying to change the date format of my data from (11 20, 2014) to 2014-11-20. I tried this:

df.withColumn("newDate", to_date(col("reviewTime"),("mm dd, yyyy")))

Because days with single digits appear as 1, 2, 8 instead of 01, 02, 08, I got this message:

SparkUpgradeException: You may get a different result due to the upgrading of Spark 3.0: Fail to parse '09 1, 2014' in the new parser. You can set spark.sql.legacy.timeParserPolicy to LEGACY to restore the behavior before Spark 3.0, or set to CORRECTED and treat it as an invalid datetime string. Caused by: DateTimeParseException: Text '09 1, 2014' could not be parsed at index 3

Is there a way to fix this?

Thanks!



Solution 1:[1]

Some of your date rows were written by an old Spark version. You should add one of these options to your Spark configuration:

spark.sql.parquet.int96RebaseModeInRead = "LEGACY"

or

spark.sql.parquet.int96RebaseModeInRead = "CORRECTED"

depending on your requirements; the error message explains the difference between the two options.
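These settings can be applied on an existing session at runtime. A minimal sketch, assuming `spark` is your SparkSession; it also shows the `spark.sql.legacy.timeParserPolicy` switch that the error message in the question itself suggests:

```scala
// Minimal sketch, assuming `spark` is an existing SparkSession.
// Pick "LEGACY" or "CORRECTED" depending on your requirements.
spark.conf.set("spark.sql.parquet.int96RebaseModeInRead", "LEGACY")

// The SparkUpgradeException in the question points at a related switch
// for datetime parsing specifically:
spark.conf.set("spark.sql.legacy.timeParserPolicy", "LEGACY")
```

Note that these only affect how existing data is read back; fixing the pattern string (as in Solution 2) avoids the legacy fallback entirely.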

Solution 2:[2]

You can use the format (M d, yyyy) to deal with it. In Spark's datetime patterns, lowercase mm means minute-of-hour (MM is the month), and a single M or d matches a month or day number with or without a leading zero.

Example (Scala Spark):

import org.apache.spark.sql.functions.to_date
import spark.implicits._ // for .toDF and the $ column syntax

Seq(
    "(11 20, 2014)",
    "(1 3, 2013)",
    "(2 20, 2012)",
    "(4 22, 2014)"
).toDF("ugly_date")
        .withColumn("date", to_date($"ugly_date", "(M d, yyyy)"))
        .show(false)

Output:
+-------------+----------+
|ugly_date    |date      |
+-------------+----------+
|(11 20, 2014)|2014-11-20|
|(1 3, 2013)  |2013-01-03|
|(2 20, 2012) |2012-02-20|
|(4 22, 2014) |2014-04-22|
+-------------+----------+

For more information about Datetime Patterns see https://spark.apache.org/docs/latest/sql-ref-datetime-pattern.html
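Since Spark 3.0 these patterns are resolved with java.time, so the difference between the strict and lenient field widths can be reproduced without Spark at all. A small sketch using plain DateTimeFormatter:

```scala
import java.time.LocalDate
import java.time.format.DateTimeFormatter

// Spark 3.x parses dates via java.time, so plain DateTimeFormatter shows
// why the original pattern fails: "dd" insists on two digits, "d" does not.
val lenient = DateTimeFormatter.ofPattern("M d, yyyy")

// "MM dd, yyyy" would throw DateTimeParseException at index 3 on this
// input, matching the question's error; "M d, yyyy" parses it fine:
val parsed = LocalDate.parse("09 1, 2014", lenient)
println(parsed) // 2014-09-01
```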

UPD: Verified the result with:

  • Spark: 3.1.2
  • Scala: 2.12.10
  • Running on Zeppelin 0.9.0


Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

[1] Solution 1: Tal
[2] Solution 2: (unattributed)