How to convert a timestamp string to a date in Databricks SQL?

I know this has been asked several times, and I consulted/searched the suggested questions and answers. I also read up on the Databricks documentation and made several attempts, but I just don't get the desired result.

Given:

+----------------------------+
|        data_type           |
+----------------------------+
|        timestamp           |
+----------------------------+

Ex:

+------------------------------+
|        data_value            |
+------------------------------+
| 2017-11-22T00:00:00.000+0000 |
+------------------------------+

Desired outcome:

+----------------------------+
|        data_value          |
+----------------------------+
|        22.11.2017          |
+----------------------------+

What I tried and failed so far:

  date_format(date_value, 'dd.mm.yyyy') AS MFGDate,

  to_date(date_value) AS MFGDate,

  date(date_value) AS MFGDate

Result:

+------------+------------+------------+
|  MFGDate   |  MFGDate   |  MFGDate   |
+------------+------------+------------+
| 22.00.2017 | 2017-11-22 | 2017-11-22 |
+------------+------------+------------+

Here's the full query:

SELECT
   '01 FUV' AS Stage,
   d1.ps_name AS FUV,
   d1.ps_name AS LOT,
   d2.date_value AS MFGDate
 FROM
  table d1
  INNER JOIN table d2 ON d1.ag_id = d2.ag_id
  AND d1.ag_path = d2.ag_path
  AND d1.ps_name = d2.ps_name
WHERE
  d1.AG_PATH LIKE 'sourcepath'

Result:

+--------+--------+--------+------------------------------+
| Stage  | FUV    | Lot    | MFGDate                      |
+--------+--------+--------+------------------------------+
| 01 FUV | A1U079 | A1U079 | 2019-03-27T00:00:00.000+0000 |
| 01 FUV | A1U255 | A1U255 | 2019-06-22T00:00:00.000+0000 |
| 01 FUV | A1U255 | A1U255 | 2019-11-10T00:00:00.000+0000 |
+--------+--------+--------+------------------------------+

How do I get the value of column MFGDate in a format like 22.11.2017?



Solution 1:[1]

You can use the built-in function date_format; the thing you were missing was the correct pattern symbol. The datetime-pattern documentation (https://spark.apache.org/docs/latest/sql-ref-datetime-pattern.html) explains the symbol usage.

Typical Usage

from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()

input_list = [
   (1, "2019-11-07 05:30:00")
  ,(2, "2019-07-09 15:30:00")
  ,(3, "2019-12-09 10:30:00")
  ,(4, "2019-02-11 14:30:00")
]

sparkDF = spark.createDataFrame(input_list, ['id', 'date'])

# Parse the string column into a proper timestamp first.
sparkDF = sparkDF.withColumn('date', F.to_timestamp(F.col('date'), 'yyyy-MM-dd HH:mm:ss'))

# dd.MM.yyyy -- note the upper-case MM for month of year.
sparkDF = sparkDF.withColumn('date_formatted', F.date_format(F.col('date'), 'dd.MM.yyyy'))

sparkDF.show()

+---+-------------------+--------------+
| id|               date|date_formatted|
+---+-------------------+--------------+
|  1|2019-11-07 05:30:00|    07.11.2019|
|  2|2019-07-09 15:30:00|    09.07.2019|
|  3|2019-12-09 10:30:00|    09.12.2019|
|  4|2019-02-11 14:30:00|    11.02.2019|
+---+-------------------+--------------+
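If you just want to sanity-check the conversion outside of Spark, the same transformation can be sketched in plain Python with the standard library. This is only an illustrative stand-in, not the Databricks SQL solution itself; it parses the raw timestamp string from the question and prints it in the desired dd.MM.yyyy layout (Python's strftime uses %d.%m.%Y for that):

```python
from datetime import datetime

# Raw value as shown in the question, ISO timestamp with a +0000 offset.
raw = "2017-11-22T00:00:00.000+0000"

# %f accepts the fractional seconds, %z the numeric UTC offset.
ts = datetime.strptime(raw, "%Y-%m-%dT%H:%M:%S.%f%z")

# Day.Month.Year, matching the desired 22.11.2017 output.
formatted = ts.strftime("%d.%m.%Y")
print(formatted)  # 22.11.2017
```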

Solution 2:[2]

You were very close. You can use the built-in function date_format, but the reason you were getting "00" for the month is that your format was incorrect. You specified "mm", which returns the minute of the hour; you should have specified "MM", which returns the month of the year. So the correct code is:

date_format(date_value, 'dd.MM.yyyy') AS MFGDate

Documentation here: https://spark.apache.org/docs/latest/sql-ref-datetime-pattern.html
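The minute-vs-month mix-up is easy to reproduce outside Spark too. Note the case convention is inverted in Python's strftime (%m is month, %M is minute), but the pitfall is the same: since the question's timestamps are all at midnight, the minute directive yields "00", exactly like the broken 22.00.2017 output. A small illustrative sketch:

```python
from datetime import datetime

# Midnight timestamp, like the data in the question.
ts = datetime(2017, 11, 22, 0, 0, 0)

# Minute-of-hour directive: midnight gives "00", reproducing the bad result.
print(ts.strftime("%d.%M.%Y"))  # 22.00.2017

# Month directive gives the intended result.
print(ts.strftime("%d.%m.%Y"))  # 22.11.2017
```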

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Vaebhav
Solution 2 David Findlay