How to convert a timestamp string to a date in Databricks SQL?
I know this has been asked several times, and I consulted the suggested questions and answers. I also read the Databricks documentation and made several attempts, but I just don't get the desired result.
Given:
+----------------------------+
| data_type |
+----------------------------+
| timestamp |
+----------------------------+
Ex:
+------------------------------+
| data_value |
+------------------------------+
| 2017-11-22T00:00:00.000+0000 |
+------------------------------+
Desired outcome:
+----------------------------+
| data_value |
+----------------------------+
| 22.11.2017 |
+----------------------------+
What I tried and failed so far:
date_format(date_value, 'dd.mm.yyyy') AS MFGDate,
to_date(date_value) AS MFGDate,
date(date_value) AS MFGDate
Result:
+------------+------------+------------+
|  MFGDate   |  MFGDate   |  MFGDate   |
+------------+------------+------------+
| 22.00.2017 | 2017-11-22 | 2017-11-22 |
+------------+------------+------------+
Here's the full query:
SELECT
'01 FUV' AS Stage,
d1.ps_name AS FUV,
d1.ps_name AS LOT,
d2.date_value AS MFGDate
FROM
table d1
INNER JOIN table d2 ON d1.ag_id = d2.ag_id
AND d1.ag_path = d2.ag_path
AND d1.ps_name = d2.ps_name
WHERE
  d1.AG_PATH LIKE 'sourcepath'
Result:
+--------+--------+--------+------------------------------+
| Stage | FUV | Lot | MFGDate |
+--------+--------+--------+------------------------------+
| 01 FUV | A1U079 | A1U079 | 2019-03-27T00:00:00.000+0000 |
| 01 FUV | A1U255 | A1U255 | 2019-06-22T00:00:00.000+0000 |
| 01 FUV | A1U255 | A1U255 | 2019-11-10T00:00:00.000+0000 |
+--------+--------+--------+------------------------------+
How to get the value for column MFGDate in the format like: 22.11.2017 ?
Solution 1:[1]
You can use the built-in function date_format; what you were missing was the correct symbol usage. The Spark datetime pattern documentation explains the symbols.
Typical Usage
import pyspark.sql.functions as F
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

input_list = [
    (1, "2019-11-07 05:30:00"),
    (2, "2019-07-09 15:30:00"),
    (3, "2019-12-09 10:30:00"),
    (4, "2019-02-11 14:30:00"),
]

sparkDF = spark.createDataFrame(input_list, ['id', 'date'])

# Parse the string column into a proper timestamp
sparkDF = sparkDF.withColumn('date', F.to_timestamp(F.col('date'), 'yyyy-MM-dd HH:mm:ss'))

# Format it back out as a string: MM = month of year, mm = minutes of hour
sparkDF = sparkDF.withColumn('date_formatted', F.date_format(F.col('date'), 'dd.MM.yyyy'))
sparkDF.show()
+---+-------------------+--------------+
| id|               date|date_formatted|
+---+-------------------+--------------+
|  1|2019-11-07 05:30:00|    07.11.2019|
|  2|2019-07-09 15:30:00|    09.07.2019|
|  3|2019-12-09 10:30:00|    09.12.2019|
|  4|2019-02-11 14:30:00|    11.02.2019|
+---+-------------------+--------------+
Solution 2:[2]
You were very close. You can use the built-in function date_format, but the reason you were getting "00" for the month is that your format pattern was incorrect. You specified "mm", which returns the minutes of the hour; you should have specified "MM", which returns the month of the year. So the correct code is:
date_format(date_value, 'dd.MM.yyyy') AS MFGDate
Documentation here: https://spark.apache.org/docs/latest/sql-ref-datetime-pattern.html
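The same month-vs-minute trap can be sanity-checked outside Spark with plain Python, using the asker's original timestamp string. This is a minimal sketch (not part of the original answers), and note that Python's strftime uses different symbols than Spark: %m is month and %M is minutes.

```python
from datetime import datetime

# The asker's timestamp string, parsed with Python's stdlib
ts = "2017-11-22T00:00:00.000+0000"
parsed = datetime.strptime(ts, "%Y-%m-%dT%H:%M:%S.%f%z")

# %m = month of year (the intended format)
print(parsed.strftime("%d.%m.%Y"))  # -> 22.11.2017

# %M = minutes of hour (the analogue of the asker's 'mm' mistake)
print(parsed.strftime("%d.%M.%Y"))  # -> 22.00.2017
```

The second line reproduces exactly the "22.00.2017" result the asker saw, which makes it clear the zero comes from the minutes field, not from a parsing failure.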
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Vaebhav |
| Solution 2 | David Findlay |
