Remove milliseconds from timestamp in PySpark SQL
I have two fields, both of string type. One field is called date, the other time.
The date field has values like below:
20220328,20220329,20220330,20220331
The time field has values like below:
043313,045546,043313,044147
I need these values converted to the timestamp datatype, so they should look something like this:
2022-03-28T04:33:13
2022-03-29T04:55:46
2022-03-30T04:33:13
2022-03-31T04:41:47
I have the following code:
to_timestamp(concat(to_date(table.date,'yyyyMMdd'),COALESCE(nullif(table.time, ''),'000000')) , 'yyyy-MM-ddHHmmss')
which works fine, but returns the timestamp like below:
2022-03-28T04:33:13.000+0000
Does anyone know how I can remove the trailing .000+0000 so that I keep only 2022-03-28T04:33:13?
Kind regards
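As background for the answer below: the trailing .000+0000 comes from how Spark displays the timestamp type itself, so removing it is a formatting step, not a parsing one. The same concat-then-parse-then-format pattern can be sketched outside Spark with only the Python standard library (the function name to_iso is made up for illustration):

```python
from datetime import datetime

def to_iso(date_str: str, time_str: str) -> str:
    """Concatenate yyyyMMdd + HHmmss, parse, and format without milliseconds."""
    # Empty time strings fall back to midnight, mirroring COALESCE(NULLIF(time, ''), '000000')
    time_str = time_str or "000000"
    ts = datetime.strptime(date_str + time_str, "%Y%m%d%H%M%S")
    # isoformat() on a second-precision datetime omits fractional seconds
    return ts.isoformat()

print(to_iso("20220328", "043313"))  # 2022-03-28T04:33:13
```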
Solution 1:[1]
You want to format the timestamp; you can use the date_format function for that:
spark.sql("""
select date_format(
         to_timestamp(
           concat(table.date, coalesce(nullif(table.time, ''), '000000')),
           'yyyyMMddHHmmss'
         ),
         "yyyy-MM-dd'T'HH:mm:ss"
       ) as new_date
from values ('20220328', '043313'), ('20220329', '045546'),
            ('20220330', '043313'), ('20220331', '044147') as table (date, time)
""").show()
#+-------------------+
#| new_date|
#+-------------------+
#|2022-03-28T04:33:13|
#|2022-03-29T04:55:46|
#|2022-03-30T04:33:13|
#|2022-03-31T04:41:47|
#+-------------------+
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | blackbishop |
