Reading a datetime column as StringType in PySpark and converting it to datetime gives null records

I am reading the response of an API call that has dates in a datetime column in the following format: 2016-07-27T11:34:33Z+0000

I am creating a DataFrame with a custom schema:

StructField("xyz",TimestampType(),True),  
StructField("abc",TimestampType(),True)

The DataFrame gets created, but when I call an action it fails with this error:

org.apache.spark.SparkException: Job aborted due to stage failure: Task 6 in stage 73040.0 failed 4 times, most recent failure: Lost task 6.3 in stage 73040.0 (TID 408627, 10.239.145.102, executor 12): org.apache.spark.api.python.PythonException: 'TypeError: field xyz: TimestampType can not accept object '2016-07-27T11:34:50Z+0000' in type <class 'str'>'. Full traceback below:

What I tried: creating the DataFrame with StringType as the schema type for the datetime column works, but when I convert that column to a timestamp it gives null values.

df_mod = df_mod.withColumn("xyz",df_mod['xyz'].cast(TimestampType()))

This gives null values.

data when using stringtype while creating dataframe

Please help: how can I create the DataFrame from this format (2016-07-27T11:34:33Z+0000) with the schema type set to timestamp?



Solution 1:[1]

You have to convert the string 2016-07-27T11:34:33Z+0000 to epoch time with unix_timestamp before casting it to TimestampType. The tricky part is that you have to pass the correct date format.

df = spark.createDataFrame([
    (1, '2016-07-27T11:34:33Z+0000'),
], 's int, a string')
+---+-------------------------+
|s  |a                        |
+---+-------------------------+
|1  |2016-07-27T11:34:33Z+0000|
+---+-------------------------+

import pyspark.sql.functions as F

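# Parse the string with an explicit pattern: 'T' and the first 'Z' are quoted
# literals, while the trailing unquoted Z matches the +0000 offset.
(df
    .withColumn('a', F.unix_timestamp('a', "yyyy-MM-dd'T'HH:mm:ss'Z'Z").cast('timestamp'))
    .show()
)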
+---+-------------------+
|  s|                  a|
+---+-------------------+
|  1|2016-07-27 04:34:33|
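+---+-------------------+

Note that show() renders timestamps in the Spark session time zone (spark.sql.session.timeZone), so the 04:34:33 above is the same instant as 11:34:33 UTC, displayed in a session that is 7 hours behind UTC.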
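If you would rather keep TimestampType in the schema from the start, one possible alternative is to parse the strings on the Python side before building the DataFrame, since the TypeError in the question only says that TimestampType will not accept a str. The sketch below is not from the original answer: parse_api_timestamp and API_TS_FORMAT are hypothetical names, and the commented-out createDataFrame call assumes an existing SparkSession called spark.

```python
from datetime import datetime, timezone

# Hypothetical helper (not from the original post): parse the API's
# "2016-07-27T11:34:33Z+0000" layout into a timezone-aware datetime.
API_TS_FORMAT = "%Y-%m-%dT%H:%M:%SZ%z"  # literal 'Z', then the numeric offset


def parse_api_timestamp(value):
    """Return a timezone-aware datetime, or None when the value is missing."""
    if value is None:
        return None
    return datetime.strptime(value, API_TS_FORMAT)


rows = [(1, "2016-07-27T11:34:33Z+0000"), (2, None)]
parsed_rows = [(s, parse_api_timestamp(a)) for s, a in rows]

# Assumption: a SparkSession named `spark` already exists. The parsed rows
# could then be loaded with a timestamp column, for example:
# df = spark.createDataFrame(parsed_rows, 's int, a timestamp')
```

This moves the parsing to the driver, so it is probably only reasonable when the API response is already a modest-sized Python list; for data that is already in a DataFrame, the unix_timestamp approach above avoids that round trip.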

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution      Source
Solution 1    pltc