Reading a datetime column as StringType in PySpark and converting it to a timestamp gives null records
I am reading the response of an API call in which the datetime columns hold dates in the format 2016-07-27T11:34:33Z+0000.
I am creating a DataFrame with a custom schema:
StructField("xyz",TimestampType(),True),
StructField("abc",TimestampType(),True)
The DataFrame is created, but calling an action on it raises an error:
org.apache.spark.SparkException: Job aborted due to stage failure: Task 6 in stage 73040.0 failed 4 times, most recent failure: Lost task 6.3 in stage 73040.0 (TID 408627, 10.239.145.102, executor 12): org.apache.spark.api.python.PythonException: 'TypeError: field xyz: TimestampType can not accept object '2016-07-27T11:34:50Z+0000' in type <class 'str'>'. Full traceback below:
Tried: I created the DataFrame with StringType as the schema type for the datetime columns. That works, but converting the column to a timestamp afterwards gives null values.
df_mod = df_mod.withColumn("xyz",df_mod['xyz'].cast(TimestampType()))
This gives null values.
[Image: data when StringType is used while creating the DataFrame]
Please help: how can I create the DataFrame with the schema type set to timestamp for values in the format 2016-07-27T11:34:33Z+0000?
Solution 1:[1]
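You have to convert the string 2016-07-27T11:34:33Z+0000 to epoch time with unix_timestamp before casting it to TimestampType. The tricky part is passing the correct date format: in the pattern below, the quoted 'Z' matches the literal letter Z, and the unquoted Z after it matches the +0000 offset.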
df = spark.createDataFrame([
(1, '2016-07-27T11:34:33Z+0000'),
], 's int, a string')
+---+-------------------------+
|s |a |
+---+-------------------------+
|1 |2016-07-27T11:34:33Z+0000|
+---+-------------------------+
import pyspark.sql.functions as F
(df
.withColumn('a', F.unix_timestamp('a', "yyyy-MM-dd'T'HH:mm:ss'Z'Z").cast('timestamp'))
.show()
)
+---+-------------------+
| s| a|
+---+-------------------+
| 1|2016-07-27 04:34:33|
+---+-------------------+
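As an alternative to converting inside Spark, the strings could be parsed in plain Python before the DataFrame is built, since TimestampType accepts datetime objects (the error in the question arises because it was given str values). The following is a minimal sketch, assuming every value has exactly the shape 2016-07-27T11:34:33Z+0000; the helper name parse_api_timestamp is hypothetical. It also illustrates why the output above shows 04:34:33 rather than 11:34:33: show() renders timestamps in the session time zone, and a UTC-7 offset is assumed here only to reproduce that display.

```python
from datetime import datetime, timedelta, timezone

# strptime pattern: literal 'T' and 'Z', then %z for the numeric offset (+0000).
API_FORMAT = "%Y-%m-%dT%H:%M:%SZ%z"


def parse_api_timestamp(value):
    """Parse a string like '2016-07-27T11:34:33Z+0000'; None passes through."""
    if value is None:
        return None
    return datetime.strptime(value, API_FORMAT)


parsed = parse_api_timestamp("2016-07-27T11:34:33Z+0000")
print(parsed.isoformat())  # 2016-07-27T11:34:33+00:00

# Same instant viewed at UTC-7 (assumed offset, chosen to match the output above).
local_view = parsed.astimezone(timezone(timedelta(hours=-7)))
print(local_view.strftime("%Y-%m-%d %H:%M:%S"))  # 2016-07-27 04:34:33
```

The parsed values could then replace the raw strings in the rows passed to spark.createDataFrame, with the xyz and abc fields kept as TimestampType.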
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | pltc |
