to_timestamp() function in PySpark returning null values

I have a data set in CSV format as follows:

ORDER_ID  ORDER_TIME
8799      11/11/2008 01:34:16.564000 AM
8854      4/18/2008 01:02:36.564000 AM
8884      4/20/2008 10:38:44.886000 PM
8899      12/16/2008 07:32:59.456000 AM
8681      7/29/2008 08:59:06.250000 PM

and we read the file with the following code in Python:

from pyspark.sql.types import *
from pyspark.sql import functions as F
df = spark.read \
  .option("header",True) \
  .option("nullValue", "null") \
  .option("delimiter",",") \
  .option("multiLine",True) \
  .csv( csvfile, encoding="utf-8")

df = df.withColumn("ORDER_TIME", F.to_timestamp(F.unix_timestamp("ORDER_TIME", 'M/d/yyyy hh:mm:ss.SSSSSS a').cast('timestamp')))

When we run df.show(), the ORDER_TIME column contains only null values,
but I need it in the standard Spark format, i.e. 2008-11-11 01:34:16.
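For reference, here is the intended conversion written with the Python stdlib alone (a Spark-independent sketch; `%I`/`%p` are the `datetime` equivalents of Spark's `hh`/`a` pattern letters):

```python
import datetime

raw = "11/11/2008 01:34:16.564000 AM"
# %I + %p handles the 12-hour clock, %f consumes the microseconds
parsed = datetime.datetime.strptime(raw, "%m/%d/%Y %I:%M:%S.%f %p")
print(parsed.strftime("%Y-%m-%d %H:%M:%S"))  # 2008-11-11 01:34:16
```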

If I run the same command in the pyspark shell, the output is correct for the same input.

All our packages are up to date and I have no idea what is causing this. Looking forward to a solution.
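One guess at the cause, given that identical code behaves differently in two environments: Spark 3.x replaced the datetime parser, and patterns that Spark 2.x accepted can yield null (or raise) under the new one. Two things to try (a sketch only, assuming the `spark` session and `df` from the question; none of this is confirmed by the original post):

```python
# Spark 3.x only: fall back to the Spark 2.x datetime parser
spark.conf.set("spark.sql.legacy.timeParserPolicy", "LEGACY")

# Alternatively, skip the unix_timestamp round-trip: to_timestamp takes
# the format string directly and keeps sub-second precision, which
# unix_timestamp (whole seconds) drops
df = df.withColumn("ORDER_TIME",
                   F.to_timestamp("ORDER_TIME", "M/d/yyyy hh:mm:ss.SSSSSS a"))
```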



Solution 1:[1]

An approach using a UDF:

import datetime
from pyspark.sql.functions import col, udf

df = spark.createDataFrame(
    [(8799, "11/11/2008 01:34:16.564000 AM"), (8854, "4/18/2008 01:02:36.564000 AM")],
    ("ORDER_ID", "ORDER_TIME"))

def standard_date_format(date):
    # %I (12-hour clock) must be used together with %p; %H would ignore AM/PM
    return datetime.datetime.strptime(date, '%m/%d/%Y %I:%M:%S.%f %p').strftime('%Y-%m-%d %H:%M:%S')

fn1 = udf(standard_date_format)

df = df.withColumn('ORDER_TIME', fn1(col('ORDER_TIME')))
df.show(truncate=False)  # display(df) on Databricks
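A pitfall worth flagging with `strptime` and 12-hour timestamps: `%p` only takes effect when the hour is parsed with `%I`; with `%H` the AM/PM marker is accepted but ignored, which silently corrupts afternoon times. A quick stdlib check:

```python
import datetime

s = "4/20/2008 10:38:44.886000 PM"

# %I honours the AM/PM marker ...
with_i = datetime.datetime.strptime(s, "%m/%d/%Y %I:%M:%S.%f %p")
# ... while %H parses "10" literally and %p has no effect
with_h = datetime.datetime.strptime(s, "%m/%d/%Y %H:%M:%S.%f %p")

print(with_i.hour)  # 22
print(with_h.hour)  # 10
```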


Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Sudhin