Different date formats in a PySpark DataFrame

How can I handle different date formats in the same column of a PySpark DataFrame? When I parse the column with a single pattern, I get null values for some rows.

data=[["1","02-10-2020"],["2","03-15-2019"],["3","04-05-2021"], ['4', '02/19/2021'], ['5', '01/25/2022']]
df=spark.createDataFrame(data,["id","Date"])
df.show()
df.printSchema()

+---+----------+
| id|      Date|
+---+----------+
|  1|02-10-2020|
|  2|03-15-2019|
|  3|04-05-2021|
|  4|02/19/2021|
|  5|01/25/2022|
+---+----------+
root
 |-- id: string (nullable = true)
 |-- Date: string (nullable = true)

I tried the following, but the rows that use / as the separator come back as null instead of a date:

from pyspark.sql.functions import col, to_date

df.select('Date', to_date(col('Date'), 'MM-dd-yyyy').alias('New_date')).show()
+----------+----------+
|      Date|  New_date|
+----------+----------+
|02-10-2020|2020-02-10|
|03-15-2019|2019-03-15|
|04-05-2021|2021-04-05|
|02/19/2021|      null|
|01/25/2022|      null|
+----------+----------+

Expected output:

+----------+----------+
|      Date|  New_date|
+----------+----------+
|02-10-2020|2020-02-10|
|03-15-2019|2019-03-15|
|04-05-2021|2021-04-05|
|02/19/2021|2021-02-19|
|01/25/2022|2022-01-25|
+----------+----------+


Solution 1:[1]

Your data contains two different date formats, so each one needs its own pattern. Parse the column once per pattern and use coalesce to keep the first non-null result:

from pyspark.sql import functions as F

df.select(
    "Date",
    F.coalesce(
        F.to_date(F.col("Date"), "MM-dd-yyyy"),
        F.to_date(F.col("Date"), "MM/dd/yyyy"),
    ).alias("new_date"),
).show()

You can also replace the / in your strings with -, so that a single pattern matches every row.
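To check a list of patterns locally before running it in Spark, the first-match-wins behaviour of coalesce can be reproduced in plain Python. This is only a sketch: parse_first_match and FORMATS are hypothetical names, not part of PySpark, and strptime uses different directives (%m-%d-%Y) from Spark's patterns (MM-dd-yyyy).

```python
from datetime import date, datetime
from typing import Optional, Sequence

# Assumed strptime equivalents of the Spark patterns
# "MM-dd-yyyy" and "MM/dd/yyyy" used in the answer above.
FORMATS = ("%m-%d-%Y", "%m/%d/%Y")


def parse_first_match(value: str, formats: Sequence[str] = FORMATS) -> Optional[date]:
    """Return the date from the first format that parses, else None."""
    for fmt in formats:
        try:
            return datetime.strptime(value, fmt).date()
        except ValueError:
            # Comparable to to_date returning null: try the next format.
            continue
    return None


samples = ["02-10-2020", "02/19/2021", "2021.02.19"]
parsed = [parse_first_match(s) for s in samples]
print(parsed)
```

A value that matches none of the formats ends up as None, just as coalesce returns null when every to_date call fails.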

Solution 2:[2]

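In addition to @Steven's answer, you can replace the / separators with - and then parse with a single pattern:

from pyspark.sql.functions import col, regexp_replace, to_date

# Normalize "/" to "-" so that one pattern matches every row.
# Append .drop("Date") if the original string column is no longer needed.
df1 = df.withColumn("New_date", to_date(regexp_replace(col("Date"), "/", "-"), "MM-dd-yyyy"))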

df1.show()

Output:

+---+----------+----------+
| id|      Date|  New_date|
+---+----------+----------+
|  1|02-10-2020|2020-02-10|
|  2|03-15-2019|2019-03-15|
|  3|04-05-2021|2021-04-05|
|  4|02/19/2021|2021-02-19|
|  5|01/25/2022|2022-01-25|
+---+----------+----------+
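Replacing the separator works here because both formats list month, day and year in the same order. The plain-Python sketch below illustrates that limit; normalize_then_parse is a hypothetical helper that only mimics the regexp_replace plus to_date combination, and the second input is an invented value that does not occur in the question's data.

```python
import re
from datetime import datetime


def normalize_then_parse(value):
    """Replace "/" with "-" and parse as month-day-year, else None."""
    normalized = re.sub("/", "-", value)
    try:
        return datetime.strptime(normalized, "%m-%d-%Y").date()
    except ValueError:
        return None


# Same field order, different separator: parses after normalization.
same_order = normalize_then_parse("02/19/2021")

# Different field order (year first): normalization alone does not help.
other_order = normalize_then_parse("2021/02/19")

print(same_order, other_order)
```

If the formats differ in field order as well as separator, the coalesce approach from Solution 1 with one pattern per format is the safer choice.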

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Steven
Solution 2 DKNY