A DATETIME column in Synapse tables is loading date values that are a few hours into the past compared to the incoming value
I have a DATETIME column in Synapse called "load_day" which is loaded through a PySpark dataframe (parquet). At runtime, the code adds a new column to the dataframe holding an incoming date (a timestamp in the format yyyy-MM-dd HH:mm:ss).
from pyspark.sql.functions import lit
df = df.select(lit(incoming_date).alias("load_day"), "*")
Later we write this dataframe into a Synapse table using a df.write command.
What's strange is that every date value going into this load_day column is written as a value a few hours in the past. This is happening with all the Synapse tables in my database, for all the new loads that I'm doing. To my knowledge, nothing in the code has changed from before.
E.g.: if my incoming date is "2022-02-19 00:00:00", it is written as 2022-02-18 22:00:00.000 instead of 2022-02-19 00:00:00.000. The hours part is also not stable; sometimes it is written as 22:00:00.000 and sometimes as 23:00:00.000.
I debugged the code, but the value of the variable looks totally fine. It shows 2022-02-19 00:00:00 as expected, but the moment the data is ingested into the Synapse table, it goes back a couple of hours.
I don't understand why this might be happening or what to look for during debugging.
Has anyone faced something like this before? Any ideas on how I can approach this to find out what's causing the erroneous date?
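One thing that may be worth ruling out while debugging: a shift of exactly one or two hours is what a local-time-to-UTC conversion would produce for a zone at UTC+1 or UTC+2. This is only a hypothesis, not something confirmed by the code above. The plain-Python sketch below reproduces the two observed values under that assumption; the helper name `to_utc_wall_clock` and the two offsets are hypothetical and chosen for illustration only.

```python
from datetime import datetime, timedelta, timezone


def to_utc_wall_clock(naive_text, utc_offset_hours):
    """Treat a naive 'YYYY-MM-DD HH:MM:SS' string as local time at the
    given UTC offset (an assumed offset) and return the UTC wall-clock text."""
    local = datetime.strptime(naive_text, "%Y-%m-%d %H:%M:%S").replace(
        tzinfo=timezone(timedelta(hours=utc_offset_hours))
    )
    return local.astimezone(timezone.utc).strftime("%Y-%m-%d %H:%M:%S")


incoming = "2022-02-19 00:00:00"

# Assumed offsets: UTC+1 and UTC+2 (hypothetical, not taken from the job's config).
shifted_plus1 = to_utc_wall_clock(incoming, 1)
shifted_plus2 = to_utc_wall_clock(incoming, 2)

print(shifted_plus1)  # 2022-02-18 23:00:00
print(shifted_plus2)  # 2022-02-18 22:00:00
```

If the stored values match this pattern, comparing the time zone settings of the Spark session and of the machine running the job against UTC would be a reasonable next step.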
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
