How to filter a variable date in Spark?
I have a dataset that contains all the dates between 2010 and 2040 in this format:
1/1/2010
1/2/2010
1/3/2010
...
...
...
12/31/2040
I am using Spark to transform the data, and I'm trying to apply a filter that only keeps dates in the window [today - 2 years, open-ended into the future).
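For clarity, the window I want can be sketched in plain Python (assuming "2 years" means exactly 730 days, as in my attempts; `keep` is just an illustrative helper, not part of my Spark code):

```python
from datetime import date, timedelta

today = date.today()
cutoff = today - timedelta(days=730)  # today minus 2 years, counted in days

def keep(d: date) -> bool:
    # Keep dates from the cutoff onward; no upper bound (open into the future).
    return d >= cutoff

print(keep(today))                       # True
print(keep(cutoff))                      # True (inclusive lower bound)
print(keep(cutoff - timedelta(days=1)))  # False
```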
I have tried practically all the date-manipulation functions Spark offers, including:
df_calendar.filter(datediff(to_date(col("date"),"m/d/yyyy"),current_date()).gt(-730))
df_calendar.select("*").withColumn("datediff",datediff(to_date(col("date"),"m/d/yyyy"),current_date())).filter(col("datediff")>(-730))
val today = df_calendar.select(date_sub(current_date(),730))
df_calendar.filter((to_date(col("date"),"m/d/yyyy") > today ))
But I always end up with the same result: the dataset returns every date starting from 1/1/2021, as if the filter goes back "2 years" to a year boundary rather than by days. Note that I also tried using the year() function, and it returns the same result. I'm seriously confused by what comes back each time; I'd really appreciate some help with this one.
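One likely culprit, hedged as an assumption about the attempts above: in Spark's datetime patterns, lowercase `m` means minute-of-hour while uppercase `M` means month, so a pattern like `"m/d/yyyy"` leaves the month field unparsed and it defaults to January, collapsing every date onto the January of its year. That would make a "today - 730 days" cutoff behave exactly like a year boundary. The same pitfall can be reproduced with Python's `strptime` (where, conversely, `%m` is month and `%M` is minute), purely as an analogy:

```python
from datetime import datetime

raw = "6/15/2022"

# Correct: %m parses the month field.
good = datetime.strptime(raw, "%m/%d/%Y")

# Pitfall: %M parses minutes instead, so the month silently defaults
# to January and the "6" ends up in the time-of-day part.
bad = datetime.strptime(raw, "%M/%d/%Y")

print(good.date())  # 2022-06-15
print(bad.date())   # 2022-01-15 (month lost; 6 was parsed as a minute)
```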
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
