'Spark dataframe filter by dates range

I have a simple spark dataframe where one of the columns is time date, I want a function that, given the dataframe name, column datetime name, start date and end date, it returns a new dataframe that only includes values inside that range.

def range_of_dates(df, column_datetime, start_time, end_time):
   return (new_df)


Solution 1:[1]

Suppose start_time and end_time are of type str, as follows.

def range_of_dates(df, column_datetime, start_time, end_time):
    return df.filter(f'{column_datetime} between "{start_time}" and "{end_time}"')


data = [
    ('A', 1, '2022-01-01'),
    ('B', 2, '2022-01-02'),
    ('C', 3, '2022-01-03'),
    ('C', 3, '2022-01-04')
]
df = spark.createDataFrame(data, ['col1', 'col2', 'dt'])
new_df = range_of_dates(df, 'dt', '2022-01-02', '2022-01-03')
new_df.show(truncate=False)

Solution 2:[2]

You can easily do this with filter -

Data Preparation

df = pd.DataFrame(pd.date_range(start="2022-02-21", end="2022-05-10"),columns=['timestamp'])

sparkDF = sql.createDataFrame(df)

sparkDF.show()

+-------------------+
|          timestamp|
+-------------------+
|2022-02-21 00:00:00|
|2022-02-22 00:00:00|
|2022-02-23 00:00:00|
|2022-02-24 00:00:00|
|2022-02-25 00:00:00|
|2022-02-26 00:00:00|
|2022-02-27 00:00:00|
|2022-02-28 00:00:00|
|2022-03-01 00:00:00|
|2022-03-02 00:00:00|
|2022-03-03 00:00:00|
|2022-03-04 00:00:00|
|2022-03-05 00:00:00|
|2022-03-06 00:00:00|
|2022-03-07 00:00:00|
|2022-03-08 00:00:00|
|2022-03-09 00:00:00|
|2022-03-10 00:00:00|
|2022-03-11 00:00:00|
|2022-03-12 00:00:00|
+-------------------+

Filter

def range_of_dates(inp,column_datetime, start_time, end_time):
    inp = inp.filter( ( F.col(column_datetime) >= start_time )
                    & ( F.col(column_datetime) <= end_time )
                    )
    return inp


sparkDF_filtered = range_of_dates(sparkDF,'timestamp','2022-02-22','2022-02-28')

sparkDF_filtered.show()

+-------------------+
|          timestamp|
+-------------------+
|2022-02-22 00:00:00|
|2022-02-23 00:00:00|
|2022-02-24 00:00:00|
|2022-02-25 00:00:00|
|2022-02-26 00:00:00|
|2022-02-27 00:00:00|
|2022-02-28 00:00:00|
+-------------------+

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 过过招
Solution 2 Vaebhav