'Spark dataframe filter by dates range
I have a simple spark dataframe where one of the columns is time date, I want a function that, given the dataframe name, column datetime name, start date and end date, it returns a new dataframe that only includes values inside that range.
def range_of_dates(df, column_datetime, start_time, end_time):
return (new_df)
Solution 1:[1]
Suppose start_time and end_time are of type str, as follows.
def range_of_dates(df, column_datetime, start_time, end_time):
return df.filter(f'{column_datetime} between "{start_time}" and "{end_time}"')
data = [
('A', 1, '2022-01-01'),
('B', 2, '2022-01-02'),
('C', 3, '2022-01-03'),
('C', 3, '2022-01-04')
]
df = spark.createDataFrame(data, ['col1', 'col2', 'dt'])
new_df = range_of_dates(df, 'dt', '2022-01-02', '2022-01-03')
new_df.show(truncate=False)
Solution 2:[2]
You can easily do this with filter -
Data Preparation
df = pd.DataFrame(pd.date_range(start="2022-02-21", end="2022-05-10"),columns=['timestamp'])
sparkDF = sql.createDataFrame(df)
sparkDF.show()
+-------------------+
| timestamp|
+-------------------+
|2022-02-21 00:00:00|
|2022-02-22 00:00:00|
|2022-02-23 00:00:00|
|2022-02-24 00:00:00|
|2022-02-25 00:00:00|
|2022-02-26 00:00:00|
|2022-02-27 00:00:00|
|2022-02-28 00:00:00|
|2022-03-01 00:00:00|
|2022-03-02 00:00:00|
|2022-03-03 00:00:00|
|2022-03-04 00:00:00|
|2022-03-05 00:00:00|
|2022-03-06 00:00:00|
|2022-03-07 00:00:00|
|2022-03-08 00:00:00|
|2022-03-09 00:00:00|
|2022-03-10 00:00:00|
|2022-03-11 00:00:00|
|2022-03-12 00:00:00|
+-------------------+
Filter
def range_of_dates(inp,column_datetime, start_time, end_time):
inp = inp.filter( ( F.col(column_datetime) >= start_time )
& ( F.col(column_datetime) <= end_time )
)
return inp
sparkDF_filtered = range_of_dates(sparkDF,'timestamp','2022-02-22','2022-02-28')
sparkDF_filtered.show()
+-------------------+
| timestamp|
+-------------------+
|2022-02-22 00:00:00|
|2022-02-23 00:00:00|
|2022-02-24 00:00:00|
|2022-02-25 00:00:00|
|2022-02-26 00:00:00|
|2022-02-27 00:00:00|
|2022-02-28 00:00:00|
+-------------------+
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | 过过招 |
| Solution 2 | Vaebhav |
