Apply a specific filter condition for each group in Spark

I have a data frame as below:

id    |original_date |date1      |date2      |name
1     |03-30-2022    |03-29-2022 |04-02-2022 | John  
1     |03-27-2022    |03-29-2022 |04-02-2022 | Mary
2     |04-01-2022    |03-29-2022 |04-02-2022 | Joe
2     |03-30-2022    |04-02-2022 |04-08-2022 | Susan
3     |04-03-2022    |04-02-2022 |04-08-2022 | Mallory

For each group of id, I want to apply the filter condition date1 < original_date <= date2 and get the following resultant dataframe:

id    |original_date |date1      |date2      |name
1     |03-30-2022    |03-29-2022 |04-02-2022 | John  
2     |04-01-2022    |03-29-2022 |04-02-2022 | Joe
3     |04-03-2022    |04-02-2022 |04-08-2022 | Mallory

How can I do this?
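
For reference, a dataframe like the one above can be reconstructed with something along these lines (just a sketch; it assumes id is an integer and the date columns are plain MM-dd-yyyy strings):

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Sample data matching the table above; the date columns are kept as strings here.
df = spark.createDataFrame(
    [
        (1, '03-30-2022', '03-29-2022', '04-02-2022', 'John'),
        (1, '03-27-2022', '03-29-2022', '04-02-2022', 'Mary'),
        (2, '04-01-2022', '03-29-2022', '04-02-2022', 'Joe'),
        (2, '03-30-2022', '04-02-2022', '04-08-2022', 'Susan'),
        (3, '04-03-2022', '04-02-2022', '04-08-2022', 'Mallory'),
    ],
    ['id', 'original_date', 'date1', 'date2', 'name'],
)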



Solution 1:[1]

Isn't that just a straightforward filter condition, or am I missing something?

from pyspark.sql import functions as F

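# keep only the rows where original_date falls in the (date1, date2] window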
(df
    .where(
          (F.col('date1') < F.col('original_date'))
        & (F.col('original_date') <= F.col('date2'))
    )
    .show()
)

+---+-------------+----------+----------+-------+
| id|original_date|     date1|     date2|   name|
+---+-------------+----------+----------+-------+
|  1|   03-30-2022|03-29-2022|04-02-2022|   John|
|  2|   04-01-2022|03-29-2022|04-02-2022|    Joe|
|  3|   04-03-2022|04-02-2022|04-08-2022|Mallory|
+---+-------------+----------+----------+-------+
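
One caveat, not part of the original answer: if the date columns really are MM-dd-yyyy strings, the comparison above is lexicographic and only happens to give the right result because all the sample dates fall in the same year. A safer variant (assuming the same df as above) is to parse them into proper dates first, for example:

from pyspark.sql import functions as F

# Parse the MM-dd-yyyy strings into real dates so the comparison is chronological,
# not lexicographic (string comparison breaks across year boundaries).
df_dates = (df
    .withColumn('original_date', F.to_date('original_date', 'MM-dd-yyyy'))
    .withColumn('date1', F.to_date('date1', 'MM-dd-yyyy'))
    .withColumn('date2', F.to_date('date2', 'MM-dd-yyyy'))
)

(df_dates
    .where(
          (F.col('date1') < F.col('original_date'))
        & (F.col('original_date') <= F.col('date2'))
    )
    .show()
)

If the original string columns need to be preserved, the to_date calls can be placed inside the where clause instead of overwriting the columns.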

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 pltc