How can I join these DataFrames on the closest timestamp?

I have two dataframes:

from datetime import datetime

from pyspark.sql.types import StructType, StructField, StringType, DateType, IntegerType

a = spark.createDataFrame(
    data=[
        ('1', datetime.strptime('2022-05-16', '%Y-%m-%d')),
        ('1', datetime.strptime('2022-05-15', '%Y-%m-%d')),
        ('1', datetime.strptime('2022-05-14', '%Y-%m-%d')),
        ('1', datetime.strptime('2022-05-14', '%Y-%m-%d')),
        ('1', datetime.strptime('2022-05-05', '%Y-%m-%d')),
    ],
    schema=StructType([
        StructField('seller_id', StringType()),
        StructField('completed_at', DateType()),
    ]),
)

b = spark.createDataFrame(
    data=[
        ('1', datetime.strptime('2022-05-16', '%Y-%m-%d'), 70),
        ('1', datetime.strptime('2022-05-15', '%Y-%m-%d'), 71),
        ('1', datetime.strptime('2022-05-14', '%Y-%m-%d'), 70),
        ('1', datetime.strptime('2022-05-03', '%Y-%m-%d'), 65),
    ],
    schema=StructType([
        StructField('user_id', StringType()),
        StructField('event_timestamp', DateType()),
        StructField('lat', IntegerType()),
    ]),
)

I want to join these on user_id == seller_id, matching each completed_at to an equal event_timestamp or, failing an exact match, to the closest event_timestamp.

What is the best way to do this?

Desired output should look like this:

seller_id  completed_at  lat
1          2022-05-05    65
1          2022-05-14    70
1          2022-05-14    70
1          2022-05-15    71
1          2022-05-16    70


Sources

Source: Stack Overflow, licensed under CC BY-SA 3.0.