How to get the group of different intervals in PySpark?

I have a PySpark DataFrame with different intervals and their corresponding groups. I need to evaluate a column of another DataFrame and, for each value, get the group of the interval that contains it.

These are the intervals:

```
# +-----+---+-----+
# |start|end|grupo|
# +-----+---+-----+
# |    0| 10|    1|
# |   11| 27|    2|
# |   28| 33|    3|
# |   34| 41|    4|
# |   42| 46|    5|
# +-----+---+-----+
```

And I have this:

```
# +------+
# |result|
# +------+
# |     5|
# |     7|
# |    33|
# |    22|
# |    41|
# +------+
```

And I need this:

```
# +------+-----+
# |result|grupo|
# +------+-----+
# |     5|    1|
# |     7|    1|
# |    33|    3|
# |    22|    2|
# |    41|    4|
# +------+-----+
```


Solution 1:[1]

You can join the intervals DataFrame with the results DataFrame using the condition `df["result"].between(df_intervals["start"], df_intervals["end"])`.

Working example:

```python
df_intervals = spark.createDataFrame(
    [(0, 10, 1), (11, 27, 2), (28, 33, 3), (34, 41, 4), (42, 46, 5)],
    ("start", "end", "group"),
)

df = spark.createDataFrame([(5,), (7,), (33,), (22,), (41,)], ("result",))

# Join each result to the interval that contains it, then keep the value and its group.
(df_intervals.join(df, df["result"].between(df_intervals["start"], df_intervals["end"]))
             .select("result", "group")
             .show())
```

"""
+------+-----+
|result|group|
+------+-----+
|     5|    1|
|     7|    1|
|    22|    2|
|    33|    3|
|    41|    4|
+------+-----+
"""

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Nithish