How to group a PySpark DataFrame and use a shift operator as the aggregation method?

I have the following DataFrame:

ride  window_left  window_right  time
1     No           No            1
1     No           Yes           2
1     Yes          Yes           3
1     Yes          Yes           4
2     No           No            1
2     Yes          No            2
2     Yes          Yes           3
2     Yes          Yes           4
2     Yes          Yes           5

I would like to group this PySpark DataFrame by the column ride and shift either window_left or window_right, depending on which one takes the value Yes first, so that both columns switch to Yes at the same time within each ride, like this:

ride  window_left  window_right  time
1     No           No            1
1     Yes          Yes           2
1     Yes          Yes           3
1     None         Yes           4
2     No           No            1
2     Yes          Yes           2
2     Yes          Yes           3
2     Yes          Yes           4
2     Yes          None          5

I need to perform this transformation with PySpark only. I cannot use pandas, which is why I am struggling. The tricky part for me is shifting after grouping, since the size of the shift depends on the group.

Any help would be appreciated, thanks!



Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow
