How to group a PySpark dataframe and use a shift operation as the aggregation method?
I have the following dataframe:
| ride | window_left | window_right | time |
|---|---|---|---|
| 1 | No | No | 1 |
| 1 | No | Yes | 2 |
| 1 | Yes | Yes | 3 |
| 1 | Yes | Yes | 4 |
| 2 | No | No | 1 |
| 2 | Yes | No | 2 |
| 2 | Yes | Yes | 3 |
| 2 | Yes | Yes | 4 |
| 2 | Yes | Yes | 5 |
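For reference, a small snippet that reconstructs this input DataFrame. It is only a sketch: the schema (string values for the two window columns, integers for ride and time) and the variable names are assumptions, not part of the original question.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Sample input copied from the table above (schema is an assumption)
df = spark.createDataFrame(
    [
        (1, "No", "No", 1), (1, "No", "Yes", 2), (1, "Yes", "Yes", 3), (1, "Yes", "Yes", 4),
        (2, "No", "No", 1), (2, "Yes", "No", 2), (2, "Yes", "Yes", 3),
        (2, "Yes", "Yes", 4), (2, "Yes", "Yes", 5),
    ],
    ["ride", "window_left", "window_right", "time"],
)
```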
I would like to group this PySpark dataframe by the column ride and shift either window_left or window_right, depending on which one takes the value Yes first, like this:
| ride | window_left | window_right | time |
|---|---|---|---|
| 1 | No | No | 1 |
| 1 | Yes | Yes | 2 |
| 1 | Yes | Yes | 3 |
| 1 | None | Yes | 4 |
| 2 | No | No | 1 |
| 2 | Yes | Yes | 2 |
| 2 | Yes | Yes | 3 |
| 2 | Yes | Yes | 4 |
| 2 | Yes | None | 5 |
I would like to perform this transformation in PySpark only; I cannot use pandas, which is why I am having difficulties. The tricky part for me is shifting after grouping, given that the shift depends on the group.
Any help would be appreciated, thanks!
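One way this could be approached, sketched under assumptions rather than given as a definitive answer: instead of a groupBy aggregation, use window functions. Per ride, find the first time at which each column becomes Yes, then apply lead (a shift up) to whichever column turned Yes later. The helper columns first_left and first_right are hypothetical names, and the sketch assumes a fixed shift of one row, as in the example above.

```python
from pyspark.sql import functions as F
from pyspark.sql.window import Window

# df is the sample DataFrame built in the snippet above

ride_w = Window.partitionBy("ride")
ordered_w = Window.partitionBy("ride").orderBy("time")

result = (
    df
    # first time each column turns "Yes" within the ride (min ignores the nulls
    # produced by the when() for non-"Yes" rows)
    .withColumn(
        "first_left",
        F.min(F.when(F.col("window_left") == "Yes", F.col("time"))).over(ride_w),
    )
    .withColumn(
        "first_right",
        F.min(F.when(F.col("window_right") == "Yes", F.col("time"))).over(ride_w),
    )
    # shift (lead by 1) whichever column became "Yes" later; its last row becomes null
    .withColumn(
        "window_left",
        F.when(
            F.col("first_right") < F.col("first_left"),
            F.lead("window_left", 1).over(ordered_w),
        ).otherwise(F.col("window_left")),
    )
    .withColumn(
        "window_right",
        F.when(
            F.col("first_left") < F.col("first_right"),
            F.lead("window_right", 1).over(ordered_w),
        ).otherwise(F.col("window_right")),
    )
    .drop("first_left", "first_right")
)

result.orderBy("ride", "time").show()
```

On the sample data above, this reproduces the desired output, with null in place of None for the rows that fall off the end of the shifted column.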
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow