PySpark: get the row immediately after the one selected

I have a DataFrame (df) like this:

col1 col2 col3
One Two x
One Two full
One Two y
One Two z
One Two full
One Two u
One Two e

Using PySpark, I want to flag each row that comes immediately after a row where col3 == "full" with 1, and every other row with 0, like this:

col1 col2 col3 flag
One Two x 0
One Two full 0
One Two y 1
One Two z 0
One Two full 0
One Two u 1
One Two e 0

This is my attempt so far, but it flags the "full" rows themselves instead of the rows immediately after them:

from pyspark.sql import functions as f
df.withColumn('flag', f.when(f.col('col3') == 'full', 1).otherwise(0))  # flags the "full" row itself

Can you help me?
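The attempt compares each row's own col3 with "full", which is why it flags the "full" rows. The flag for a row has to depend on the previous row's value instead. This plain-Python sketch (no Spark involved; the list simply holds the sample col3 values in their displayed order) contrasts the two rules:

```python
# col3 values from the sample DataFrame, in the order they are displayed
col3 = ["x", "full", "y", "z", "full", "u", "e"]

# The attempted rule: flag the row whose own value is "full"
attempt = [1 if v == "full" else 0 for v in col3]

# The wanted rule: flag the row whose previous value is "full".
# The first row has no predecessor, so it is compared against None.
previous = [None] + col3[:-1]
wanted = [1 if p == "full" else 0 for p in previous]

print(attempt)  # [0, 1, 0, 0, 1, 0, 0]
print(wanted)   # [0, 0, 1, 0, 0, 1, 0]
```

Looking at "the previous row" is exactly what a window function such as lag does in Spark, provided the window has a well-defined row order.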



Solution 1:[1]

Use the lag window function together with when:

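from pyspark.sql import Window, functions as F
df = df.withColumn('id', F.monotonically_increasing_id())  # assumes the current row order is the one to follow
w = Window.partitionBy('col1', 'col2').orderBy('id')  # ordering by 'col1' alone would leave lag() order undefined
df.withColumn('x', F.when(F.lag('col3').over(w) == 'full', 1).otherwise(0)).drop('id').show()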

+----+----+----+---+
|col1|col2|col3|  x|
+----+----+----+---+
| One| Two|   x|  0|
| One| Two|full|  0|
| One| Two|   y|  1|
| One| Two|   z|  0|
| One| Two|full|  0|
| One| Two|   u|  1|
| One| Two|   e|  0|
+----+----+----+---+

Solution 2:[2]

Step 1: Assign a row number to each row with the row_number function (it needs a window ordered by the original row order).
Step 2: Filter the DataFrame on col3 == "full". This gives the row numbers of the "full" rows; call the result dataframe2.
Step 3: In dataframe2, create a new column that adds one to the row number. It now holds the row numbers of the rows immediately after the "full" rows.
Step 4: Select the new column from dataframe2 and join dataframe1 to it, matching dataframe1's row number against dataframe2's new column. An inner join returns only the rows that follow a "full" row; to keep every row and get the 0/1 flag, use a left join and set the flag to 1 where a match was found and 0 otherwise.

Sorry for not including code; I'm on my mobile. Let me know if you still need help.

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 wwnde
Solution 2 Akash