How to auto-increment a value for a specific condition in PySpark
I have the PySpark dataframe below:
| id | dept |
|----|------|
| 1  | CSE  |
|    | ISE  |
|    | ECE  |
| 2  | EEE  |
| 4  | MCE  |
I am trying to populate a new column based on a condition: if id is null, assign a value that increments by 1 starting from a defined value (max_value); if id is not null, keep the existing id.
To achieve this, I am using the code below:
```python
max_value = 5

df = df.withColumn("idx", monotonically_increasing_id())
w = Window().orderBy("idx")
df = df.withColumn("row_num", F.when(F.col("id").isNull() ,(max_value + row_number().over(w)).otherwise(F.col("id"))))
```
But I am getting the error below:
```
IllegalArgumentException: otherwise() can only be applied on a Column previously generated by when()
```
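The error happens because the closing parenthesis in the last line places .otherwise() inside the when() call, so it is invoked on the value expression rather than on the Column that when() returns. A minimal sketch of the same expression with the parenthesis moved (max_value and w as defined above):

```python
from pyspark.sql import functions as F
from pyspark.sql.functions import row_number

# .otherwise() chains onto the Column produced by F.when(), not inside it
new_col = F.when(
    F.col("id").isNull(), max_value + row_number().over(w)
).otherwise(F.col("id"))
```

Note that this only resolves the syntax error: row_number() counted over all rows would give the null rows 7 and 8, not the 6 and 7 shown in the expected output below. A sketch that matches the expected output follows the question.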
Expected output:
| id | dept | new_col |
|----|------|---------|
| 1  | CSE  | 1       |
|    | ISE  | 6       |
|    | ECE  | 7       |
| 2  | EEE  | 2       |
| 4  | MCE  | 4       |
Can anyone help me resolve this issue? It would be great.
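For reference, here is a minimal sketch of one way to produce the expected output, assuming the id column is numeric and that each null row should get max_value plus a running count of the null rows seen so far (so the first null row gets 6, the second 7, and so on). The sample data, the max_value of 5, and the column names mirror the question; using a windowed sum as the running counter is an assumption on my part, not something stated in the question.

```python
from pyspark.sql import SparkSession, Window
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()

# Sample data mirroring the question; None marks the missing ids
df = spark.createDataFrame(
    [(1, "CSE"), (None, "ISE"), (None, "ECE"), (2, "EEE"), (4, "MCE")],
    ["id", "dept"],
)

max_value = 5

# Keep the current row order so the window has something to order by
df = df.withColumn("idx", F.monotonically_increasing_id())
w = Window.orderBy("idx")

# Null ids: max_value + running count of nulls seen so far.
# Non-null ids: keep the existing id.
df = df.withColumn(
    "new_col",
    F.when(
        F.col("id").isNull(),
        max_value + F.sum(F.when(F.col("id").isNull(), 1).otherwise(0)).over(w),
    ).otherwise(F.col("id")),
).drop("idx")

df.show()
```

With the sample data this yields 1, 6, 7, 2, 4 for new_col, matching the expected output. Whether monotonically_increasing_id() reflects the intended row order depends on how the real data is read, so an explicit ordering column is safer if one exists.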
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow