PySpark dataframe withColumn 'when' not working with '<' or '>'

I'm trying to create a new column in the same dataframe based on greater-than or less-than conditions using 'when', like the following:

df = df.withColumn(
    "new_col",
    when(col("age") < 17, 1234)  # when(col("DAYS") < 30, lit("ECONOMICAL"))
    .when(col("age") > 17, 5678)
    .otherwise(df.old_col)
)

However, I am getting this error message: '<' not supported between instances of 'Row' and 'int'

I also tried when(int(col("age")) < 17, 1234), and it didn't work.

Using '<=' and '>=' instead didn't work either.

I even saw another post here whose suggested solution used when(col("DAYS") < 30, lit("ECONOMICAL")) inside a withColumn like mine, but wrapping the result in lit didn't help either.

Does anyone know why this can't be done? I have no problems doing an == between rows and int; the problem only occurs when I try '<' and '>'.



Solution 1:[1]

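Found the solution: I needed to add explicit casting.

from pyspark.sql.functions import col, when

df = df.withColumn(
    "new_col",
    # Cast explicitly so the comparison runs on an integer column.
    when(col("age").cast("int") < 17, 1234)
    .when(col("age").cast("int") > 17, 5678)
    # Rows where age is exactly 17 (or null) fall through to old_col.
    .otherwise(df.old_col),
)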

This solution worked.
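
As a side note on why == seemed fine while '<' failed: the wording of the error suggests that one side of the comparison was a plain Python Row object rather than a Spark Column. That is an assumption, since the question does not show where the Row came from. The sketch below uses only the standard library and a hypothetical stand-in named Row (a namedtuple, not the real pyspark.sql.Row, which is likewise a tuple subclass) to show how plain Python treats the two operators differently.

```python
from collections import namedtuple

# Hypothetical stand-in for pyspark.sql.Row, used only for illustration.
# The real Row is also a tuple subclass, but this is NOT the PySpark class.
Row = namedtuple("Row", ["age"])

row = Row(age=16)

# Equality between unrelated types falls back to an identity comparison,
# so it quietly evaluates to False instead of raising.
equal_result = (row == 17)

# Ordering has no such fallback: neither tuple nor int knows how to order
# itself against the other, so Python raises TypeError.
try:
    row < 17
    ordering_error = None
except TypeError as exc:
    ordering_error = str(exc)

print(equal_result)
print(ordering_error)
```

If this is what happened, the == comparison not raising does not mean the condition was built as intended; it only means Python had a fallback for equality and none for ordering.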

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Alex Ott