PySpark: replacing column values with when() gives "'Column' object is not callable"

I have a table like this:

name
----
A
B
ccc
D
eee

and a list of valid names:

legal_names = ["A", "B", "D"]

I want to replace every name that is not in this list with the string "INVALID".

I used this script:

(
    df.withColumn(
        "name",
        F.when((F.col("name").isin(legal_names)), F.col("name")).otherwhise(
            F.lit("INVALID")
        ),
    )
)

But I get this error:


TypeError: 'Column' object is not callable
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
File <command-4397929369165676>:4, in <cell line: 2>()
      1 (
      2     df.withColumn(
      3         "name",
----> 4         F.when((F.col("name").isin(legal_names)), F.col("name")).otherwhise(
      5             F.lit("INVALID")
      6         ),
      7     )
      8 )

TypeError: 'Column' object is not callable

Dummy data to reproduce:

vals = [("A", ), ("B", ), ("ccc", ), ("D", ), ("eee", )]
cols = ["name"]
legal_names = ["A", "B", "D"]
df = spark.createDataFrame(vals, cols)

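Why a typo produces "not callable" rather than an AttributeError can be sketched in plain Python. `FakeColumn` below is a hypothetical stand-in written for this illustration, not the real `pyspark.sql.Column`; it only imitates the one relevant behaviour, namely that looking up an unknown attribute returns another column object instead of raising.

```python
class FakeColumn:
    """Hypothetical stand-in for pyspark.sql.Column (illustration only)."""

    def __init__(self, expr):
        self.expr = expr

    def otherwise(self, value):
        # The real, correctly spelled method exists and returns a new column.
        return FakeColumn(f"{self.expr} ELSE {value}")

    def __getattr__(self, item):
        # Python calls this only when normal attribute lookup fails.
        # Like Column, treat the unknown name as a nested field and
        # return another column instead of raising AttributeError.
        if item.startswith("__"):
            raise AttributeError(item)
        return FakeColumn(f"{self.expr}.{item}")


col = FakeColumn("CASE WHEN ok THEN name")

# Correct spelling: resolves to the method and works.
good = col.otherwise("INVALID")

# Misspelling: __getattr__ returns a FakeColumn, and calling it fails.
try:
    col.otherwhise("INVALID")
    message = None
except TypeError as exc:
    message = str(exc)
```

Here `message` ends up holding a "not callable" TypeError, mirroring the traceback in the question: the misspelled name is silently resolved to a column object, and the failure only appears when that object is called.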

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow
