PySpark: replacing column values with `when` gives "'Column' object is not callable"
I have a table like this:
```
name
----
A
B
ccc
D
eee
```
and a list of valid names:

```python
legal_names = ["A", "B", "D"]
```
I want to replace every illegal name with the string "INVALID".
I used this code:
```python
(
    df.withColumn(
        "name",
        F.when((F.col("name").isin(legal_names)), F.col("name")).otherwhise(
            F.lit("INVALID")
        ),
    )
)
```
But I get this error: `TypeError: 'Column' object is not callable`. Full traceback:
```
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
File <command-4397929369165676>:4, in <cell line: 2>()
      1 (
      2     df.withColumn(
      3         "name",
----> 4         F.when((F.col("name").isin(legal_names)), F.col("name")).otherwhise(
      5             F.lit("INVALID")
      6         ),
      7     )
      8 )

TypeError: 'Column' object is not callable
```
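Judging from the traceback, the likely cause is the misspelling `otherwhise`; the method is `Column.otherwise`. In the PySpark versions I am aware of, `Column.__getattr__` does not raise `AttributeError` for an unknown name. Instead it treats `col.foo` like the nested-field access `col["foo"]` and returns a new `Column`, and calling that `Column` produces exactly this `TypeError`. The sketch below reproduces the mechanism with a hypothetical `ToyColumn` class (not PySpark's real `Column`) so it runs without Spark.

```python
# Hypothetical toy class, NOT pyspark's Column: it only mimics how an unknown
# attribute name becomes a new column object instead of an AttributeError.
class ToyColumn:
    def __init__(self, expr: str) -> None:
        self.expr = expr

    def __getattr__(self, item: str) -> "ToyColumn":
        # __getattr__ runs only when normal attribute lookup fails (e.g. a typo).
        if item.startswith("__"):
            raise AttributeError(item)
        # Mimics col.foo being treated like col["foo"] (a nested-field lookup).
        return ToyColumn(f"{self.expr}.{item}")

    def otherwise(self, value: str) -> "ToyColumn":
        # The correctly spelled method exists, so __getattr__ is never consulted.
        return ToyColumn(f"{self.expr} ELSE {value}")

    def __repr__(self) -> str:
        return f"ToyColumn<{self.expr}>"


def try_call(col: ToyColumn, method: str) -> str:
    """Look up `method` on `col` and call it; return "ok" or the TypeError text."""
    try:
        getattr(col, method)("'INVALID'")
        return "ok"
    except TypeError as exc:
        return str(exc)


expr = ToyColumn("CASE WHEN name IN ('A', 'B', 'D') THEN name")
print(try_call(expr, "otherwise"))   # ok
print(try_call(expr, "otherwhise"))  # 'ToyColumn' object is not callable
```

Because the typo is silently accepted as a field name, the failure only surfaces at the call, which is why the message mentions a non-callable column rather than a missing method.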
Dummy data to reproduce:
```python
vals = [("A",), ("B",), ("ccc",), ("D",), ("EEE",)]
cols = ["name"]
legal_names = ["A", "B", "D"]
df = spark.createDataFrame(vals, cols)
```
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
