Read an array of filters using a when condition in PySpark
I want to filter a PySpark DataFrame using conditions read from a Hive table. I get a list of conditions from the Hive table that I must apply to my DataFrame, but my PySpark program does not understand the condition I pass on each loop iteration.
My list of conditions:
```python
filtre=[(F.col('Accountable 2')!= 'xxx') | (F.col('Accountable 2')!= 'yyy') | (F.col('level 2') == 'AAYYA') | (F.col('level 2')!= 'SSS') & (F.col('Type')== 'Paint') & (F.col('Cleaning') != 'TTT') & (F.col('Cleaning')!= 'CCC'),
        (F.col('Accountable 2')!= 'YYY') | (F.col('Accountable 2')!= 'yyy') | (F.col('level 2') == 'AAYYA') | (F.col('level 2')!= 'SSS') & (F.col('Type')== 'Paint') & (F.col('Cleaning') != 'TTT') & (F.col('Cleaning')!= 'CCC'),
        (F.col('Accountable 2')!= 'ZZZ') | (F.col('Accountable 2')!= 'yyy') | (F.col('level 2') == 'AAYYA') | (F.col('level 2')!= 'SSS') & (F.col('Type')== 'Paint') & (F.col('Cleaning') != 'TTT') & (F.col('Cleaning')!= 'BBB'),
        (F.col('Cleaning')!='TTT') & (F.col('Cleaning')!='XXX') & (F.col('CND')=='CND'),
        (F.col('Operator level')!= 'AAA') & (F.col('level 3')!= 'EEE')]
```
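One thing worth checking in these expressions (an observation, not necessarily the cause of the error below): in Python, `&` binds more tightly than `|`, so mixed chains like the ones above group as `a | b | (c & d & e)` rather than reading left to right. PySpark `Column` expressions follow the same precedence rules as plain booleans, which can be sketched like this:

```python
# `&` binds tighter than `|`, so  a | b & c  parses as  a | (b & c).
mixed = True | False & False          # True | (False & False) -> True
explicit = (True | False) & False     # True & False           -> False

# The two groupings give different results here, which is why
# parenthesizing the intended grouping explicitly matters.
print(mixed, explicit)
```

The same applies to the `filtre` list: if the `|` chain is meant to be evaluated before the trailing `&` chain, it needs explicit parentheses.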
My code that I use to filter my DataFrame:
```python
i = 0
while i < len(filtre):
    if filtre[i] != "":
        df = df.withColumn("status", F.when(filtre[i], F.lit("ok")).when(F.col("status") == 'ok', F.lit("ok")).otherwise(F.lit("ko")))
        i = i + 1
    else:
        df = df.withColumn("status", F.when(F.col("status") == 'non', F.lit("ok")).when(F.col("status") == 'ok', F.lit("ok")).otherwise(F.lit("ko")))
        i = i + 1
```
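As a side note, a loop like this can usually be collapsed into a single pass by OR-ing the conditions together with `functools.reduce`. Here is a minimal sketch using plain Python predicates and dict rows as stand-ins for `Column` expressions and DataFrame rows (all names here are hypothetical); with real PySpark Columns the same `reduce(operator.or_, conditions)` works because `Column` implements `|`:

```python
from functools import reduce
import operator

# Hypothetical rows and predicates standing in for the DataFrame and
# the Column conditions in `filtre`.
rows = [
    {"Type": "Paint", "CND": "CND"},
    {"Type": "Wood",  "CND": "no"},
]
conditions = [
    lambda r: r["Type"] == "Paint",
    lambda r: r["CND"] == "CND",
]

def status(row):
    # A row is "ok" if any condition matches, mirroring the chained
    # when(...).otherwise(...) calls in the loop above.
    return "ok" if reduce(operator.or_, (c(row) for c in conditions)) else "ko"

print([status(r) for r in rows])
```

With Columns, `combined = reduce(operator.or_, filtre)` gives one condition to pass to a single `F.when(combined, ...)` call.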
When I try to run this code I get the error message below. I don't understand why: each condition works fine when run on its own.
```
python/pyspark/sql/functions.py", line 1414, in when
    raise TypeError("condition should be a Column")
TypeError: condition should be a Column
```
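The traceback says that whatever reaches `F.when` on that iteration is not a `Column` object. One plausible cause (an assumption, since the question does not show how `filtre` is built from Hive): conditions read back from a Hive table arrive as plain strings, and a string that prints like a condition is still not a `Column`. A minimal stand-in for the check the traceback points at (the `Column` class below is a hypothetical placeholder, not the PySpark one):

```python
class Column:
    """Hypothetical stand-in for pyspark.sql.Column."""

def when(condition, value):
    # Mirrors the type check in pyspark/sql/functions.py from the traceback.
    if not isinstance(condition, Column):
        raise TypeError("condition should be a Column")
    return value

try:
    when("(Type = 'Paint')", "ok")    # a string read from Hive fails the check
except TypeError as e:
    print(e)                          # condition should be a Column

print(when(Column(), "ok"))           # a real Column passes
```

If the list really does contain SQL expression strings, `F.expr(condition_string)` is PySpark's documented way to turn such a string into a `Column` before passing it to `F.when`.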
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
