How to combine For and When inside a WithColumn clause in PySpark
I have a question about PySpark. I want to replace the logic below with a repeating structure (a loop). Right after the logic I show the "for" that I tried.
First, just below is the logic that I need to replace with a repeating structure. Notice that within each "when" clause the only thing that changes is the number in the comparison, for example "(col("NoInstallments") == 1)".
In fact I need to repeat this structure up to "(col("NoInstallments") == 200)".
For it to work, each "when" must be followed by an "otherwise", and only the last "otherwise" has a different condition, which would be:
"otherwise( when( ((col("Receipt").isNotNull()) | (col("Receipt") != '')), col("Receipt"))"
I wrote this structure only up to 10 as an example, but I need it up to 200.
Structure:
from pyspark.sql.functions import col, when

df_items = df_items.withColumn(
'Reference_doc',
when(
( ((col("Receipt").isNull()) | (col("Receipt") == '')) & (col("NoInstallments") == 1) ) , col("ID_ND")
).otherwise(
when(
( ((col("Receipt").isNull()) | (col("Receipt") == '')) & (col("NoInstallments") == 2) ) , col("ID_ND")
).otherwise(
when(
( ((col("Receipt").isNull()) | (col("Receipt") == '')) & (col("NoInstallments") == 3) ) , col("ID_ND")
).otherwise(
when(
( ((col("Receipt").isNull()) | (col("Receipt") == '')) & (col("NoInstallments") == 4) ) , col("ID_ND")
).otherwise(
when(
( ((col("Receipt").isNull()) | (col("Receipt") == '')) & (col("NoInstallments") == 5) ) , col("ID_ND")
).otherwise(
when(
( ((col("Receipt").isNull()) | (col("Receipt") == '')) & (col("NoInstallments") == 6) ) , col("ID_ND")
).otherwise(
when(
( ((col("Receipt").isNull()) | (col("Receipt") == '')) & (col("NoInstallments") == 7) ) , col("ID_ND")
).otherwise(
when(
( ((col("Receipt").isNull()) | (col("Receipt") == '')) & (col("NoInstallments") == 8) ) , col("ID_ND")
).otherwise(
when(
( ((col("Receipt").isNull()) | (col("Receipt") == '')) & (col("NoInstallments") == 9) ) , col("ID_ND")
).otherwise(
when(
( ((col("Receipt").isNull()) | (col("Receipt") == '')) & (col("NoInstallments") == 10) ) , col("ID_ND")
).otherwise(
when(
((col("Receipt").isNotNull()) | (col("Receipt") != '')), col("Receipt"))
))))))))))
)
"For" i'm trying to do:
First I created this list with a list comprehension. The result shown is limited to a few entries as an example, but I need it up to 200:
condition = ['( ((col("Receipt").isNull()) | (col("Receipt") =='+'""'+')) & (col("NoInstallments") == '+str(n)+ '))' for n in range(1,200)]
The result of this list is:
['( ((col("Receipt").isNull()) | (col("Receipt") =="")) & (col("NoInstallments") == 1))',
'( ((col("Receipt").isNull()) | (col("Receipt") =="")) & (col("NoInstallments") == 2))',
'( ((col("Receipt").isNull()) | (col("Receipt") =="")) & (col("NoInstallments") == 3))',
'( ((col("Receipt").isNull()) | (col("Receipt") =="")) & (col("NoInstallments") == 4))']
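Two things about this list are worth checking before it is used in "when". Each element is a plain Python string that only looks like PySpark code, whereas "when" expects a Column as its condition. Also, range(1, 200) stops at 199, so the value 200 is never generated. The sketch below needs no Spark session; the variable names other than the comprehension itself are only illustrative.

```python
# The comprehension from the question, split over lines only for readability.
conditions = [
    '( ((col("Receipt").isNull()) | (col("Receipt") == "")) & (col("NoInstallments") == '
    + str(n)
    + "))"
    for n in range(1, 200)
]

# Each element is text, not a pyspark.sql.Column.
element_type = type(conditions[0]).__name__
print(element_type)  # prints: str

# range(1, 200) excludes 200, so only 199 conditions are generated.
print(len(conditions))  # prints: 199

# To include 200, the upper bound has to be 201.
last_value = list(range(1, 201))[-1]
print(last_value)  # prints: 200
```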
My goal is to create a column that contains this whole repeating structure. I used "withColumn" to create the new column and put a "for" inside it, and within the "for" I put the conditions using "when" and "otherwise" (if I'm not mistaken, they are the PySpark equivalents of "if" and "else"). This is not working. Another problem is that I want the "for" to run only up to the penultimate "otherwise", without applying the repeating structure to the last "otherwise".
This is my attempt:
df_items = df_items.withColumn(
'Reference_doc',
for cond in condition:
when(cond, col("ID_ND")
).otherwise (when (cond, col("ID_ND"))
).otherwise(
When(
((col("XBLNR").isNotNull()) | (col("XBLNR") != '')), col("XBLNR")))
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
