Using an if/then type statement in a PySpark join
I'm converting some old SAS code to PySpark and need to translate the SAS step below into the PySpark join that follows. Any hints? The PySpark code below raises "TypeError: condition should be a Column".
I need to create a new variable (VAR1_recode) that is null whenever VAR2 is null and otherwise has the same value as VAR1. VAR1 is numeric and VAR2 is a string.
SAS:
DATA NEED;
MERGE HAVE1 HAVE2;
BY ID;
IF VAR2 = '' THEN VAR1 = '';
RUN;
--
PYSPARK:
import pyspark.sql.functions as F
NEED = HAVE1.join(HAVE2, on=['ID'], how="left") \
.withColumn('VAR1_recode',F.when(F.col('VAR2').isNull('')).otherwise(F.col('VAR1'))) \
.select('ID',F.col('VAR1_recode').alias('VAR1'),'VAR2')
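For reference, here is a minimal sketch of one way the condition could be written, keeping the DataFrame and column names from the question: isNull() takes no arguments, and F.when() needs both a condition and the value to use when that condition is true, so the null branch goes inside when() rather than into isNull().

import pyspark.sql.functions as F

# Sketch only; assumes HAVE1 and HAVE2 are existing DataFrames sharing an ID column.
NEED = (
    HAVE1.join(HAVE2, on=['ID'], how='left')
         .withColumn(
             'VAR1_recode',
             # null when VAR2 is null, otherwise carry VAR1 through
             F.when(F.col('VAR2').isNull(), F.lit(None)).otherwise(F.col('VAR1'))
         )
         .select('ID', F.col('VAR1_recode').alias('VAR1'), 'VAR2')
)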
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow