PySpark "due to data type mismatch: differing types" error in Boolean column creation

I am creating boolean columns and then filtering downstream on whether any of them is false.

I created the boolean columns below in my PySpark code, and they work:

df = spark.read.parquet(data_url)
df = df\
    .withColumn('d1d3_filter', (df.d1_submit_date.isNotNull() & 
                                (df.d1_review_date.isNull())))\
    .withColumn('d41_filter', (df.d4_submit_date.isNotNull() & 
                               (df.d4_review_date.isNull())))\
    .withColumn('d42_filter', (df.d42_submit_date.isNotNull() & 
                               (df.d42_review_date.isNull())))\
    .withColumn('d45_filter', (df.d5_submit_date.isNotNull() & 
                               (df.d5_review_date.isNull())))\
    .withColumn('d6_filter', (df.d8_submit_date.isNotNull() & 
                              (df.d8_review_date.isNull())))

But when I add another condition, it throws the error "due to data type mismatch: differing types in d1_review_date IS NULL) OR d1_status)' (boolean and string)":

df = spark.read.parquet(data_url)
df = df\
    .withColumn('d1d3_filter', (df.d1_submit_date.isNotNull() & 
                                (df.d1_review_date.isNull() |
                                 df.d1_status != 'Approved')))\
    .withColumn('d41_filter', (df.d4_submit_date.isNotNull() & 
                               (df.d4_review_date.isNull() |
                                 df.d4_status != 'Approved')))\
    .withColumn('d42_filter', (df.d42_submit_date.isNotNull() & 
                               (df.d42_review_date.isNull() |
                                 df.d42_status != 'Approved')))\
    .withColumn('d45_filter', (df.d5_submit_date.isNotNull() & 
                               (df.d5_review_date.isNull() |
                                 df.d5_status != 'Approved')))\
    .withColumn('d6_filter', (df.d8_submit_date.isNotNull() & 
                              (df.d8_review_date.isNull() |
                                 df.d8_status != 'Approved')))

All the columns might have null values.

Why is the new withColumn expression not working? What am I missing here?
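
One likely cause, judging from the error text, is Python operator precedence: `|` binds more tightly than `!=`, so `df.d1_review_date.isNull() | df.d1_status != 'Approved'` groups as `(isNull() | d1_status) != 'Approved'`, which asks Spark to OR a boolean with a string. The sketch below uses the standard `ast` module to show that grouping without needing a Spark session, then builds the condition with the comparison kept as its own expression. The names `top_level_node` and `pending_filter` are hypothetical helpers introduced only for illustration.

```python
import ast


def top_level_node(expr: str) -> str:
    """Return the name of the outermost AST node of a Python expression."""
    return type(ast.parse(expr, mode="eval").body).__name__


# The condition as written in the question, and the same condition with
# the comparison wrapped in its own parentheses.
as_written = "review.isNull() | status != 'Approved'"
with_parens = "review.isNull() | (status != 'Approved')"

# As written, the outermost operation is the comparison, so the `|`
# is applied first to isNull() and the raw status column.
print(top_level_node(as_written))   # Compare
# With parentheses, the outermost operation is the `|`.
print(top_level_node(with_parens))  # BinOp


def pending_filter(df, prefix):
    """Hypothetical helper: corrected condition for one column prefix, e.g. 'd1'."""
    submitted = df[f"{prefix}_submit_date"].isNotNull()
    not_reviewed = df[f"{prefix}_review_date"].isNull()
    # Built as a separate expression, so it is evaluated before the `|`.
    not_approved = df[f"{prefix}_status"] != "Approved"
    return submitted & (not_reviewed | not_approved)
```

Assuming the column names from the question, this would be used as `df.withColumn('d1d3_filter', pending_filter(df, 'd1'))`, and similarly for the other prefixes. Since the columns may contain nulls, note that `!=` against a null status evaluates to NULL rather than True; if a null status should count as not approved, `~df.d1_status.eqNullSafe('Approved')` is one option.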



Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow
