'how to count field with condition by spark

I have a dataframe, there is a enum field(value are 0 or 1) named A, another one field B, I would like to implement below scenario:

if `B` is null:
   count(when `A` is 0) and set a column name `xx` 
   count(when `A` is 1) and set a column name `yy` 
if `B` is not null:
   count(when `A` is 0) and set a column name `zz` 
   count(when `A` is 1) and set a column name `mm` 

how can I do it by spark scala?



Solution 1:[1]

It's possible to conditionally populate columns in this way, however the final output DataFrame requires an expected schema.

Assuming all of the scenarios you detailed are possible in one DataFrame, I would suggest creating each of the four columns: "xx", "yy", "zz" and "mm" and conditionally populating them.

In the below example I've populated the values with either "found" or "", primarily to make it easy to see where the values are populated. Using true and false here, or another enum, would likely make more sense in the real world.

Starting with a DataFrame (since you didn't specify the type that "B" is I have gone for a Option[String] (nullable) for this example:

val df = List(
    (0, None),
    (1, None),
    (0, Some("hello")),
    (1, Some("world"))
).toDF("A", "B")
     
 df.show(false)

gives:

+---+-----+
|A  |B    |
+---+-----+
|0  |null |
|1  |null |
|0  |hello|
|1  |world|
+---+-----+

and to create the columns:

df
    .withColumn("xx", when(col("B").isNull && col("A") === 0, "found").otherwise(""))
    .withColumn("yy", when(col("B").isNull && col("A") === 1, "found").otherwise(""))
    .withColumn("zz", when(col("B").isNotNull && col("A") === 0, "found").otherwise(""))
    .withColumn("mm", when(col("B").isNotNull && col("A") === 1, "found").otherwise(""))
    .show(false)

gives:

+---+-----+-----+-----+-----+-----+
|A  |B    |xx   |yy   |zz   |mm   |
+---+-----+-----+-----+-----+-----+
|0  |null |found|     |     |     |
|1  |null |     |found|     |     |
|0  |hello|     |     |found|     |
|1  |world|     |     |     |found|
+---+-----+-----+-----+-----+-----+

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 tjheslin1