'How to get number of null for each row?

How can I get count null in the row? This is simple sample. In my actual df, there are hundreds columns. For example:

val df = Seq(("Alice", "20", "Female"), ("Bob", "30", "Male"), ("Ben", null, "Male"), ("Jessica", null, null)).toDF("name","age","sex")

enter image description here

I want to get df like this. How can I count how many null for each row in df?

val df2 = Seq(("Alice", "20", "Female", "0"), ("Bob", "30", "Male", "0"), ("Ben", null, "Male", "1"), ("Jessica", null, null, "2")).toDF("name","age","sex","CountOfNull")

enter image description here



Solution 1:[1]

There are many ways to do it. Here is one:

val df = Seq(("Alice", "20", "Female"), 
   ("Bob", "30", "Male"), ("Ben", null, "Male"),
   ("Jessica", null, null)).toDF("name","age","sex")

scala> df.show()
+-------+----+------+
|   name| age|   sex|
+-------+----+------+
|  Alice|  20|Female|
|    Bob|  30|  Male|
|    Ben|null|  Male|
|Jessica|null|  null|
+-------+----+------+

df.registerTempTable("test")

scala> sql("SELECT name, age, sex, nvl2(age, 0, 1) + nvl2(sex, 0, 1) as CountOfNull FROM test").show()
+-------+----+------+-----------+
|   name| age|   sex|CountOfNull|
+-------+----+------+-----------+
|  Alice|  20|Female|          0|
|    Bob|  30|  Male|          0|
|    Ben|null|  Male|          1|
|Jessica|null|  null|          2|
+-------+----+------+-----------+

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 y?s??la