How to drop all NA columns in a SparkDataFrame with SparkR?

Once again, I'm facing a problem that I can't translate to SparkR. I have a SparkDataFrame in which some columns contain only NAs, and I want to delete all of these columns.

I discovered SparkR recently and I'm still far from understanding how it all works, but it's very frustrating to be stuck on something that shouldn't be this complicated...

Here is a reprex and the way I do it in plain R with data.table:

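library(SparkR)      # needed for createDataFrame()
library(data.table)

sparkR.session()     # createDataFrame() requires an active Spark session

df <- data.frame(V1 = base::sample(1:10,5), V2 = base::rep(NA,5), V3 = base::sample(1:10,5), V4 = base::rep(NA,5), V5 = base::rep(NA,5), X = runif(n = 5, min = 0, max = 5))
sdf <- createDataFrame(df)   # the SparkDataFrame I want to clean
dt <- setDT(df)              # converts df to a data.table by reference

# TRUE for every column that contains only NAs
na.lst <- sapply(dt, function(x) all(is.na(x)))
# delete those columns by reference
dt[, which(na.lst) := NULL]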

Thanks!
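
For comparison, one possible way to express the same idea in SparkR is sketched below. It is only a sketch, not a confirmed solution: it relies on `count()` applied to a Column counting non-null values only, and it assumes that R's NA values arrive in Spark as nulls after `createDataFrame()`. The names `counts`, `all_na` and `sdf_clean` are made up for illustration.

```r
library(SparkR)

sparkR.session()

# Same toy data as in the reprex above
df <- data.frame(V1 = base::sample(1:10, 5), V2 = base::rep(NA, 5),
                 V3 = base::sample(1:10, 5), V4 = base::rep(NA, 5),
                 V5 = base::rep(NA, 5), X = runif(n = 5, min = 0, max = 5))
sdf <- createDataFrame(df)

# Build one aggregate per column: the number of non-null values in it.
# (Assumption: R NAs were converted to Spark nulls by createDataFrame().)
count_cols <- lapply(SparkR::columns(sdf), function(nm) {
  SparkR::alias(SparkR::count(sdf[[nm]]), nm)
})

# A single pass over the data; the result is a one-row local data.frame
counts <- SparkR::collect(SparkR::select(sdf, count_cols))

# Columns whose non-null count is zero contain only NAs
all_na <- names(counts)[unlist(counts[1, ]) == 0]

# Drop them; keep sdf unchanged when there is nothing to drop
sdf_clean <- if (length(all_na) > 0) SparkR::drop(sdf, all_na) else sdf

printSchema(sdf_clean)
```

The point of collecting only the counts is that the full data never has to be pulled into the R session; only a single summary row comes back to the driver. If the all-NA columns could also hold NaN rather than null, the count would need an extra condition, which this sketch does not cover.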



Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow
