Count number of missing values using spark_apply
I have the following data frame, called df:
ci ing de
21 20 100
22 19 0
23 NA 80
24 100 NA
25 NA 50
26 50 30
I want to count the number of missing values in each column using Spark.
I know that in R, code like this would work:
apply(df, 2,
      FUN = function(x) {
        sum(is.na(x))
      })
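As a minimal reproducible sketch of that base R approach: the data frame below is rebuilt from the table above (the construction itself is an assumption, since the question only shows the printed values), and `colSums(is.na(df))` is included as an equivalent shortcut.

```r
# Rebuild the example data frame (values copied from the table above)
df <- data.frame(
  ci  = 21:26,
  ing = c(20, 19, NA, 100, NA, 50),
  de  = c(100, 0, 80, NA, 50, 30)
)

# Column-wise NA count with apply(): MARGIN = 2 means "per column"
na_counts <- apply(df, 2, function(x) sum(is.na(x)))

# Equivalent shortcut: is.na() on a data frame returns a logical matrix,
# and colSums() adds up the TRUE values in each column
na_counts_alt <- colSums(is.na(df))

print(na_counts)
```

On this data both versions give 0, 2 and 1 for ci, ing and de.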
I want to do the same, but using Spark.
The sparklyr package has a function called spark_apply, but I can't figure out how to make it work.
Solution 1:[1]
Here is an "na" check on df in Scala. It counts the rows that contain at least one null value:
scala> val nacount = df.count() - df.na.drop().count()
scala> nacount
2000
Solution 2:[2]
Not perfect but works for your purpose using spark_apply:
## count missing values by each column and group by category
library(sparklyr)
library(dplyr)
sc = spark_connect(master = "local")  # assumption: a local Spark connection; adjust as needed

ci = c(21:26)
ing = c(20, 19, NA, 100, NA, 50)
de = c(100, 0, 80, NA, 50, 30)
df = as.data.frame(list(ci = ci, ing = ing, de = de))
sdf = copy_to(sc, df)

count_na_col_i = function(i, sdf) {
  cns = colnames(sdf)
  # keep the grouping column and the i-th column, renamed to y for use inside f
  cnt = spark_apply(
    sdf %>% select(cns[1], cns[i]) %>% rename(y = cns[i]),
    f = function(tbl) {
      library(dplyr)  # dplyr has to be attached on the worker as well
      tbl %>% filter(is.na(y)) %>% count()
    },
    columns = cns[i], group_by = cns[1])
  collect(cnt)
}

# i-th column only
i = 2
nna = count_na_col_i(i, sdf)

# all columns
lapply(seq(2, length(colnames(sdf))), count_na_col_i, sdf = sdf)
Solution 3:[3]
Using @Charlie's sdf object:
sdf %>% spark_apply(function(e) apply(e, 2, function(x) sum(is.na(x))))
will do the job.
The result is a data frame with a single column: each row holds the number of NAs in one column of sdf, in column order. If needed, you can transpose it (... %>% as.data.frame() %>% t()) and add the column names manually.
# Source: table<sparklyr_tmp_3f7f4665748e> [?? x 1]
# Database: spark_connection
     ci
  <int>
1     0
2     2
3     1
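The manual naming step could look like the sketch below. It runs on a local stand-in for the collected result (the `res` data frame is hypothetical and simply mirrors the output printed above). It assumes that sdf sits in a single partition, so exactly one row per column comes back, and that the rows are in the same order as the columns of sdf.

```r
# Hypothetical local stand-in for the collected spark_apply() result,
# mirroring the printed output above
res <- data.frame(ci = c(0L, 2L, 1L))

# Assumption: rows come back in the same order as the columns of sdf
col_names <- c("ci", "ing", "de")

# Turn the single column into a named vector of NA counts
na_counts <- setNames(res[[1]], col_names)

# Or reshape it into a one-row data frame, one column per original column
na_counts_df <- as.data.frame(t(na_counts))
print(na_counts_df)
```

With a real connection, `res` would come from calling `collect()` on the spark_apply result instead of being typed in by hand.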
Solution 4:[4]
spark_apply(
  df,
  function(e) sum(is.na(e)),
  names = c("your", "column", "names")
)
Try the above.
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | R Palanivel-Tamilnadu India |
| Solution 2 | Charlie |
| Solution 3 | nachti |
| Solution 4 | TylerH |
