'Count the number of non-NA numeric values of each row in dplyr
I create a dataframe df.
df <- data.frame (id = 1:10,
var1 = 10:19,
var2 = sample(c(1:2,NA), 10, replace=T),
var3 = sample(c(3:5, NA), 10, replace=T))
What I need is a new column var4, which count the number of non-NA values of each row (excluding the id column). So for example, if a row is like var1=19, var2=1, var3=NA, then var4=2. I could not find a good way to do this in dplyr. something like:
df %in% mutate(var4= ... )
I appreciate if anyone can help me with that.
Solution 1:[1]
Use select + is.na + rowSums, select(., -id) returns the original data frame (.) with id excluded, and then count number of non-NA values with rowSums(!is.na(...)):
df %>% mutate(var4 = rowSums(!is.na(select(., -id))))
# id var1 var2 var3 var4
#1 1 10 NA 4 2
#2 2 11 1 NA 2
#3 3 12 2 5 3
#4 4 13 2 NA 2
#5 5 14 1 NA 2
#6 6 15 1 NA 2
#7 7 16 1 5 3
#8 8 17 NA 4 2
#9 9 18 NA 4 2
#10 10 19 NA NA 1
Solution 2:[2]
I know the OP asked for a dplyr solution, but base R is straightforward here:
df$var4 <- rowSums(!is.na(df[,2:4]))
rowSums calculates the number of values that are not NA (!is.na) in columns 2 - 4.
Note, this is summing the logical vector generated by is.na, which is distinct from:
rowSums(df[,2:4], na.rm = TRUE)
Which drops the NAs and then sums the remaining values.
Solution 3:[3]
Another solution using only base-r
data.frame(df, var4 = apply(df[,-1], 1, function(x) sum(!is.na(x))))
id var1 var2 var3 var4
1 1 10 1 5 3
2 2 11 2 5 3
3 3 12 2 5 3
4 4 13 NA 3 2
5 5 14 NA 5 2
6 6 15 1 5 3
7 7 16 NA 3 2
8 8 17 NA 4 2
9 9 18 NA 3 2
10 10 19 1 4 3
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | |
| Solution 2 | |
| Solution 3 | user438383 |
