Retain observations whose NA is <= 20% of total variables

Suppose we have this data frame with six observations and four variables:

df <- data.frame(a = c(1, NA, NA, 4, NA, 5),
                 b = c(NA, NA, NA, NA, NA, 1),
                 c = c(1, 2, 3, 4, NA, 6),
                 d = c(6, 7, NA, NA, 4, 4))

a	b	c	d
1	NA	1	6
NA	NA	2	7
NA	NA	3	NA
4	NA	4	NA
NA	NA	NA	4
5	1	6	4

How can we retain only the observations whose NAs do not exceed 50% of the variables? (Here 50% of four variables is two, so each retained observation has at most two NAs, and only 4 of the 6 observations are retained.)
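To check that expected count, this short base-R sketch recomputes the number of NAs in each row of the example data. `na_per_row` is only an illustrative variable name.

```r
df <- data.frame(a = c(1, NA, NA, 4, NA, 5),
                 b = c(NA, NA, NA, NA, NA, 1),
                 c = c(1, 2, 3, 4, NA, 6),
                 d = c(6, 7, NA, NA, 4, 4))

# is.na(df) returns a logical matrix; rowSums() counts the TRUE cells per row
na_per_row <- rowSums(is.na(df))
print(na_per_row)  # 1 2 3 2 3 0

# 50% of 4 variables allows at most 2 NAs per row
print(sum(na_per_row <= 0.5 * ncol(df)))  # 4 observations qualify
```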

r data-cleaning na missing-data

Solution 1:[1]

Use rowSums() on is.na(df) to count the NAs in each row, then discard the rows that have more than threshold * ncol(df) NAs.

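threshold <- 0.5

# Keep rows whose NA count is at most threshold * ncol(df).
# Logical indexing avoids the df[-which(...), ] pitfall: when no row
# exceeds the threshold, -which(...) is integer(0) and drops every row.
df <- df[rowSums(is.na(df)) <= threshold * ncol(df), ]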
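The same idea can be wrapped in a reusable function. This is a minimal sketch assuming you prefer to give the threshold as a proportion of columns; `keep_rows_by_na` is a hypothetical helper name, not part of base R or any package. It uses `rowMeans(is.na(data))`, which equals `rowSums(is.na(data)) / ncol(data)`, and logical indexing instead of `-which()`.

```r
# Hypothetical helper (not from base R or a package): keep rows whose
# share of NA cells is at most `threshold` (a proportion between 0 and 1).
keep_rows_by_na <- function(data, threshold = 0.5) {
  # rowMeans() of the logical matrix from is.na() is the fraction of NAs per row
  na_share <- rowMeans(is.na(data))
  # Logical indexing; drop = FALSE keeps a data frame even with one column
  data[na_share <= threshold, , drop = FALSE]
}

df <- data.frame(a = c(1, NA, NA, 4, NA, 5),
                 b = c(NA, NA, NA, NA, NA, 1),
                 c = c(1, 2, 3, 4, NA, 6),
                 d = c(6, 7, NA, NA, 4, 4))

kept_50 <- keep_rows_by_na(df, threshold = 0.5)
print(kept_50)        # rows 1, 2, 4 and 6
kept_20 <- keep_rows_by_na(df, threshold = 0.2)
print(nrow(kept_20))  # 1: only row 6 has no NAs

# Pitfall of the -which() form: when no row exceeds the threshold,
# which() returns integer(0), and x[-integer(0), ] selects zero rows.
no_na <- data.frame(x = 1:3, y = 4:6)
dropped_all <- no_na[-which(rowSums(is.na(no_na)) > 0.5 * ncol(no_na)), ]
print(nrow(dropped_all))             # 0, not 3
print(nrow(keep_rows_by_na(no_na)))  # 3
```

With the title's 20% threshold and four variables, a row may contain at most 0.8 NAs, so only fully observed rows survive.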

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution	Source
Solution 1	Ben Smith
