Subset of rows containing NA (missing) values in a chosen column of a data frame
We have a data frame read from a CSV file. The data frame DF has columns that contain observed values and a column (Var2) that contains the date at which a measurement was taken. If the date was not recorded, the CSV file contains the value NA for missing data.
Var1 Var2
10 2010/01/01
20 NA
30 2010/03/01
We would like to use the subset command to define a new data frame new_DF that contains only the rows with an NA value in the column Var2. In the example given, only row 2 would be contained in new_DF.
The command
new_DF <- subset(DF, DF$Var2 == "NA")
does not work; the resulting data frame has no rows.
If, in the original CSV file, the value NA is replaced with NULL, the same command produces the desired result:
new_DF <- subset(DF, DF$Var2=="NULL")
How can I get this approach to work when the original CSV file contains the character string NA for missing values?
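For reference, here is a minimal sketch that reproduces the behaviour, assuming the CSV was read with the default na.strings = "NA" so that the "NA" cells already arrive as real missing values. Comparing a missing value with == yields NA rather than TRUE, and subset() drops rows whose condition is NA, which is why the result is empty:
DF <- data.frame(Var1 = c(10, 20, 30),
                 Var2 = c("2010/01/01", NA, "2010/03/01"))
DF$Var2 == "NA"                # FALSE NA FALSE -- never TRUE
subset(DF, DF$Var2 == "NA")    # zero rows, as described above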
Solution 1:[1]
Never use =='NA' to test for missing values. Use is.na() instead. This should do it:
new_DF <- DF[rowSums(is.na(DF)) > 0,]
or, if you want to check a particular column (here Var2), you can also use
new_DF <- DF[is.na(DF$Var2), ]
In case you have NA character values, first run
DF[DF == "NA"] <- NA
to replace them with missing values.
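As a quick check on the example data (a sketch, assuming DF was built with a real NA in Var2 as in the question):
DF[is.na(DF$Var2), ]
#   Var1 Var2
# 2   20 <NA>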
Solution 2:[2]
complete.cases() gives TRUE only when every value in a row is non-NA, so negating it keeps the rows that contain at least one NA:
DF[!complete.cases(DF), ]
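Note that this flags an NA in any column, not just Var2. On the example data it selects the same row as the is.na() subset above (sketch):
DF[!complete.cases(DF), ]
#   Var1 Var2
# 2   20 <NA>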
Solution 3:[3]
NA is a special value in R; do not confuse the NA value with the "NA" string. Depending on how the data was imported, your "NA" and "NULL" cells may end up as various types (the default behaviour is to convert "NA" strings to NA values and to leave "NULL" strings as character strings).
If you use read.table() or read.csv(), consider the na.strings argument for a clean import, and always work with real R NA values.
An example that works whether the cells contain "NA" or "NULL":
DF <- read.csv("file.csv", na.strings=c("NA", "NULL"))
new_DF <- subset(DF, is.na(DF$Var2))
Solution 4:[4]
library(dplyr)
new_data <- data %>% filter_all(any_vars(is.na(.)))
This should create a new data frame (new_data) containing only the rows that have at least one missing value.
It works well for keeping track of observations you might later drop because some of their columns contain missing values (NA).
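Applied to the question's example data, this keeps only row 2, the one row with a missing value in any column (a sketch; note that filter() resets the row names):
DF %>% filter_all(any_vars(is.na(.)))
#   Var1 Var2
# 1   20 <NA>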
Solution 5:[5]
Try changing it to this:
new_DF<-dplyr::filter(DF,is.na(Var2))
Solution 6:[6]
Since dplyr's filter_all() has been superseded
Scoped verbs (_if, _at, _all) have been superseded by the use of
across() in an existing verb.
and the usage of across() in filter() is deprecated, the filter_all() answer above (Solution 4) needs a small update. To find all rows with an NA anywhere, we could use
library(dplyr)
DF %>%
filter(if_any(everything(), is.na))
to get
# A tibble: 1 x 2
Var1 Var2
<dbl> <date>
1 20 NA
Solution 7:[7]
Prints all the rows with NA data:
tmp <- data.frame(c(1, 2, 3), c(4, NA, 5))
# arr.ind = TRUE returns row/column indices, so the affected rows can be selected directly
tmp[unique(which(is.na(tmp), arr.ind = TRUE)[, "row"]), ]
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Jelena Čuklina |
| Solution 2 | user3226167 |
| Solution 3 | maressyl |
| Solution 4 | Dmitriy |
| Solution 5 | csilk |
| Solution 6 | Martin Gal |
| Solution 7 | Manfred Radlwimmer |
