'Count non-null values in each column of a dataframe in R
I am working with the following dataframe:
cell_a cell_b cell_c group
N/A 2.5 5 A
1.2 3.6 N/A A
3 2.1 3.2 A
N/A N/A 1 B
1.2 N/A N/A B
2 N/A N/A B
I would like to count the number of rows with a non-null value in each column and summarise it by group.
The outcome should be stored in new dataframe, such as:
cell_a cell_b cell_c group
2 3 2 A
2 0 1 B
I tried with:
df_2 <- aggregate(df[1:3], list(df$group), length)
but it's indeed giving me the total length of each rows for each group. I've also tried adding na.action = na.omit or na.rm=TRUE after length but doesn't work.
What else can I use in this code to ignore the N/A values?
Thank you very much in advance for your help!
Solution 1:[1]
Here is how we could do it with dplyr:
- change
N\AnotNAwithna_ifandacross - group
- summarise with
across
library(dplyr)
df %>%
mutate(across(starts_with("cell"), ~na_if(., "N/A"))) %>%
group_by(group) %>%
summarise(across(starts_with("cell"), ~sum(!is.na(.))))
group cell_a cell_b cell_c
<chr> <int> <int> <int>
1 A 2 3 2
2 B 2 0 1
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | TarJae |
