'Get the first non-null value from selected cells in a row
Good afternoon, friends!
I'm currently performing some calculations in R (df is displayed below). My goal is to display in a new column the first non-null value from selected cells for each row.
My df is:
MD <- c(100, 200, 300, 400, 500)
liv <- c(0, 0, 1, 3, 4)
liv2 <- c(6, 2, 0, 4, 5)
liv3 <- c(1, 1, 1, 1, 1)
liv4 <- c(1, 0, 0, 3, 5)
liv5 <- c(0, 2, 7, 9, 10)
df <- data.frame(MD, liv, liv2, liv3, liv4, liv5)
I want to display (in a column called "liv6") the first non-null value from 5 cells (given the data, liv1 = 0, liv2 = 6 , liv3 = 1, liv 4 = 1 and liv5 = 1). The result should be 6. And this calculation should be repeated fro each row in my dataframe..
I do know how to do this in Python, but not in R..
Any help is highly appreciated!
Solution 1:[1]
A simple base R option is to apply across relevant columns (I exclude MD here, you can use any data frame subsetting style you want), then just take the first value of the non-zero values of that row.
df$liv6 <- apply(df[-1], 1, \(x) head(x[x > 0], 1))
df
#> MD liv liv2 liv3 liv4 liv5 liv6
#> 1 100 0 6 1 1 0 6
#> 2 200 0 2 1 0 2 2
#> 3 300 1 0 1 0 7 1
#> 4 400 3 4 1 3 9 3
#> 5 500 4 5 1 5 10 4
Solution 2:[2]
A Base R solution:
df$liv6 <- apply(df[-1], 1, function(x) x[min(which(x != 0))])
output
df
MD liv liv2 liv3 liv4 liv5 liv6
1 100 0 6 1 1 0 2
2 200 0 2 1 0 2 2
3 300 1 0 1 0 7 1
4 400 3 4 1 3 9 1
5 500 4 5 1 5 10 1
Solution 3:[3]
One approach is to use purrr::detect to detect the first non-zero element of each row.
We define a function which takes a numeric vector (row) and returns a boolean indicating whether each element is non-zero:
is_nonzero <- function(x) x != 0
We use this function to detect the first non-zero element in each row via purrr:detect
first_nonzero <- apply(df %>% dplyr::select(liv:liv5), 1, function(x) {
purrr::detect(x, is_nonzero, .dir = "forward")
})
We finally create the new column:
df$liv6 <- first_nonzero
As a result, we have
> df
MD liv liv2 liv3 liv4 liv5 liv6
100 0 6 1 1 0 6
200 0 2 1 0 2 2
300 1 0 1 0 7 1
400 3 4 1 3 9 3
500 4 5 1 5 10 4
Solution 4:[4]
Another straightforward solution is:
Reduce(function(x, y) ifelse(!x, y, x), df[, -1])
#[1] 6 2 1 3 4
This way should be very efficient, since we "scan" by column, as, presumably, the data have much fewer columns than rows.
The Reduce approach is a more functional form of a simple, old-school, loop:
ans = df[, 2]
for(j in 3:ncol(df)) {
i = !ans
ans[i] = df[i, j]
}
ans
#[1] 6 2 1 3 4
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | caldwellst |
| Solution 2 | Maël |
| Solution 3 | gpgdx |
| Solution 4 | alexis_laz |
