Get the first non-null value from selected cells in a row

Good afternoon, friends!

I'm currently performing some calculations in R (the data frame df is shown below). My goal is to add a new column that holds, for each row, the first non-null (that is, non-zero) value from a selected set of cells.

My df is:

MD <- c(100, 200, 300, 400, 500)
liv <- c(0, 0, 1, 3, 4)
liv2 <- c(6, 2, 0, 4, 5)
liv3 <- c(1, 1, 1, 1, 1)
liv4 <- c(1, 0, 0, 3, 5)
liv5 <- c(0, 2, 7, 9, 10)
df <- data.frame(MD, liv, liv2, liv3, liv4, liv5)

I want to display (in a column called "liv6") the first non-null value from the 5 cells liv to liv5. For the first row, liv = 0, liv2 = 6, liv3 = 1, liv4 = 1 and liv5 = 0, so the result should be 6. This calculation should be repeated for each row in my data frame.

I know how to do this in Python, but not in R.

Any help is highly appreciated!



Solution 1:[1]

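A simple base R option is to apply a function across the relevant columns (I exclude MD here; you can use any data frame subsetting style you want), then take the first of the positive values in each row. Note that x > 0 keeps only positive values; use x != 0 if negative values can occur and should count as non-null. The \(x) shorthand requires R 4.1 or later.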

df$liv6 <- apply(df[-1], 1, \(x) head(x[x > 0], 1))
df
#>    MD liv liv2 liv3 liv4 liv5 liv6
#> 1 100   0    6    1    1    0    6
#> 2 200   0    2    1    0    2    2
#> 3 300   1    0    1    0    7    1
#> 4 400   3    4    1    3    9    3
#> 5 500   4    5    1    5   10    4
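
One caveat with the line above: if a row has no qualifying value, the subset is empty, `head()` returns a zero-length vector, and `apply` falls back to returning a list. A minimal sketch of a guard, assuming `NA` is an acceptable placeholder for such rows (the helper name `first_nonzero` and the extra all-zero row are made up for illustration):

```r
# Sample data from the question, plus one hypothetical all-zero row
df <- data.frame(
  MD   = c(100, 200, 300, 400, 500, 600),
  liv  = c(0, 0, 1, 3, 4, 0),
  liv2 = c(6, 2, 0, 4, 5, 0),
  liv3 = c(1, 1, 1, 1, 1, 0),
  liv4 = c(1, 0, 0, 3, 5, 0),
  liv5 = c(0, 2, 7, 9, 10, 0)
)

# Hypothetical helper: first non-zero element of x, or NA if there is none
first_nonzero <- function(x) {
  hits <- x[x != 0]
  if (length(hits) == 0) NA_real_ else hits[[1]]
}

# unname() drops any row labels apply() might attach to the result
df$liv6 <- unname(apply(df[-1], 1, first_nonzero))
df$liv6
#> [1]  6  2  1  3  4 NA
```

Because the helper always returns exactly one value, `apply` always produces a plain vector that can be assigned to a column.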

Solution 2:[2]

A Base R solution:

df$liv6 <- apply(df[-1], 1, function(x) x[min(which(x != 0))])

output

df
   MD liv liv2 liv3 liv4 liv5 liv6
1 100   0    6    1    1    0    6
2 200   0    2    1    0    2    2
3 300   1    0    1    0    7    1
4 400   3    4    1    3    9    3
5 500   4    5    1    5   10    4
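
For larger data frames, a vectorised alternative to a row-wise `apply` may be worth trying. The sketch below uses `max.col()` with `ties.method = "first"` on a logical matrix to find the position of the first non-zero entry in each row; the variable names `m` and `idx` are illustrative. One thing to keep in mind: for a row with no non-zero entry, `max.col()` reports column 1, so the result for such a row would be 0.

```r
# Sample data from the question
df <- data.frame(
  MD   = c(100, 200, 300, 400, 500),
  liv  = c(0, 0, 1, 3, 4),
  liv2 = c(6, 2, 0, 4, 5),
  liv3 = c(1, 1, 1, 1, 1),
  liv4 = c(1, 0, 0, 3, 5),
  liv5 = c(0, 2, 7, 9, 10)
)

# Candidate columns only (MD dropped), as a numeric matrix
m <- as.matrix(df[-1])

# Column position of the first TRUE (non-zero) entry in each row
idx <- max.col(m != 0, ties.method = "first")

# Matrix indexing with one (row, column) pair per row
df$liv6 <- m[cbind(seq_len(nrow(m)), idx)]
df$liv6
#> [1] 6 2 1 3 4
```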

Solution 3:[3]

One approach is to use purrr::detect to detect the first non-zero element of each row.

We define a predicate function that takes a numeric value (or vector) and returns TRUE wherever it is non-zero:

is_nonzero <- function(x) x != 0

We use this function to detect the first non-zero element in each row via purrr::detect:

# dplyr::select() is called directly so the pipe operator does not need to be loaded
first_nonzero <- apply(dplyr::select(df, liv:liv5), 1, function(x) {
  purrr::detect(x, is_nonzero, .dir = "forward")
})

We finally create the new column:

df$liv6 <- first_nonzero

As a result, we have

> df
   MD liv liv2 liv3 liv4 liv5 liv6
1 100   0    6    1    1    0    6
2 200   0    2    1    0    2    2
3 300   1    0    1    0    7    1
4 400   3    4    1    3    9    3
5 500   4    5    1    5   10    4
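
If adding a package dependency is not desirable, base R's `Find()` behaves much like `purrr::detect()` here: it returns the first element for which a predicate is true. A minimal sketch using the same `is_nonzero` predicate, assuming every row has at least one non-zero value (otherwise `Find()` returns `NULL` for that row and `apply` yields a list):

```r
# Sample data from the question
df <- data.frame(
  MD   = c(100, 200, 300, 400, 500),
  liv  = c(0, 0, 1, 3, 4),
  liv2 = c(6, 2, 0, 4, 5),
  liv3 = c(1, 1, 1, 1, 1),
  liv4 = c(1, 0, 0, 3, 5),
  liv5 = c(0, 2, 7, 9, 10)
)

# Same predicate as in the purrr version
is_nonzero <- function(x) x != 0

# Find() scans each row left to right and returns the first match
df$liv6 <- apply(df[-1], 1, function(x) Find(is_nonzero, x))
df$liv6
#> [1] 6 2 1 3 4
```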

Solution 4:[4]

Another straightforward solution is:

Reduce(function(x, y) ifelse(!x, y, x), df[, -1])
#[1] 6 2 1 3 4

This approach should be efficient because it scans column by column and, presumably, the data have far fewer columns than rows.

The Reduce approach is a more functional form of a simple, old-school loop:

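ans <- df[, 2]            # start with the first candidate column (liv)
for (j in 3:ncol(df)) {   # visit the remaining columns from left to right
  i <- !ans               # rows whose answer is still zero
  ans[i] <- df[i, j]      # fill those rows from the current column
}
ans
#[1] 6 2 1 3 4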
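
To see how the column-wise scan fills in the answer, `Reduce()` can be asked to keep its intermediate results with `accumulate = TRUE`. The sketch below is only an illustration on the sample data; the names `pick_first` and `steps` are made up here.

```r
# Sample data from the question
df <- data.frame(
  MD   = c(100, 200, 300, 400, 500),
  liv  = c(0, 0, 1, 3, 4),
  liv2 = c(6, 2, 0, 4, 5),
  liv3 = c(1, 1, 1, 1, 1),
  liv4 = c(1, 0, 0, 3, 5),
  liv5 = c(0, 2, 7, 9, 10)
)

# Keep x where it is already non-zero, otherwise take the value from the next column
pick_first <- function(x, y) ifelse(!x, y, x)

# One list element per column visited, holding the running answer at that point
steps <- Reduce(pick_first, df[, -1], accumulate = TRUE)

steps[[1]]  # after liv only: rows 1 and 2 are still zero
#> [1] 0 0 1 3 4
steps[[2]]  # after liv2: every row now has its first non-zero value
#> [1] 6 2 1 3 4
steps[[length(steps)]]  # final state, same as the plain Reduce() call
#> [1] 6 2 1 3 4
```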

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution      Source
Solution 1    caldwellst
Solution 2    Maël
Solution 3    gpgdx
Solution 4    alexis_laz