How do I create a variable that tells me which of a number of other variables is the first one to not have a missing value for one observation?

If I have the following data structure in my data frame df:

a  b  c  d

1  2  3  4
NA NA 1  2
NA 1  2  NA
NA NA NA 1

how can I create a variable that tells me which of the variables is the first one without a missing value in each row, such that:

a  b  c  d  var

1  2  3  4  a
NA NA 1  2  c
NA 1  2  NA b
NA NA NA 1  d

I need the code to work with variable names rather than column numbers, because the dataset is large and the order of the variables may change.

I have tried:

df <- df %>% mutate(var = coalesce(deparse(substitute(a)), deparse(substitute(b)), deparse(substitute(c)), deparse(substitute(d))))

and

df <- df %>% mutate(var = deparse(substitute(do.call(coalesce, across(c(a, b, c, d))))))

trying to implement this approach. I got the code to extract the string of a variable name from: How to convert variable (object) name into String



Solution 1:[1]

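You can apply a function to each row that returns the name of the first non-missing column. Note that the `\(x)` lambda shorthand requires R 4.1 or later, and that `apply(., 1, ...)` looks at every column in the data frame: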

df %>% mutate(var = apply(., 1, \(x) names(which(!is.na(x)))[1]))
#>    a  b  c  d var
#> 1  1  2  3  4   a
#> 2 NA NA  1  2   c
#> 3 NA  1  2 NA   b
#> 4 NA NA NA  1   d
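
If the data frame has other columns that should not be considered, one option is to restrict the check to a vector of column names. The following is a minimal base R sketch, not part of the original answer; the `vars` vector and the intermediate names `present` and `first_idx` are illustrative assumptions. It relies on `max.col()` with `ties.method = "first"` to pick the first `TRUE` in each row.

```r
# Example data from the question
df <- data.frame(
  a = c(1, NA, NA, NA),
  b = c(2, NA, 1, NA),
  c = c(3, 1, 2, NA),
  d = c(4, 2, NA, 1)
)

# Columns to check, selected by name and listed in priority order
# (hypothetical helper vector, adjust to your own variables)
vars <- c("a", "b", "c", "d")

# Logical matrix: TRUE where a value is present
present <- !is.na(df[vars])

# For each row, the position of the first TRUE within `vars`
first_idx <- max.col(present, ties.method = "first")

# Rows with no non-missing value at all should yield NA, not the first column
first_idx[rowSums(present) == 0] <- NA

# Map positions back to column names
df$var <- vars[first_idx]
df
```

Because the lookup goes through `vars`, reordering the columns in `df` does not change the result; only the order of names in `vars` determines which column counts as "first". Rows in which all listed columns are missing get `NA` in `var`.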

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Allan Cameron