How do I create a variable that tells me which of a number of other variables is the first one without a missing value in each observation?
If I have the following data structure in my data frame `df`:
```
 a  b  c  d
 1  2  3  4
NA NA  1  2
NA  1  2 NA
NA NA NA  1
```
how can I create a variable `var` that tells me which of these variables is the first one without a missing value, so that:
```
 a  b  c  d var
 1  2  3  4   a
NA NA  1  2   c
NA  1  2 NA   b
NA NA NA  1   d
```
I need the code to work with variable names rather than column numbers, because the dataset is large and the order of the variables may change.
I have tried:
```r
df <- df %>% mutate(var = coalesce(deparse(substitute(a)), deparse(substitute(b)), deparse(substitute(c)), deparse(substitute(d))))
```
and
```r
df <- df %>% mutate(var = deparse(substitute(do.call(coalesce, across(c(a, b, c, d))))))
```
trying to implement this approach. I took the code that extracts a variable's name as a string from "How to convert variable (object) name into String".
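If you want to keep the `coalesce()` idea from these attempts, one minimal sketch is to turn each column into its own name wherever its value is present (and `NA` otherwise), then let `coalesce()` return the first non-missing name. It assumes dplyr is installed; because the columns are written out by name, the priority follows the order of the arguments, not the positions of the columns in `df`.

```r
library(dplyr)

# Example data from the question
df <- data.frame(
  a = c(1, NA, NA, NA),
  b = c(2, NA, 1, NA),
  c = c(3, 1, 2, NA),
  d = c(4, 2, NA, 1)
)

# Replace each value by its column's name (NA where the value is missing),
# then coalesce() keeps the first non-missing name, in argument order
df <- df %>%
  mutate(var = coalesce(
    if_else(is.na(a), NA_character_, "a"),
    if_else(is.na(b), NA_character_, "b"),
    if_else(is.na(c), NA_character_, "c"),
    if_else(is.na(d), NA_character_, "d")
  ))

print(df)
```

A row where all four columns are missing gets `NA` in `var`.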
Solution 1:[1]
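You can do this with `apply()`, working row by row:
```r
library(dplyr)
# For each row, take the name of the first non-missing cell (\(x) needs R >= 4.1.0)
df %>% mutate(var = apply(., 1, \(x) names(which(!is.na(x)))[1]))
#>    a  b  c  d var
#> 1  1  2  3  4   a
#> 2 NA NA  1  2   c
#> 3 NA  1  2 NA   b
#> 4 NA NA NA  1   d
```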
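The `apply()` call converts `df` to a matrix and checks every column the data frame has. If only a named set of columns should be checked, in a priority order you choose, a vectorized base-R sketch could look like the following. `cols`, `present` and `first` are illustrative names, not part of the original answer, and whether this is faster on your data is an assumption worth measuring.

```r
# Example data from the question
df <- data.frame(
  a = c(1, NA, NA, NA),
  b = c(2, NA, 1, NA),
  c = c(3, 1, 2, NA),
  d = c(4, 2, NA, 1)
)

# Columns to check, in priority order (illustrative name);
# reordering this vector changes the priority, not the columns of df
cols <- c("a", "b", "c", "d")

# Numeric matrix: 1 where a value is present, 0 where it is missing
present <- (!is.na(df[cols])) * 1

# Index (within cols) of the first present value in each row
first <- max.col(present, ties.method = "first")

# max.col() also returns 1 for an all-zero row, so rows with no values get NA
df$var <- ifelse(rowSums(present) > 0, cols[first], NA_character_)

print(df)
```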
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Allan Cameron |
