How to mutate NA values with an ifelse statement based on presence in the next row (multiple conditions) in R
I know there are several questions similar to this one, but many of them ask about multiple conditions combined with the "|" logical operator. I am wondering whether it is possible to use multiple "&" operators.
Say I have a dataframe:
| ID | Age | Survival |
|---|---|---|
| 1 | 0 | NA |
| 1 | 1 | 1 |
| 1 | 2 | 1 |
| 1 | 3 | 1 |
| 1 | 4 | NA |
| 2 | 0 | NA |
| 2 | 1 | 1 |
| 2 | 2 | 1 |
| 2 | 3 | 1 |
| 2 | 4 | 2 |
| 3 | 1 | NA |
| 3 | 2 | 1 |
| 3 | 3 | 3 |
| 4 | 0 | 1 |
There are hundreds of individuals that we have yearly data for. If an individual was seen the following year, we record a 1 to indicate it survived, 2 if it died of natural causes, 3 if it was killed by humans, or NA if we do not know whether it is alive.
I had to merge all the age 0 and age 1 records into the df, but none of them had their survival recorded. What I am trying to do is change all the age 0 and age 1 rows to indicate survival if the following row still has the same ID (if they are seen later they HAVE to be alive, therefore Survival = 1). We do not see every individual every year, so there can be a gap from age 0 to age 2 or 3, or an individual may have been seen twice at age 0 (I don't want to get into why this is important).
What I have tried:
library(dplyr)
df <- df %>%
  group_by(ID) %>%
  mutate(Survival = ifelse(is.na(Survival) & Age == 0 & lead(Age) == 0, 1, Survival),
         Survival = ifelse(is.na(Survival) & Age == 0 & lead(Age) == 1, 1, Survival),
         Survival = ifelse(is.na(Survival) & Age == 0 & lead(Age) == 2, 1, Survival))
For the most part this works, but some individuals are not getting picked up, and I cannot understand why. Of the 1200 individuals we have spotted, about 7 still come up as NA, despite the following row having one of the required ages. I used str() beforehand to make sure the column types from the merge matched up.
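One way to track down rows like these is to count the remaining NA rows by their own age and the age in the next row. The sketch below is only an illustration on the sample data from the question; the names `attempt`, `next_age` and `still_missing` are made up for this example. On the sample data it shows that the age 1 row of ID 3 is left over, because all three conditions require `Age == 0`; a gap to age 3 would be missed in the same way.

```r
library(dplyr)

# Sample data from the question
df <- read.table(text = "
ID Age Survival
1 0 NA
1 1 1
1 2 1
1 3 1
1 4 NA
2 0 NA
2 1 1
2 2 1
2 3 1
2 4 2
3 1 NA
3 2 1
3 3 3
4 0 1", header = TRUE)

# The three conditions from the attempt above (all of them require Age == 0)
attempt <- df %>%
  group_by(ID) %>%
  mutate(
    Survival = ifelse(is.na(Survival) & Age == 0 & lead(Age) == 0, 1, Survival),
    Survival = ifelse(is.na(Survival) & Age == 0 & lead(Age) == 1, 1, Survival),
    Survival = ifelse(is.na(Survival) & Age == 0 & lead(Age) == 2, 1, Survival)
  )

# Age 0/1 rows that are still NA although the same ID has a later row,
# counted by their own age and the age found in the next row
still_missing <- attempt %>%
  mutate(next_age = lead(Age)) %>%   # still grouped by ID at this point
  ungroup() %>%
  filter(is.na(Survival), Age %in% c(0, 1), !is.na(next_age)) %>%
  count(Age, next_age)

print(still_missing)
```

Each row of `still_missing` is an age combination that none of the conditions covers.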
So I am starting to wonder whether it is possible to have multiple & conditions in one statement, or whether someone can suggest an alternative approach (this one is chunky, but it's all I know)?
Thanks in advance, and happy to answer questions if any more information is required.
UPDATE
I ended up having to use several lines of ifelse calls, which looks pretty bad. Does anyone know a way to shorten this?
mutate(Survival = ifelse(is.na(Survival) & Age == 1 & lead(Age) == 2, 1, Survival),
       Survival = ifelse(is.na(Survival) & Age == 1 & lead(Age) == 3, 1, Survival),
       Survival = ifelse(is.na(Survival) & Age == 0 & lead(Age) == 3, 1, Survival),
       Survival = ifelse(is.na(Survival) & Age == 0 & lead(Age) == 2, 1, Survival),
       Survival = ifelse(is.na(Survival) & Age == 0 & lead(Age) == 1, 1, Survival),
       Survival = ifelse(is.na(Survival) & Age == 0 & lead(Age) == 0, lead(Survival), Survival))
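The six lines above could probably be collapsed with `%in%`, since several `&` conditions can be combined freely in one `ifelse` test. The following is a minimal sketch under that assumption, not a tested drop-in replacement; the data frame `toy` is made up to cover the cases listed above, and `next_age`, `seen_older` and `collapsed` are hypothetical helper names.

```r
library(dplyr)

# Made-up toy data: two sightings at age 0, gaps up to age 3, and a single sighting
toy <- data.frame(
  ID       = c(5, 5, 5, 6, 6, 7),
  Age      = c(0, 0, 3, 1, 3, 0),
  Survival = c(NA, NA, 1, NA, 1, NA)
)

collapsed <- toy %>%
  group_by(ID) %>%
  mutate(
    next_age = lead(Age),
    # Age 0 followed by age 1-3, or age 1 followed by age 2-3.
    # %in% returns FALSE (not NA) when next_age is missing.
    seen_older = (Age == 0 & next_age %in% 1:3) | (Age == 1 & next_age %in% 2:3),
    Survival = ifelse(is.na(Survival) & seen_older, 1, Survival),
    # Seen twice at age 0: copy the (already updated) value from the next row
    Survival = ifelse(is.na(Survival) & Age == 0 & next_age %in% 0,
                      lead(Survival), Survival)
  ) %>%
  ungroup() %>%
  select(-next_age, -seen_older)

print(collapsed)
```

Only the single sighting of ID 7 should stay NA here, because there is no later row to prove survival.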
Solution 1:[1]
I am not sure if the following code solves the problem. It checks the first element of each group for missing Survival, and if the group has more rows, assigns a 1 to Survival.
df <- read.table(text = "
ID Age Survival
1 0 NA
1 1 1
1 2 1
1 3 1
1 4 NA
2 0 NA
2 1 1
2 2 1
2 3 1
2 4 2
3 1 NA
3 2 1
3 3 3
4 0 1", header = TRUE)
library(dplyr)
df %>%
  group_by(ID) %>%
  mutate(
    flag = is.na(Survival) & row_number() == 1L & n() > 1L,
    Survival = ifelse(flag, 1, Survival)
  ) %>%
  select(-flag)
#> # A tibble: 14 x 3
#> # Groups: ID [4]
#> ID Age Survival
#> <int> <int> <dbl>
#> 1 1 0 1
#> 2 1 1 1
#> 3 1 2 1
#> 4 1 3 1
#> 5 1 4 NA
#> 6 2 0 1
#> 7 2 1 1
#> 8 2 2 1
#> 9 2 3 1
#> 10 2 4 2
#> 11 3 1 1
#> 12 3 2 1
#> 13 3 3 3
#> 14 4 0 1
Created on 2022-02-20 by the reprex package (v2.0.1)
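The code above only looks at the first row of each ID. If a missing value can also sit on a later age 0 or age 1 row (for example after two sightings at age 0), a variation that tests every row may fit better. The sketch below is an assumption-based alternative, not part of the original answer: it treats any age 0 or age 1 row that is followed by another row of the same ID as a survivor, whatever the size of the gap. It assumes that rows sorted by `Age` within `ID` are in sighting order; `has_later_row` and `filled` are hypothetical names.

```r
library(dplyr)

# Sample data from the question
df <- read.table(text = "
ID Age Survival
1 0 NA
1 1 1
1 2 1
1 3 1
1 4 NA
2 0 NA
2 1 1
2 2 1
2 3 1
2 4 2
3 1 NA
3 2 1
3 3 3
4 0 1", header = TRUE)

filled <- df %>%
  arrange(ID, Age) %>%   # assumption: sorting by Age puts sightings in order
  group_by(ID) %>%
  mutate(
    # TRUE for every row except the last one of each ID
    has_later_row = row_number() < n(),
    Survival = ifelse(is.na(Survival) & Age %in% c(0, 1) & has_later_row, 1, Survival)
  ) %>%
  ungroup() %>%
  select(-has_later_row)

print(filled, n = Inf)
```

On the sample data this gives the same Survival column as the output above; the two approaches only differ when a missing value occurs after the first row of an ID.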
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Rui Barradas |
