'Check for the presence of ID in n+1 timepoint and create a new presence/absence variable
I'm sorry if I fail to describe this properly.
Basically I have 3 columns -- Pair, Set1 and Set2.
What I want to know is does Pair appear in either Set1 or Set2 when Gen is equal to Time+1.
So I can then have a 5th variable which is essentially 1 or 0 (yes it's present in the next timepoint, no it is not).
For example, in this example dataset Pair F3 (in Time 1) is not present under either Set when Time = 2.
The only way I can think about doing this is to filter by Time and see if it appears in another time:
time2 <- df %>%
filter(Time == 2)
df %>%
filter(Time == 1) %>%
mutate(Present = ifelse(Pair %in% time2$Set1 | Pair %in% time2$Set2, 1, 0))
This works but I'd like to have a more efficient way of looping through each Time point, checking the subsequent time point and then moving forward ...
I don't want to do a for loop within a for loop -- but maybe this is necessary?
data
structure(list(Time = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L,
3L, 3L, 3L, 3L), Pair = structure(c(1L, 2L, 3L, 4L, 5L, 6L, 7L,
8L, 9L, 10L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 1L, 2L,
3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L), .Label = c("F1", "F2", "F3",
"F4", "F5", "N1", "N2", "N3", "N4", "N5"), class = "factor"),
Set1 = structure(c(1L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L,
1L, 2L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 1L, 3L, 4L, 5L,
6L, 7L, 8L, 9L, 10L, 11L), .Label = c("F1", "F2", "F3", "F5",
"F7", "F9", "N1", "N3", "N5", "N7", "N9"), class = "factor"),
Set2 = structure(c(2L, 3L, 4L, 5L, 1L, 7L, 8L, 9L, 10L, 6L,
2L, 3L, 4L, 5L, 1L, 7L, 8L, 9L, 10L, 6L, 2L, 3L, 4L, 5L,
1L, 7L, 8L, 9L, 10L, 6L), .Label = c("F10", "F2", "F4", "F6",
"F8", "N10", "N2", "N4", "N6", "N8"), class = "factor"),
Present = c(1L, 0L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA)), class = "data.frame", row.names = c(NA, -30L
))
Solution 1:[1]
If odf
is your original data structure, then you can do the following:
- Melt the
odf
long, adding a new variable which isTime-1
, putting theSet1
andSet2
values into a single column, calledSetValue
library(data.table)
setDT(odf)
df <- melt(odf[,Present:=NULL], id=c("Time", "Pair"),value.name = "SetValue" )[,Time2:=Time-1][]
- Join, this long df on itself, where
Pair=Setvalue
andTime=Time2
, dropis.na()
rows, select columns to retain, and ensure uniqueness, then merge back to your original, assigning back to original frame
odf = merge(
odf,
unique(df[df[,.(SetValue,Time, Time2)], on=.(Pair=SetValue,Time=Time2)][!is.na(Time2), .(Time, Pair,exists=1)]),
by=c("Time","Pair"), all.x = T
)
- If you like, you can update the NA in your new
exists
column to zero, if this is not the last timepoint
odf[Time<max(Time)&is.na(exists), exists:=0][]
Output:
Time Pair Set1 Set2 exists
1: 1 F1 F1 F2 1
2: 1 F2 F3 F4 1
3: 1 F3 F5 F6 0
4: 1 F4 F7 F8 1
5: 1 F5 F9 F10 1
6: 1 N1 N1 N2 1
7: 1 N2 N3 N4 1
8: 1 N3 N5 N6 1
9: 1 N4 N7 N8 1
10: 1 N5 N9 N10 1
11: 2 F1 F1 F2 1
12: 2 F2 F2 F4 1
13: 2 F3 F5 F6 1
14: 2 F4 F7 F8 1
15: 2 F5 F9 F10 1
16: 2 N1 N1 N2 1
17: 2 N2 N3 N4 1
18: 2 N3 N5 N6 1
19: 2 N4 N7 N8 1
20: 2 N5 N9 N10 1
21: 3 F1 F1 F2 NA
22: 3 F2 F3 F4 NA
23: 3 F3 F5 F6 NA
24: 3 F4 F7 F8 NA
25: 3 F5 F9 F10 NA
26: 3 N1 N1 N2 NA
27: 3 N2 N3 N4 NA
28: 3 N3 N5 N6 NA
29: 3 N4 N7 N8 NA
30: 3 N5 N9 N10 NA
Updated - Tidyverse option
df <- odf %>%
select(!Present) %>%
pivot_longer(cols = starts_with("Set"),values_to = "SetValue") %>%
mutate(Time2= Time-1)
left_join(
odf %>% select(!Present),
inner_join(df, df, by=c("Pair"="SetValue", "Time"="Time2")) %>%
filter(!is.na(Time2)) %>%
select(Time, Pair) %>%
mutate(exists=1) %>%
distinct(),
by=c("Time", "Pair")
) %>%
mutate(exists=if_else(is.na(exists) & Time<max(Time), 0, exists))
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 |