Is there a way to specify the current row in R data.table?
The code below creates a minimal data.table, shows the approach I am using with a for loop, and prints the desired output.
library(data.table)
# example data.table where "ID" corresponds to the row index
df <- data.table(ID = 1:5, parent_ID = c(0,1,1,3,3), value = paste0(rep("S", 5),1:5))
# reorder the rows and drop one (ID 2) so that ID no longer corresponds to the row index
df2 <- df[c(1, 4,5,3), .(ID, parent_ID, value)]
# this loop works; it starts at row 2 because row 1 is the root (parent_ID 0), which has no parent to look up
for(i in 2:nrow(df2))
{
  df2[i, "parent_value"] <- df2[which(df2[,ID] %in% df2$parent_ID[i]), "value"]
}
df2
Output:
ID parent_ID value parent_value
1: 1 0 S1 <NA>
2: 4 3 S4 S3
3: 5 3 S5 S3
4: 3 1 S3 S1
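If a loop is kept, data.table's set() is the usual way to assign cell by cell, because it avoids the overhead of calling [.data.table on every iteration. A minimal sketch, assuming IDs are unique; the NA_character_ initialisation and the length(p) == 1L guard (which lets the loop start at row 1 instead of 2) are additions, not part of the question's code:

```r
library(data.table)

# Rebuild the question's example: rows with IDs 1, 4, 5, 3 in that order
df <- data.table(ID = 1:5, parent_ID = c(0, 1, 1, 3, 3),
                 value = paste0("S", 1:5))
df2 <- df[c(1, 4, 5, 3), .(ID, parent_ID, value)]

# Start with an all-NA character column so rows without a parent stay NA
df2[, parent_value := NA_character_]

for (i in seq_len(nrow(df2))) {
  # Position of the row whose ID equals this row's parent_ID (assumes unique IDs)
  p <- which(df2$ID == df2$parent_ID[i])
  if (length(p) == 1L) {
    # set() assigns by reference without going through `[.data.table`
    set(df2, i = i, j = "parent_value", value = df2$value[p])
  }
}
df2
```

This still visits one row at a time, so it only removes the per-call overhead; the vectorized answers avoid the loop altogether.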
My question is whether there is a way to do this in data.table that avoids the for loop. My guess is that it would look something like the following, but that seems to require a way to reference the current row (i below), hence the title question:
df2[, parent_value := df2[which(df2[,ID] %in% df2$parent_ID[i]), "value"]]
Any ideas appreciated.
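A minimal sketch of why a "current row" is usually not needed, reusing the question's df2: inside j, a column name refers to the whole column vector rather than a single cell, so a vectorized expression produces one result per row in a single evaluation. The names n_in_j and has_parent are illustrative only.

```r
library(data.table)

df <- data.table(ID = 1:5, parent_ID = c(0, 1, 1, 3, 3),
                 value = paste0("S", 1:5))
df2 <- df[c(1, 4, 5, 3), .(ID, parent_ID, value)]

# Without `by`, j sees entire columns: parent_ID here has all 4 values
n_in_j <- df2[, length(parent_ID)]

# So a vectorized test answers the question for every row at once:
# does each row's parent_ID appear among the IDs?
has_parent <- df2[, parent_ID %in% ID]
has_parent
```

Each element of has_parent lines up with one row of df2, which is the role the loop index i plays in the question's code.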
Solution 1:[1]
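Using match, which looks up every row's parent in one vectorized step: match(parent_ID, ID) gives, for each row, the position of the row whose ID equals its parent_ID (NA when there is none, as for the root with parent_ID 0), and value[...] then picks out the parent's value: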
df2[, parent_value := value[match(parent_ID, ID)]]
ID parent_ID value parent_value
1: 1 0 S1 <NA>
2: 4 3 S4 S3
3: 5 3 S5 S3
4: 3 1 S3 S1
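A common alternative to match() is a data.table update join of df2 onto itself. This is a swapped-in technique, not part of either answer; it is shown as a sketch using the question's data. In the join, x.parent_ID is matched against i.ID, and i.value refers to the value column of the lookup (i) table.

```r
library(data.table)

df <- data.table(ID = 1:5, parent_ID = c(0, 1, 1, 3, 3),
                 value = paste0("S", 1:5))
df2 <- df[c(1, 4, 5, 3), .(ID, parent_ID, value)]

# Self "update join": for each row of df2, find the row of df2 whose ID
# equals its parent_ID and copy that row's value into parent_value.
# Rows with no match (the root, parent_ID 0) are left as NA.
df2[df2, on = .(parent_ID = ID), parent_value := i.value]
df2
```

For a single lookup column, match() is shorter; the join form becomes more convenient when the lookup needs several key columns or should bring over several values at once.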
Solution 2:[2]
You can try this:
# .BY holds the current group's by-values; with one group per row, b$id is the current row number
f <- function(b) df2[which(df2$ID %in% df2$parent_ID[b$id])]$value
df2[, parent_value := f(.BY), by = .(id = 1:nrow(df2))]
Output:
ID parent_ID value parent_value
1: 1 0 S1 <NA>
2: 4 3 S4 S3
3: 5 3 S5 S3
4: 3 1 S3 S1
You can also use .I (as suggested in the comments by Severin), like this (the magrittr pipe is added for presentation clarity only):
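library(magrittr)  # provides the %>% pipe
df2[, id := .I] %>%
  # index plain vectors here: inside df2[...], `id` would resolve to the whole id column
  .[, parent_value := df2$value[which(df2$ID %in% df2$parent_ID[id])], by = 1:nrow(df2)] %>%
  .[, id := NULL] %>%
  .[]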
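To check the row-by-row pattern against Solution 1, a sketch that runs both on separate copies of the question's df2. Inside each one-row group, .I holds that row's position in the table (the "current row"); match() is used here in place of which()/%in% as a substitution, and copy() keeps the two versions from sharing state. The names by_row and vectorized are illustrative.

```r
library(data.table)

df <- data.table(ID = 1:5, parent_ID = c(0, 1, 1, 3, 3),
                 value = paste0("S", 1:5))
df2 <- df[c(1, 4, 5, 3), .(ID, parent_ID, value)]

# Row-by-row: grouping by row number makes every group a single row,
# and .I is that row's index in by_row
by_row <- copy(df2)
by_row[, parent_value := by_row$value[match(by_row$parent_ID[.I], by_row$ID)],
       by = seq_len(nrow(by_row))]

# Vectorized version from Solution 1
vectorized <- copy(df2)
vectorized[, parent_value := value[match(parent_ID, ID)]]

identical(by_row$parent_value, vectorized$parent_value)
```

Both give the same column, but the grouped version evaluates j once per row, so it is typically much slower on large tables than the single vectorized lookup.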
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | |
| Solution 2 | |
