'Dynamic offset based on condition
My data.table looks like below.
library(data.table)
my_dt <- data.table(time = c(1000, 1111, 2000, 3000, 4500, 4600, 5000, 7000, 8500, 9000),
var1 = 1:10,
var2 = c(0.5, 0.6, 2, 0.8, 2.1, 1, 7, 8, 0.1, 0.3))
#printed table:
time var1 var2
1: 1000 1 0.5
2: 1111 2 0.6
3: 2000 3 2.0
4: 3000 4 0.8
5: 4500 5 2.1
6: 4600 6 1.0
7: 5000 7 7.0
8: 7000 8 8.0
9: 8500 9 0.1
10: 9000 10 0.3
For each value in var1 I need to find the row index of the first value in var2 which is equal to/greater then the value in var1. I then need to compute the difference in time by substracting the corresponding values in the time col. For example: The first value in var1 is 1 (also row index 1). The first value in var2 which is equal to/greater then 1 is 2.0 and at row index 3. Hence the corresponding difference in time is 2000 - 1000 = 1000:
my_dt$time[3] - my_dt$time[1]
The desired result looks like below. I actually don't need the idx column, but it might help as an intermediary result.
my_result <- data.table(time = c(1000, 1111, 2000, 3000, 4500, 4600, 5000, 7000, 8500, 9000),
var1 = 1:10,
var2 = c(0.5, 0.6, 2, 0.8, 2.1, 1, 7, 8, 0.1, 0.3),
idx = c(3, 3, 7, 7, 7, 7, 7, 8, 0, 0),
time.diff = c(1000, 889, 3000, 2000, 500, 400, 0, 0, NA, NA))
#printed result:
time var1 var2 idx time.diff
1: 1000 1 0.5 3 1000
2: 1111 2 0.6 3 889
3: 2000 3 2.0 7 3000
4: 3000 4 0.8 7 2000
5: 4500 5 2.1 7 500
6: 4600 6 1.0 7 400
7: 5000 7 7.0 7 0
8: 7000 8 8.0 8 0
9: 8500 9 0.1 0 NA
10: 9000 10 0.3 0 NA
I can achieve this result with a combination of a for- and a while-loop. However, the execution is painfully slow. I wonder if there is a better solution that a) creates col idx with data.table syntax and b) uses the value from idx as offset parameter (n) in shift() ? Many thanks in advance!
Solution 1:[1]
Converting my comment to an answer so the quesiton will be answered
my_dt[, time.diff := .SD[my_dt, time - i.time, on = .(var2 >= var1), mult = "first"]]
Basically,
my_dt[my_dt, on = .(var2 >= var1)]is a self non-equi join.mult = "first"is for picking only the first match, otherwise it will join eachvar1to all<= var2valuesBecause it's a self-join, both tables have same columns, hence the
i.prefix stands for the column in the table in thei-th place (as inDT[i, j, by])Re the
.SDpart, it is just because the assignment operator (:=) messes up the order, meaning, the following line works (without assigment), but get messed up with it. Probably has something to do with the internal order of operation withindata.table[i, j, by]my_dt[my_dt, time - i.time, on = .(var2 >= var1), mult = "first"] # [1] 1000 889 3000 2000 500 400 0 0 NA NA
Solution 2:[2]
my_dt[,idx := max.col(outer(var1, var2, '<='), 'first')][,
idx := idx * NA^(var2[idx]<var1)][,
time_diff := time[idx] - time][]
time var1 var2 idx time_diff
1: 1000 1 0.5 3 1000
2: 1111 2 0.6 3 889
3: 2000 3 2.0 7 3000
4: 3000 4 0.8 7 2000
5: 4500 5 2.1 7 500
6: 4600 6 1.0 7 400
7: 5000 7 7.0 7 0
8: 7000 8 8.0 8 0
9: 8500 9 0.1 NA NA
10: 9000 10 0.3 NA NA
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | David Arenburg |
| Solution 2 | onyambu |
