'data.table count observations close in distance and time of current observation

I am looking to calculate a new column "congestion" by counting the number of times values are within sec +/- 5 and within x +/- 5 and within y +/- 5 of the current row. Essentially I am wanting to find observations that occur within a close distance (x,y) and time period (sec) of the current observation which is just a big count ifelse statement. All values are numerical.

current data.table

data <- data.table(x = c(1,3,10,15,6), 
y = c(5,5,11,14,19), 
sec=c(1,3,5,6,9))

desired output

data <- data.table(x = c(1,3,10,15,6), 
y = c(5,5,11,14,6), 
sec=c(1,3,5,6,7),
congestion = c(1,2,1,1,2)

preferable solution in data.table but happy to work in dplyr.

r dplyr count data.table

Solution 1:^[1]

I think your "desired output" is incorrect given the criteria you've specified.

However, if your data is small enough you can do a full-join on the data with itself, and filter out invalid combinations

library(data.table)

data <- data.table(x = c(1,3,10,15,6), 
                   y = c(5,5,11,14,19), 
                   sec=c(1,3,5,6,9))

data[, join_key := 1L ]     ## specify a key on which to join

data[
  data
  , on = .(join_key)                        ## Full Join to put all possible combinations together
  , allow.cartesian = TRUE
][
  (x >= i.x * 5 * -1 & x <= i.x * 5) &           ## Filter the valid combinations
    (y >= i.y * 5 * -1 & y <= i.y * 5) &
    (sec >= i.sec - 5 & sec <= i.sec + 5)
  , .(
    congestion = .N
  )
  , by = .(x, y, sec)
]

#     x  y sec congestion
# 1:  1  5   1          4
# 2:  3  5   3          4
# 3: 10 11   5          4
# 4: 15 14   6          4
# 5:  6 19   9          3

A slightly more efficient approach might be to do a by = .EACHI join (borrowing the concept from this answer

data[, row_idx := 1L]

data[
  data
  , {
    idx = (x >= i.x * 5 * -1 & x <= i.x * 5) &
      (y >= i.y * 5 * -1 & y <= i.y * 5) & 
      (sec >= i.sec - 5 & sec <= i.sec + 5)
    .(
      x = x[ idx ]
      , y = y[ idx ]
      , sec = sec[ idx ]
    )
  }
  , on = .(row_idx)
  , by = .EACHI
][
  , .(congestion = .N)
  , by = .(x, y, sec)
]

#     x  y sec congestion
# 1:  1  5   1          4
# 2:  3  5   3          4
# 3: 10 11   5          4
# 4: 15 14   6          4
# 5:  6 19   9          3

Solution 2:^[2]

You could define the limits and join on them:

data[,`:=`(x_high = x +5,
           x_low = x - 5,
           y_high = y + 5,
           y_low = y - 5,
           sec_high = sec +5,
           sec_low = sec - 5)]

data[data,.(x,y,sec,x.x,x.y,x.sec),
          on=.(x>=x_low,
               x<=x_high,
               y>=y_low,
               y<=y_high,
               sec>=sec_low,
               sec<=sec_high)][
      !(x==x.x&y==x.y&sec==x.sec),.(congestion=.N),by=.(x,y,sec)]

       x     y   sec congestion
   <num> <num> <num>      <int>
1:     1     5     1          1
2:     3     5     3          1
3:    10    11     5          1
4:    15    14     6          1

According to the +/- 5 rule, I find less congestions than your expected result. If I understood correctly the constraints, this seems correct to me.

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution	Source
Solution 1
Solution 2

'data.table count observations close in distance and time of current observation

Solution 1:[1]

Solution 2:[2]

Sources

Related Questions

Solution 1:^[1]

Solution 2:^[2]