'R-- Return value with the nearest date [duplicate]
I am attempting to extract the closest value based on date between 2 tables.
Table 1 looks like:
| id | value | date |
|---|---|---|
| 1 | aa | 10/2/21 |
| 2 | ab | 4/6/18 |
Table 2 looks like:
| id | value_2 | date_2 |
|---|---|---|
| 1 | ba | 4/12/11 |
| 1 | bb | 8/16/08 |
| 1 | bc | 10/4/21 |
| 2 | bd | 4/26/18 |
| 2 | bd | 3/26/19 |
I would like the output table to return value_2 that corresponds to the same id in both tables based on the closest date. ex- id 1 should return value bc since 10/4/21 is closest to 10/2/21
So the output table should mutate this value_2 to look like-
| id | value | date | value_2 |
|---|---|---|---|
| 1 | aa | 10/2/21 | bc |
| 2 | ab | 4/6/18 | bd |
Solution 1:[1]
There's currently an error in one of the date formats. Further there's some unclarity on what is "closest" (do we only look back or do we also look forward?)
If we are only interested in truly "closest", we could simply use a k-nearest neighbour approach with a single neighbour (k = 1).
library(e1071)
table1 <- data.frame(value = factor(c('aa', 'ab')),
date = as.Date(c('10/2/21', '4/6/18'),
tryFormats = c('%m/%d/%y', '%d/%m/%y')))
table2 <- data.frame(value = factor(c('ba', 'bb', 'bc', 'bd', 'bd')),
date = as.Date(c('4/12/11', '8/16/08',
# Chagned for example
'2/15/21',
'4/26/18', '3/26/19'),
tryFormats = c('%m/%d/%y', '%d/%m/%y')))
mod <- gknn(value ~ date, k = 1, data = table2)
table1$predict <- predict(mod, newdata = table1)
table1
# value date predict
# 1 aa 2021-10-02 bc
# 2 ab 2018-04-06 bd
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Oliver |
