'Use group_by on two variables to do a rolling correlation
I am trying to use the correlation package to do a rolling correlation with a group_by case - where I have both years and product id. My solution only works on the product id but not on the rolling years. Any suggestion I can get this rolling work or am I doing wrong in the group_by function?
library(correlation)
library(dplyr)
dk <- structure(list(Year = c(2015L, 2015L, 2015L, 2016L, 2016L, 2016L,
2017L, 2017L, 2017L, 2018L, 2018L, 2018L, 2019L, 2019L, 2019L
), Products = c("apple", "orange", "melon",
"apple", "orange", "melon", "apple",
"orange", "melon", "apple", "orange",
"melon", "apple", "orange", "melon"
), Quantity = c(35960.58, 9346.44, 18974.56, 45325.8, 12386.41, 20238.13,
60766.81, 14695.38, 24441.08, 65596.34, 10673.11, 19686.87, 72737.28,
8183.69, 21953.6), Sales = c(11811, 1300.46, 32134, 11069, 1194.63,
35909.37, 11408, 1747.29, 40254.61, 12250, 2143.72, 38844.54,
11937, 2066.28, 40234.98)), row.names = c(NA, -15L), class = c("tbl_df",
"tbl", "data.frame"))
dk %>%
group_by(Products) %>%
correlation(select = c("Quantity", "Sales"))
Group | Parameter1 | Parameter2 | r | 95% CI | t(3) | p
------------------------------------------------------------------------
apple | Quantity | Sales | 0.44 | [-0.72, 0.95] | 0.86 | 0.455
melon | Quantity | Sales | 0.74 | [-0.41, 0.98] | 1.89 | 0.155
orange | Quantity | Sales | -0.23 | [-0.93, 0.82] | -0.42 | 0.705
# How can this work?
dk %>%
group_by(Year, Products) %>%
correlation(select = c("Quantity", "Sales"))
Solution 1:[1]
Here is a solution using slider. I'm assuming that the "rolling" correlation is based on three consecutive years.
I had to abandon correlation::correlation and use cor.test to get a similar result, because I had trouble using slider to create the "roll" and correlation in the same command. I wish there was an easier way to do this:
library(slider)
dk %>%
group_by(Products) %>%
summarize(res = slide(
Year,
~ with(cor.test(x = Quantity[Year %in% .x], y = Sales[Year %in% .x]),
tibble(
Years = paste0(.x, collapse = "-"),
r = estimate,
`95 % CI` = ifelse(exists("conf.int"), sprintf("[%0.3f, %0.3f]", conf.int[1], conf.int[2]), NA),
t = statistic,
df = parameter,
p = p.value)),
.before = 1,
.after = 1,
.complete = TRUE)) %>%
ungroup() %>%
unnest(res)
But the result looks pretty neat! (although without confidence because I need 4 points)
# A tibble: 9 × 7
Products Years r `95 % CI` t df p
<chr> <chr> <dbl> <lgl> <dbl> <int> <dbl>
1 apple 2015-2016-2017 -0.419 NA -0.462 1 0.724
2 apple 2016-2017-2018 0.860 NA 1.69 1 0.340
3 apple 2017-2018-2019 0.531 NA 0.626 1 0.644
4 melon 2015-2016-2017 0.966 NA 3.75 1 0.166
5 melon 2016-2017-2018 0.675 NA 0.915 1 0.528
6 melon 2017-2018-2019 0.859 NA 1.67 1 0.343
7 orange 2015-2016-2017 0.708 NA 1.00 1 0.499
8 orange 2016-2017-2018 -0.337 NA -0.358 1 0.781
9 orange 2017-2018-2019 -0.840 NA -1.55 1 0.365
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 |
