'Use group_by on two variables to do a rolling correlation

I am trying to use the correlation package to do a rolling correlation with a group_by case - where I have both years and product id. My solution only works on the product id but not on the rolling years. Any suggestion I can get this rolling work or am I doing wrong in the group_by function?

library(correlation)
library(dplyr)

dk <- structure(list(Year = c(2015L, 2015L, 2015L, 2016L, 2016L, 2016L, 
2017L, 2017L, 2017L, 2018L, 2018L, 2018L, 2019L, 2019L, 2019L
), Products = c("apple", "orange", "melon", 
"apple", "orange", "melon", "apple", 
"orange", "melon", "apple", "orange", 
"melon", "apple", "orange", "melon"
), Quantity = c(35960.58, 9346.44, 18974.56, 45325.8, 12386.41, 20238.13, 
60766.81, 14695.38, 24441.08, 65596.34, 10673.11, 19686.87, 72737.28, 
8183.69, 21953.6), Sales = c(11811, 1300.46, 32134, 11069, 1194.63, 
35909.37, 11408, 1747.29, 40254.61, 12250, 2143.72, 38844.54, 
11937, 2066.28, 40234.98)), row.names = c(NA, -15L), class = c("tbl_df", 
"tbl", "data.frame"))

dk %>% 
    group_by(Products) %>%
    correlation(select = c("Quantity", "Sales"))

Group  | Parameter1 | Parameter2 |     r |        95% CI |  t(3) |     p
------------------------------------------------------------------------
apple  |   Quantity |      Sales |  0.44 | [-0.72, 0.95] |  0.86 | 0.455
melon  |   Quantity |      Sales |  0.74 | [-0.41, 0.98] |  1.89 | 0.155
orange |   Quantity |      Sales | -0.23 | [-0.93, 0.82] | -0.42 | 0.705

# How can this work?
dk %>% 
    group_by(Year, Products) %>%
    correlation(select = c("Quantity", "Sales"))

r dplyr

Solution 1:^[1]

Here is a solution using slider. I'm assuming that the "rolling" correlation is based on three consecutive years.

I had to abandon correlation::correlation and use cor.test to get a similar result, because I had trouble using slider to create the "roll" and correlation in the same command. I wish there was an easier way to do this:

library(slider)

dk %>% 
  group_by(Products) %>%
  summarize(res = slide(
    Year, 
    ~ with(cor.test(x = Quantity[Year %in% .x], y = Sales[Year %in% .x]),
           tibble(
             Years = paste0(.x, collapse = "-"),
             r = estimate,
             `95 % CI` = ifelse(exists("conf.int"), sprintf("[%0.3f, %0.3f]", conf.int[1], conf.int[2]), NA),
             t = statistic,
             df = parameter,
             p = p.value)),
    .before = 1, 
    .after = 1,
    .complete = TRUE)) %>%
  ungroup() %>%
  unnest(res)

But the result looks pretty neat! (although without confidence because I need 4 points)

# A tibble: 9 × 7
  Products Years               r `95 % CI`      t    df     p
  <chr>    <chr>           <dbl> <lgl>      <dbl> <int> <dbl>
1 apple    2015-2016-2017 -0.419 NA        -0.462     1 0.724
2 apple    2016-2017-2018  0.860 NA         1.69      1 0.340
3 apple    2017-2018-2019  0.531 NA         0.626     1 0.644
4 melon    2015-2016-2017  0.966 NA         3.75      1 0.166
5 melon    2016-2017-2018  0.675 NA         0.915     1 0.528
6 melon    2017-2018-2019  0.859 NA         1.67      1 0.343
7 orange   2015-2016-2017  0.708 NA         1.00      1 0.499
8 orange   2016-2017-2018 -0.337 NA        -0.358     1 0.781
9 orange   2017-2018-2019 -0.840 NA        -1.55      1 0.365

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution	Source
Solution 1

'Use group_by on two variables to do a rolling correlation

Solution 1:[1]

Sources

Related Questions

Solution 1:^[1]