'How to aggregate a categorical variable with 20 levels in hierarchical/ grouped forecasting and use it at external regressor

Fellow contributors,

I have been working with a hierarchical time series, concerning a set of identical products in a number of stores. For this purpose when we aggregate the data set based on 2 attributes like "store" and "product_type" in my case, we should then aggregate the target variable which is "demand" for every individual product for every group or hierarchy. What I would like to do is add another categorical variable to my model, but this categorical variable has almost 23 levels and I would like to know how I can aggregate it. When it comes to demand which is a numeric variable for a set of products, it can be easily summed or averaged but even if I break my categorical variable into a number of separated dummy variables, I got the following error:

Provided exogenous regressors are rank deficient, removing regressors:

I know the reason is my external regressor here is a constant or only has 1 level. So I would like to know whether there is a way I can fix it and use external_reg as exogenous regressor in my model. Here is a sample data set:

library(tidyverse)
library(tsibble)
library(tsibbledata)
library(fable)
library(fabletools)
library(fpp3)
library(readxl)
library(fable.prophet)
library(feasts)

store <- c(rep('st1', 8), rep('st2', 8))
product_type <- c(rep('type1', 4), rep('type2', 4), rep('type1', 4), rep('type2', 4))
products <- c(rep('A', 2), rep('B', 2), rep('C', 2), rep('D', 2), 
              rep('A', 2), rep('B', 2), rep('C', 2), rep('D', 2))

demands <- c(round(sample(c(1:100), 16, replace = TRUE)))
external_reg <- c(sample(c('red', 'green', 'blue'), 16, replace = TRUE))
date_week <- rep(1:4, 4)
date_year <- rep(2019:2022, 4)

my_data <- tibble(date_year, date_week, store, product_type, products, demands, external_reg)

my_data %>%
  mutate(Date = ymd(paste0(date_year, "-01-01")) + weeks(date_week - 1)) %>%
  mutate(Week = yearweek(Date)) %>%
  as_tsibble(key = c(store, product_type), index = Week) %>%
  aggregate_key(store * product_type, Demand_Agg = sum(demands))


# A tsibble: 36 x 4 [53W]
# Key:       store, product_type [9]
       Week store        product_type Demand_Agg
     <week> <chr*>       <chr*>            <dbl>
 1 2019 W01 <aggregated> <aggregated>        188
 2 2020 W02 <aggregated> <aggregated>        142
 3 2021 W02 <aggregated> <aggregated>        259
 4 2022 W03 <aggregated> <aggregated>        186
 5 2019 W01 st1          <aggregated>         89
 6 2019 W01 st2          <aggregated>         99
 7 2020 W02 st1          <aggregated>         52
 8 2020 W02 st2          <aggregated>         90
 9 2021 W02 st1          <aggregated>         95
10 2021 W02 st2          <aggregated>        164
# … with 26 more rows

I am struggling so much with this issue recently so I would be grateful if someone could help me with this.

Thank you very much in advance and please let me know if I need to provide further additional data.

r time-series fable-r

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution	Source

'How to aggregate a categorical variable with 20 levels in hierarchical/ grouped forecasting and use it at external regressor

Sources

Related Questions