Error in xgboost::xgb.DMatrix(as.matrix(mydat %>% dplyr::select(date, : 'data' has class 'character' and length 176 in R

This is my dataset (dput() output):

mydat=structure(list(date = c("22.06.2021", "22.06.2021", "22.06.2021", 
"22.06.2021", "22.06.2021", "22.06.2021", "22.06.2021", "22.06.2021", 
"22.06.2021", "22.06.2021", "22.06.2021", "22.06.2021", "22.06.2021", 
"22.06.2021", "22.06.2021", "22.06.2021", "22.06.2021", "22.06.2021", 
"23.06.2021", "23.06.2021", "23.06.2021", "23.06.2021", "23.06.2021", 
"23.06.2021", "23.06.2021", "23.06.2021", "23.06.2021", "23.06.2021", 
"23.06.2021", "23.06.2021", "23.06.2021", "23.06.2021", "23.06.2021", 
"23.06.2021", "23.06.2021", "23.06.2021", "23.06.2021", "23.06.2021", 
"23.06.2021", "23.06.2021", "23.06.2021", "23.06.2021", "24.06.2021", 
"24.06.2021"), hour = c(6L, 7L, 8L, 9L, 10L, 11L, 12L, 13L, 14L, 
15L, 16L, 17L, 18L, 19L, 20L, 21L, 22L, 23L, 0L, 1L, 2L, 3L, 
4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 12L, 13L, 14L, 15L, 16L, 17L, 
18L, 19L, 20L, 21L, 22L, 23L, 0L, 1L), weekday = c(2L, 2L, 2L, 
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 
3L, 3L, 3L, 3L, 3L, 3L, 3L, 4L, 4L), base_price = c(3250.87, 
3261.89, 3272.91, 3283.93, 3294.95, 3305.97, 3316.98, 3328, 3339.02, 
3350.04, 3361.06, 3372.08, 3383.1, 3394.12, 3405.14, 3416.16, 
3427.17, 3438.19, 3449.21, 3460.23, 3471.25, 3482.27, 3493.29, 
3504.31, 3515.33, 3526.35, 3537.36, 3548.38, 3559.4, 3570.42, 
3581.44, 3592.46, 3603.48, 3614.5, 3625.52, 3636.54, 3647.55, 
3658.57, 3669.59, 3680.61, 3691.63, 3702.65, 3713.67, 3724.69
)), class = "data.frame", row.names = c(NA, -44L))

I'm trying to learn how to use boosting for time series analysis, but I'm having some difficulty. Here is the example I am trying to run:

library(xgboost)
library(dplyr)
library(lubridate)

extended_data_mod <- mydat %>%
  dplyr::mutate(., 
                index_date = as.Date(paste0(lubridate::year(date), "-", lubridate::month(date), "-01")),
                months = lubridate::month(index_date),
                years = lubridate::year(index_date))
mydat <- extended_data_mod[1:length(ts), ] # initial data

pred <- extended_data_mod[(length(ts) + 1):nrow(extended_data), ] # extended time index

x_train <- xgboost::xgb.DMatrix(as.matrix(mydat %>%
                                            dplyr::select(date,   hour,   weekday,    
                                                          base_price)))
x_pred <- xgboost::xgb.DMatrix(as.matrix(pred %>% 
                                           dplyr::select(date,   hour,   weekday,    
                                                         base_price)))

y_train <- mydat$base_price
#learn the model
xgb_trcontrol <- caret::trainControl(
   method = "cv", 
   number = 5,
   allowParallel = TRUE, 
   verboseIter = FALSE, 
   returnData = FALSE
)
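Besides the xgb.DMatrix() error, the preparation step above has a few other problems. `lubridate::year()` and `lubridate::month()` are applied to the raw "dd.mm.yyyy" strings, which should be parsed into Date values first. `ts` is never defined, so `length(ts)` refers to the base function `stats::ts` and returns 1. `extended_data` is also undefined (presumably `extended_data_mod` was meant). Below is a minimal base-R sketch of that step on a few rows taken from mydat. `n_train` is a hypothetical stand-in for whatever `length(ts)` was supposed to be.

```r
# A few rows taken from mydat; date is stored as "dd.mm.yyyy" strings.
dat <- data.frame(
  date       = c("22.06.2021", "22.06.2021", "23.06.2021", "24.06.2021"),
  hour       = c(22L, 23L, 0L, 0L),
  weekday    = c(2L, 2L, 3L, 4L),
  base_price = c(3427.17, 3438.19, 3449.21, 3713.67)
)

# Parse once; lubridate::dmy(dat$date) would return the same Date values.
dat$date   <- as.Date(dat$date, format = "%d.%m.%Y")
dat$months <- as.integer(format(dat$date, "%m"))
dat$years  <- as.integer(format(dat$date, "%Y"))

# n_train is hypothetical (replaces the undefined length(ts)):
# rows 1..n_train are observed data, the remaining rows are to be predicted.
n_train <- 3
train <- dat[seq_len(n_train), ]
pred  <- dat[(n_train + 1):nrow(dat), ]
```

Deriving months and years from the parsed Date avoids parsing the same strings repeatedly.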

Instead of the desired result, I get this error:

Error in xgboost::xgb.DMatrix(as.matrix(mydat %>% dplyr::select(date, :
    'data' has class 'character' and length 176.

What did I do wrong, and how can I make this code work correctly? Thank you in advance.
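The error can be reproduced on a small data frame shaped like mydat. as.matrix() produces a matrix of a single type, so one character column turns every value into a string. The "length 176" in the message is just 44 rows times 4 columns. The three-row data frame below is a hypothetical example.

```r
# Hypothetical 3-row data frame with the same columns as mydat.
df <- data.frame(
  date       = c("22.06.2021", "22.06.2021", "23.06.2021"),  # character column
  hour       = c(6L, 7L, 0L),
  weekday    = c(2L, 2L, 3L),
  base_price = c(3250.87, 3261.89, 3449.21)
)

m <- as.matrix(df)
print(typeof(m))   # "character": the type xgb.DMatrix() rejects
print(length(m))   # 12 = 3 rows * 4 columns (176 = 44 * 4 for mydat)

# Without the character column the matrix is numeric.
m_num <- as.matrix(df[, c("hour", "weekday", "base_price")])
print(typeof(m_num))   # "double"
```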



Solution 1:[1]

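as.matrix() coerces the whole data frame to one type, so the character date column turns the entire matrix into character data (44 rows times 4 columns = 176 values), and xgb.DMatrix() only accepts numeric data. Convert date to a number before building the matrix. Because the strings are in day.month.year format, calling as.numeric() on them directly only returns NA, so parse them first (for example with lubridate::dmy()). Try this: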

x_train <- xgboost::xgb.DMatrix(
  as.matrix(mydat %>%
              # parse "dd.mm.yyyy" into a Date, then into days since 1970-01-01
              dplyr::mutate(date = as.numeric(lubridate::dmy(date))) %>% 
              dplyr::select(date, hour, weekday, base_price))
)

Do the same for x_pred.
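The sketch below shows the same conversion in base R and also passes the label to the DMatrix. It assumes base_price is the value to forecast, since that is what the question assigns to y_train. In that case base_price should not also be a feature column, otherwise the model sees the answer. The xgb.DMatrix() call only runs if the xgboost package is installed. The data are three rows taken from mydat.

```r
dat <- data.frame(
  date       = c("22.06.2021", "22.06.2021", "23.06.2021"),
  hour       = c(22L, 23L, 0L),
  weekday    = c(2L, 2L, 3L),
  base_price = c(3427.17, 3438.19, 3449.21)
)

# as.numeric() on the raw strings gives NA (with a coercion warning) ...
print(suppressWarnings(as.numeric(dat$date)))   # NA NA NA

# ... so parse day.month.year first; the result counts days since 1970-01-01.
dat$date <- as.numeric(as.Date(dat$date, format = "%d.%m.%Y"))

# Keep the target out of the feature matrix to avoid leaking the label.
x_train <- as.matrix(dat[, c("date", "hour", "weekday")])
y_train <- dat$base_price

# Build the DMatrix only when xgboost is available.
if (requireNamespace("xgboost", quietly = TRUE)) {
  dtrain <- xgboost::xgb.DMatrix(data = x_train, label = y_train)
}
```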

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 langtang