modelr add_predictions error: in model.frame.default(Terms, newdata, na.action = na.action, xlev = object$xlevels)

I am getting the following error from the modelr add_predictions function:

modelr add_predictions error: in model.frame.default(Terms, newdata, na.action = na.action, xlev = object$xlevels): fe.lead.surgeon has new levels ....

As I understand it, this error usually arises when a model is fitted on a training dataset and then applied to a test dataset: the test dataset can contain factor levels that never appeared in the training dataset, so the model has no coefficient for them. However, I am fitting the model and computing the predicted values on the same sample, and I still get this error.
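To illustrate the usual train/test case, here is a minimal base-R sketch on made-up toy data (the names train, test, y, and g are hypothetical). It calls predict() directly, which is what modelr::add_predictions() uses internally, so the same model.frame.default() level check applies.

```r
# Toy data (hypothetical): level "c" of g never appears in the training set.
train <- data.frame(y = c(1, 2, 3, 4), g = factor(c("a", "a", "b", "b")))
test  <- data.frame(g = factor(c("a", "c")))

fit <- lm(y ~ g, data = train)

# lm() stores the factor levels it saw while fitting in fit$xlevels.
print(fit$xlevels$g)

# predict() rebuilds the model frame with xlev = fit$xlevels, so the unseen
# level "c" is rejected. Capture the error message instead of stopping.
res <- tryCatch(predict(fit, newdata = test), error = conditionMessage)
print(res)
```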

Here is the code I am using. I would appreciate any insight into why this error occurs and how to fix it.

  # indep is a vector of independent variable names
  # dep is the name of the dependent variable
  # id.case is the id variable
  # sample is my dataset

  eq <-
    paste(indep, collapse = ' + ') %>%
    paste(dep, ., sep = ' ~ ') %>%
    as.formula()

  s <-
    lm(eq, data = sample %>% select(-id.case))

  pred <-
    sample %>%
    modelr::add_predictions(s) %>%
    select(id.case, pred)
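One way to see which values are tripping the check is to compare the levels the fitted model recorded in its xlevels component with the values actually present in the data, and to compare nobs() with nrow(). The helper find_new_levels() below is hypothetical (it is not part of modelr or stats) and the toy data is made up; for the model above, a call like find_new_levels(s, sample) would be the analogous check.

```r
# Hypothetical helper (not part of modelr/stats): for every factor the model
# recorded in xlevels, return the non-missing values in `data` it never saw.
find_new_levels <- function(model, data) {
  out <- list()
  for (nm in names(model$xlevels)) {
    present <- unique(as.character(data[[nm]]))
    present <- present[!is.na(present)]
    unseen  <- setdiff(present, model$xlevels[[nm]])
    if (length(unseen) > 0) out[[nm]] <- unseen
  }
  out
}

# Toy data (hypothetical): level "c" only occurs in a row whose outcome is NA.
df <- data.frame(
  y = c(1, 2, 3, 4, NA),
  g = factor(c("a", "a", "b", "b", "c"))
)
fit <- lm(y ~ g, data = df)

# Fewer observations than rows means lm() dropped rows (default na.omit).
print(c(rows = nrow(df), used = nobs(fit)))
print(find_new_levels(fit, df))
```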

As requested by @SimoneBianchi, here is a reproducible example.

Reproducible example

  library(tidyverse)   # also attaches dplyr and tibble
  
  rename <- dplyr::rename
  select <- dplyr::select
  
  set.seed(10002)
  id <- sample(1:1000, 1000, replace = FALSE)
  
  set.seed(10003)
  fe1 <- sample(c('A','B','C'), 1000, replace = TRUE)
  
  set.seed(10001)
  fe2 <- sample(c('a','b','c'), 1000, replace = TRUE)
  
  set.seed(10001)
  cont1 <- sample(1:300, 1000, replace = TRUE)
  
  set.seed(10004)
  value <- sample(1:30, 1000, replace = TRUE)
  
  sample <-   
    data.frame(id, fe1, fe2, cont1, value) 

  dep <- 'value'
  
  indep <- 
    c('fe1','fe2', 'cont1')
  
  
  eq <- 
    paste(indep, collapse = ' + ') %>%
    paste(dep, ., sep = ' ~ ') %>%
    as.formula  
  
  s <-
    lm(eq, data = sample %>% select(-id))
  
  pred <- 
    sample %>% 
    modelr::add_predictions(s) %>% 
    select(id, pred)
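The example above contains no missing values, so every factor level survives the fit. A self-contained base-R variant that does trigger the error is sketched below: it appends three hypothetical rows whose fe1 level "D" only ever occurs together with a missing value. The extra rows, the name dat, and the use of base R's reformulate() (an equivalent of the paste() plus as.formula() construction) are assumptions made for illustration.

```r
set.seed(10001)
n <- 1000
dat <- data.frame(
  id    = seq_len(n),
  fe1   = sample(c("A", "B", "C"), n, replace = TRUE),
  fe2   = sample(c("a", "b", "c"), n, replace = TRUE),
  cont1 = sample(1:300, n, replace = TRUE),
  value = sample(1:30, n, replace = TRUE),
  stringsAsFactors = FALSE
)

# Hypothetical bad rows: level "D" appears only where the outcome is NA.
extra <- data.frame(id = n + 1:3, fe1 = "D", fe2 = "a", cont1 = 10L,
                    value = NA_integer_, stringsAsFactors = FALSE)
dat <- rbind(dat, extra)
dat$fe1 <- factor(dat$fe1)
dat$fe2 <- factor(dat$fe2)

# Same formula as paste(...) %>% as.formula: value ~ fe1 + fe2 + cont1
eq  <- reformulate(c("fe1", "fe2", "cont1"), response = "value")
fit <- lm(eq, data = dat[, setdiff(names(dat), "id")])

# lm() silently drops the NA rows, so "D" never enters fit$xlevels ...
print(c(rows = nrow(dat), used = nobs(fit)))

# ... and predicting on the full data then reports fe1 as having new levels.
err <- tryCatch(predict(fit, newdata = dat), error = conditionMessage)
print(err)
```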

Update and Workaround

One workaround I found is to skip the modelr function and use fitted() instead. However, I would still like to understand why the regression automatically drops some levels from a factor variable. If anyone knows, please leave a comment.

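  # fitted() returns one value per row that lm() actually used, so this only
  # lines up with sample when no rows were dropped for missing values
  pred <-
    sample %>%
    cbind(pred = fitted(s))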
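The fitted() workaround sidesteps the level check because it never rebuilds a model frame, but with the default na.action (na.omit) it returns fewer values than there are rows whenever lm() dropped incomplete rows, so binding it onto the full data can fail or misalign. Fitting with na.action = na.exclude makes fitted() pad the dropped positions with NA. The toy data below is hypothetical.

```r
# Toy data (hypothetical): row 5 has a missing outcome.
df <- data.frame(
  y = c(1, 2, 3, 4, NA, 6),
  x = c(1, 2, 3, 4, 5, 6)
)

fit_omit    <- lm(y ~ x, data = df)                          # default: na.omit
fit_exclude <- lm(y ~ x, data = df, na.action = na.exclude)

# na.omit: the NA row is simply gone, so the vector is one element short.
print(length(fitted(fit_omit)))

# na.exclude: fitted() is padded with NA, so it lines up row-for-row with df.
print(length(fitted(fit_exclude)))
df$pred <- fitted(fit_exclude)
print(df)
```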

Closing: Problem found with the dataset

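I found that some observations containing NA values were the only ones carrying certain levels of the factor variable. lm() dropped those rows while fitting, so those levels were "new" when add_predictions predicted on the full dataset, which caused the error. After I fixed the NAs, the original code worked fine. So it was a problem with the dataset rather than the code!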
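If fixing the NAs in the source data is not an option, another approach is to predict only on the rows the model was fitted on, i.e. rows complete on every variable in the formula. The sketch below uses hypothetical toy data and base predict(); modelr::add_predictions(used, fit) should behave the same way, since it calls predict() internally.

```r
# Toy data (hypothetical): level "c" only occurs in a row with a missing outcome.
df <- data.frame(
  y = c(1, 2, 3, 4, NA),
  g = factor(c("a", "a", "b", "b", "c"))
)
fit <- lm(y ~ g, data = df)

# Variables used by the model, taken from its formula: here "y" and "g".
vars <- all.vars(formula(fit))

# Keep exactly the rows lm() could use (complete on all model variables).
used <- df[complete.cases(df[, vars]), , drop = FALSE]

# The leftover unused level "c" in used$g is dropped by model.frame() before
# the level check, so this no longer errors.
used$pred <- predict(fit, newdata = used)
print(used)
```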

Thank you all for trying to help me out.



Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow
