'Warning from GAM predicting deaths from weather time series data: ti(time) vs ti(year, day_of_year) in mgcv

I am getting the following warning when running a generalised additive model (GAM) with the R package mgcv:

Warning in newton(lsp = lsp, X = G$X, y = G$y, Eb = G$Eb, UrS = G$UrS, L = G$L,  :
  Fitting terminated with step failure - check results carefully

I am modelling the effect of temperature and other weather variables on daily number of deaths. Because deaths may occur not just on the same day but also over a period of time following particularly high exposure, distributed lag needs to be included in the model. I am interested in understanding how day-to-day variability in temperature relates to deaths and not/ less interested in understanding the effects of seasonality and long-term trend, which I am trying to "remove"/ control for.

I am trying to decide between two approaches to model time:

  1. a single predictor for time e.g. s(time) where time is a counter running from 1 to 5,113 (the total number of days in the data set), like
    m1 <- gam(deaths~s(time, k = 500) + heap + 
                     s(temp_max, k=25) + 
                     s(lag, k=5) + 
                     ti(temp_max, lag, k=c(25,5)), 
                     data = dat, family = nb, method = 'ML')
  1. using a ti() interaction with year and doy (day of year), where year models the long-term trend, doy models seasonality and ti(year, doy) models the interaction between long-term trend and seasonality (e.g. El-Nino/ la Nina effects) like
   knots <- list(doy=c(0.5, 366.5))

   m2 <- gam(deaths~ s(year, k= 14) + 
                     s(doy, k=100, bs = 'cc') + 
                     ti(year, doy, bs = c('tp', 'cc'), k = c(10, 20)) + heap + 
                     s(temp_max, k=25) + s(lag, k=5) + 
                     ti(temp_max, lag, k=c(15,5)), 
                     data = dat, family = nb, method = 'ML', knots = knots)

Model m2 leads to the warning mentioned above. What does this warning mean and how should I resolve it?

The output from gam.check(m2) is as follows:

Method: ML   Optimizer: outer newton
step failed after 13 iterations.
Gradient range [-2.330371,775600.9]
(score 8841.651 & scale 1).
eigenvalue range [-447129799866,5.309916].
Model rank =  525 / 526 

Basis dimension (k) checking results. Low p-value (k-index<1) may
indicate that k is too low, especially if edf is close to k'.

                       k'      edf k-index p-value   
s(year)          1.30e+01 1.07e+01    0.96   0.085 . 
s(doy)           9.80e+01 1.50e+01    0.95   0.005 **
ti(year,doy)     1.62e+02 1.01e+01    0.96   0.065 . 
s(temp_max)      2.40e+01 3.08e+00      NA      NA   
s(lag)           4.00e+00 1.73e-14      NA      NA   
ti(temp_max,lag) 5.60e+01 1.80e+01      NA      NA   
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Only after substantially reducing k and changing default tprs splines to cubic regression splines, does the warning disappear, as in model m3:

    m3 <- gam(deaths~ s(year, k= 5, bs = 'cr') + 
                      s(doy, k=20, bs = 'cc') + 
                      ti(year, doy, bs = c('cr', 'cc'), k = c(8, 10)) + heap + 
                      s(temp_max, k=8, bs = 'cr') + 
                      s(lag, k=5, bs = 'cr') + 
                      ti(temp_max, lag, k=c(8,5), bs = c('cr', 'cr')), 
                      data = dat, family = nb, method = 'ML', knots = knots)

For anyone interested, the daily time series data (5,113 days), spanning 14 years from 2001-2014, looks as follows:

           date doy year time heap heap_bin deaths  temp    sh    rh  solar wind2m precip_hourly precip_daily_total
1016 2003-10-13 286 2003 1016    0        0      3 32.37 19.10 61.19 701.45   0.50          0.43               4.58
1017 2003-10-14 287 2003 1017    0        0      3 29.97 19.53 71.69 773.80   2.02          0.68               4.60
1018 2003-10-15 288 2003 1018   34        1     21 29.06 18.43 71.56 852.81   1.59          0.73               3.47
1019 2003-10-16 289 2003 1019    0        0      4 27.54 18.01 76.38 701.43   1.57          0.76               6.32
1020 2003-10-17 290 2003 1020    0        0      2 29.86 17.76 65.88 669.89   2.15          0.36               3.49
1021 2003-10-18 291 2003 1021    0        0      5 31.03 16.91 58.56 857.88   3.14          0.03               0.21

Explanation of some variables: doy = day of year, running from 1-366, time = a counter of days running from 1 – 5,113, temp = temperature, sh = specific humidity, solar = solar irradiance, wind2m = wind speed and precip_daily_total = total daily rain fall. Heap and heap_bin are explained below.

The data set (.rds and .csv) can be found on my GitHub site.

The full code is below. Any help would be greatly appreciated!

library(readr)
library(mgcv)

df <- read_rds("data_crossvalidated_post.rds")

# Create matrices for 7-day lagged variables based on example by Simon Wood (ref in post)

lagard <- function(x,n.lag=7) {
    n <- length(x); X <- matrix(NA, n, n.lag)
    for (i in 1:n.lag) X[i:n,i] <- x[i:n-i+1]
    X
}

df$heap <- as.factor(df$heap)
df$heap_bin <- as.factor(df$heap_bin)

dat <- list(lag=matrix(0:6, nrow(df), 7, 
         byrow=TRUE), deaths=df$deaths, 
         doy=df$doy, year = df$year, 
         time = df$time, heap=df$heap, 
         heap_bin = df$heap_bin)
dat$precip_hourly <- lagard(df$precip_hourly)
dat$precip_daily_total <- lagard( df$precip_daily_total)
dat$temp <- lagard(df$temp)
dat$sh <- lagard(df$sh)
dat$rh <- lagard(df$rh)
dat$solar <- lagard(df$solar)
dat$wind2m <- lagard(df$wind2m)

knots <- list(doy=c(0.5, 366.5)) # set knots for
    # cyclic spline for day of year

# model currently being attempted to run

m3 <- gam(deaths~te(year, doy, bs = c('cr', 'cc'), 
     k = c(7, 50)) + heap  + 
     te(temp, lag, k=c(8, 4), 
     bs = c('cr', 'cr')) + 
     te(sh, lag, k=c(8,4), bs = c('cr', 'cr')) + 
     te(solar, lag, k=c(8,4), bs = c('cr', 'cr')) 
     + te(wind2m, lag, k=c(8,4), bs = c('cr', 
     'cr')) + te(precip_daily_total, lag, 
     k=c(8,4), bs = c('cr', 'cr')) + 
     te(temp, sh, solar, wind2m, lag, 
     bs = c('cr', 'cr', 'cr', 'cr', 'cr')), 
     data = dat, family = poisson, 
     method = 'REML', knots = knots)



Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source