'Creating predicted vs observed confidence interval graph

Hello and thank you for you time and consideration,

I'd like to recreate this graph with ggplot. The top blue dots are the predicted values from my fitted model na_lmod and the lower red values are the observed values from one city's log_trip over the years.

Can you please help me combine these three functions ggplot(smooth/point), predict, and some sort of dplyr filter or something?

This code got me the log_trip and year for the desired city of interest but I'm struggling to even get that to graph. filter(transit, msaid == "Denver")[,c("log_trip", "year")]

enter image description here

Desired Output:

enter image description here



Solution 1:[1]

Here's an example. First, I'll generate some data and estimate the model.

library(tidyverse)
set.seed(123)
dat <- expand.grid(city = LETTERS[1:10], 
                   year = 2006:2018)
dat$log_trip <- log(abs(dat$year * .05 + rnorm(nrow(dat), 0, 100)))
dat$year <- as.factor(dat$year)
mod <- lm(log_trip ~ year, data=dat)

Next, we need to make some data that will be used to make predictions. For this model, since year is the only variable in it, this hypothetical data only has year in it. If you had other variables in the model, you would need to hold them constant at some (presumably central) value, like the mean.

pred_dat <- data.frame(year = factor(2006:2018))

We can then generate predictions with both confidence and prediction intervals:

preds <- predict(mod, newdata=pred_dat, interval="confidence")
preds2 <- predict(mod, newdata=pred_dat, interval="prediction")
preds <- as.data.frame(preds)

Next, we put the prediction intervals into the press data frame.

preds$lwr_pred <- preds2[,2]
preds$upr_pred <- preds2[,3]
pred_dat <- bind_cols(pred_dat, preds)

Next, we join the prediction data and the observed data from one city ("A" in this case).

pred_dat <- left_join(pred_dat, dat %>% filter(city == "A"))

For the plot, we need to turn year from a factor into numeric, then we can make the plot:

pred_dat  %>% 
  mutate(year = as.numeric(as.character(year))) %>% 
  ggplot() + 
  geom_ribbon(aes(x=year, ymin = lwr_pred, ymax=upr_pred, fill="Prediction"),  alpha=.25) + 
  geom_ribbon(aes(x=year, ymin = lwr, ymax=upr, fill="Confidence"),  alpha=.25) + 
  scale_fill_manual(values=c("blue", "gray65")) + 
  geom_point(aes(x=year, y=fit, color="Predicted")) + 
  geom_point(aes(x=year, y=log_trip, colour="Observed (City A)")) + 
  scale_colour_manual(values=c("red", "blue")) + 
  scale_x_continuous(breaks=2006:2018) + 
  labs(colour="Points", fill = "Intervals", x="Year", y="Predicted Values") + 
  theme_classic()

Created on 2022-05-12 by the reprex package (v2.0.1)

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 DaveArmstrong