'Finding the turning point (inflection point) of COVID-19 infections using inflection package
I am not sure if I have done my inflection point calculation correctly. Based on lab confirmed cumulative case data in the epicenter of the current epidemic, we have tried to identify the inflection point. I used the inflection
package and calculated the inflection point as "08 Feb 2020". I have also tried to calculate the first and second directives as estimated increase each and changing rate.
Are those results from the following graphs consistent?
df<-structure(list(date = structure(c(18277, 18278, 18279, 18280,
18281, 18282, 18283, 18284, 18285, 18286, 18287, 18288, 18289,
18290, 18291, 18292, 18293, 18294, 18295, 18296, 18297, 18298,
18299, 18300, 18301, 18302, 18303, 18304, 18305, 18306, 18307),
class = "Date"),
cases = c(45, 62, 121, 198, 258, 363, 425,
495, 572, 618, 698, 1590, 1905, 2261, 2639, 3125, 4109, 5142,
6384, 8351, 10117, 11618, 13603, 14982, 16903, 18454, 19558,
20630, 21960, 22961, 23621)),
class = "data.frame", row.names = c(NA, -31L))
xlb_0<- structure(c(18281, 18285, 18289, 18293,
18297, 18301, 18305,
18309), class = "Date")
library(tidyverse)
# Smooth cumulative cases over time
df$x = as.numeric(df$date)
fit_1<- loess(cases ~ x, span = 1/3, data = df)
df$case_sm <-fit_1$fitted
# use inflection to obtain inflection point
library(inflection)
guai_0 <- check_curve(df$x, df$case_sm)
check_curve(df$x, df$cases)
#> $ctype
#> [1] "convex_concave"
#>
#> $index
#> [1] 0
guai_1 <- bese(df$x, df$cases, guai_0$index)
structure(guai_1$iplast, class = "Date")
#> [1] "2020-02-08"
# Plot cumulativew numbers of cases
df %>%
ggplot(aes(x = date, y = cases ))+
geom_line(aes(y = case_sm), color = "red") +
geom_point() +
geom_vline(xintercept = guai_1$iplast) +
labs(y = "Cumulative lab confirmed infections")
# Daily new cases (first derivative) and changing rate (second derivative)
df$dt1 = c(0, diff(df$case_sm)/diff(df$x))
fit_2<- loess(dt1 ~ x, span = 1/3, data = df)
df$change_sm <-fit_2$fitted
df$dt2 <- c(NA, diff(df$change_sm)/diff(df$x))
df %>%
ggplot(aes(x = date, y = dt1))+
geom_line(aes(y = dt1,
color = "Estimated number of new cases")) +
geom_point(aes(y = dt2*2, color = "Changing rate")) +
geom_line(aes(y = dt2*2, color = "Changing rate"))+
geom_vline(xintercept = guai_1$iplast) +
labs(y = "Estimatede number of new cases") +
scale_x_date(breaks = xlb_0,
date_labels = "%b%d") +
theme(legend.title = element_blank())
#> Warning: Removed 1 rows containing missing values (geom_point).
#> Warning: Removed 1 row(s) containing missing values (geom_path).
Created on 2020-02-17 by the reprex package (v0.3.0)
Solution 1:[1]
I was gonna write a comment, but I was pushing the character limit.
I am not familiar with the inflection
package so I am not one to judge if the 2020-02-08
is the true inflection. However, I will say this is difficult to answer with R because R is not necessarily good at calculating derivatives. If you had an estimated line equation - then you could potentially use this to plot first and second derivatives. Calculating rough delta's by doing the difference in (Y_n+1-Y_n)/(X_n+1-X_n)
is never optimal because a derivate in theory is the delta of two points infinitesimally close to each other. You fundamentally cannot get a great estimate of the derivative. You can even see this because you are forced to shift this estimate to either n
or n+1
. Furthermore, you would expect the inflection point of x_0
to be a local min/max in the first derivative and equal to zero in the second derivative. So I don't think your second plot helps. But this could just be due to the delta's calculated.
What I would do is first fit your data to some type of model.
In this example I'm going to use the package dr4pl
to model your data to the 4 parameter logistic model.
Since the function of the 4 parameter model is well known, I can write what the first and second derivative functions should be, then plot those values using stat_function
in the ggplot2
package.
library(ggplot2)
library(dr4pl)
df<-structure(list(date = structure(c(18277, 18278, 18279, 18280,
18281, 18282, 18283, 18284, 18285, 18286, 18287, 18288, 18289,
18290, 18291, 18292, 18293, 18294, 18295, 18296, 18297, 18298,
18299, 18300, 18301, 18302, 18303, 18304, 18305, 18306, 18307),
class = "Date"),
cases = c(45, 62, 121, 198, 258, 363, 425,
495, 572, 618, 698, 1590, 1905, 2261, 2639, 3125, 4109, 5142,
6384, 8351, 10117, 11618, 13603, 14982, 16903, 18454, 19558,
20630, 21960, 22961, 23621)),
class = "data.frame", row.names = c(NA, -31L))
xlb_0<- structure(c(18281, 18285, 18289, 18293,
18297, 18301, 18305,
18309), class = "Date")
df$dat_as_num <- as.numeric(df$date)
dr4pl_obj <- dr4pl(cases~dat_as_num, data = df, init.parm = c(30000, 18300, 2, 0))
#first derivative derivation
d1_dr4pl <- function(x, theta, scale = F){
if (any(is.na(theta))) {
stop("One of the parameter values is NA.")
}
if (theta[2] <= 0) {
stop("An IC50 estimate should always be positive.")
}
f <- -theta[3]*((theta[4]-theta[1])/((1+(x/theta[2])^theta[3])^2))*((x/theta[2])^(theta[3]-1))
if(scale) {
f <- scales::rescale(x = f, to = c(theta[4],theta[1]))
}
return(f)
}
#Second derivative derivation
d2_dr4pl <- function(x, theta, scale = F){
if (any(is.na(theta))) {
stop("One of the parameter values is NA.")
}
if (theta[2] <= 0) {
stop("An IC50 estimate should always be positive.")
}
f <- 2*((theta[3]*(x/theta[2])^(theta[3]-1))^2)*((theta[4]-theta[1])/((1+(x/theta[2])^(theta[3]))^3))-theta[3]*(theta[3]-1)*((x/theta[2])^(theta[3]-2))*((theta[4]-theta[1])/((1+(x/theta[2])^theta[3])^2))
if(scale) {
f <- scales::rescale(x = f, to = c(theta[4],theta[1]))
f <- f - f[1]
}
return(f)
}
ggplot(df, aes(x = dat_as_num)) +
geom_hline(yintercept = 0) +
[![enter image description here][1]][1]geom_point(aes(y = cases), color = "grey", alpha = .6, size = 5) +
stat_function(fun = d1_dr4pl, args = list(theta = dr4pl_obj$parameters, scale = T), color = "red") +
stat_function(fun = d2_dr4pl, args = list(theta = dr4pl_obj$parameters, scale = T), color = "blue") +
stat_function(fun = dr4pl::MeanResponse, args = list(theta = dr4pl_obj$parameters), color = "gold") +
geom_vline(xintercept = dr4pl_obj$parameters[2], linetype = "dotted") +
theme_classic()
As you can see, the inflection point, which is the IC50 value (theta 2) of the 4 parameter logistic model, lines up well when we approach it this way.
summary(dr4pl_obj)
#$call
#dr4pl.formula(formula = cases ~ dat_as_num, data = df, init.parm = c(30000, 18300, 2, 0))
#
#$coefficients
# Estimate StdErr 2.5 % 97.5 %
#Upper limit 25750.61451 4.301008e-05 25750.59681 25750.63221
#Log10(IC50) 18298.75347 4.301008e-09 18298.67889 18298.82806
#Slope 5154.35449 4.301008e-05 5154.33678 5154.37219
#Lower limit 58.48732 4.301008e-05 58.46962 58.50503
#
#attr(,"class")
#[1] "summary.dr4pl"
Furthermore, using dr4pl
, it says the IC50 value is roughly 18298.8
, which is late 2020-02-06
. Not too far off from the inflection
value. I'm sure there may be a better model to use than the 4pl, but it was just the one that I knew I could write first and second derivatives for the purposes of answering this question.
I'm sure other coding languages are more specialized when it comes to derivatives, and can even calculate them for you so long as you start with an initial function. I think one of these languages is mathematica.
A disclaimer, I ended up scaling the first and second derivatives so that they could be plotted together. Their actual values are much larger than shown here.
Solution 2:[2]
A very quick back-of-the-envelope plot based on your data shows
calc_d <- function(x) c(0, diff(x))
df %>%
mutate(
first_deriv_cases = calc_d(cases),
second_deriv_cases = calc_d(calc_d(cases))) %>%
pivot_longer(-date) %>%
ggplot(aes(date, value)) +
geom_line() +
facet_wrap(~name, scale = "free_y", ncol = 1) +
geom_smooth()
So the inflection point at 8 February is consistent with the first derivative (i.e. the density function) having a maximum at that point.
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | |
Solution 2 | Maurits Evers |