How can I plot the cumulative probability of an event occurring from a dataframe over time?
I am looking for a general way to compute the cumulative probability of an event from a dataframe of model output. For example, say I have a model for which I have run 3 simulations from days 0-5, which outputs this dataframe "hospital":
| Node | Time | Hospitalised |
|---|---|---|
| 1 | 0 | 0 |
| 2 | 0 | 0 |
| 3 | 0 | 0 |
| 1 | 1 | 0 |
| 2 | 1 | 1 |
| 3 | 1 | 0 |
| 1 | 2 | 0 |
| 2 | 2 | 0 |
| 3 | 2 | 1 |
| 1 | 3 | 0 |
| 2 | 3 | 1 |
| 3 | 3 | 3 |
| 1 | 4 | 1 |
| 2 | 4 | 1 |
| 3 | 4 | 0 |
| 1 | 5 | 0 |
| 2 | 5 | 0 |
| 3 | 5 | 0 |
I want to find and plot the cumulative probability that at least 1 person has been hospitalised over time. It is cumulative in that, for each time point, I care about whether there has ever been anyone in hospital in that particular simulation (currently or before the current time). The probability is the number of simulations with >0 hospitalised so far divided by the total number of simulations.
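In symbols, this definition can be sketched as follows. The notation is introduced here for illustration only and is not part of the original question: $N$ is the number of simulations, $H_i(s)$ is the Hospitalised count in simulation $i$ at time $s$ (assumed non-negative), and $P(t)$ is the cumulative probability at time $t$.

```latex
P(t) = \frac{1}{N} \sum_{i=1}^{N} \mathbf{1}\!\left[\, \sum_{s \le t} H_i(s) > 0 \,\right]
```

Applied to the table above with $N = 3$ and each Node treated as one simulation, this gives $P(0) = 0$, $P(1) = 1/3$, $P(2) = 2/3$, $P(3) = 2/3$, $P(4) = 1$ and $P(5) = 1$.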
This would be the output for this simplified example

Solution 1:[1]
Here's a tidyverse solution:
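library(tidyverse)

# `hospital` is the data frame from the question; Node is treated as the simulation id
hospital %>%
  arrange(Node, Time) %>%   # cumsum() below relies on rows being in time order
  group_by(Node) %>%
  # 1 once a simulation has had any hospitalisation so far, 0 before that
  mutate(any_hospitalised = sign(cumsum(Hospitalised))) %>%
  group_by(Time) %>%
  # share of simulations with at least one hospitalisation by each time point
  summarize(probability = mean(any_hospitalised)) %>%
  ggplot(aes(Time, probability)) +
  geom_line() +
  geom_point() +
  theme_bw()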
Although you may prefer a step plot to a line plot in this scenario:
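# Same pipeline on the question's `hospital` data frame, drawn as a step plot
hospital %>%
  arrange(Node, Time) %>%   # keep rows in time order for cumsum()
  group_by(Node) %>%
  mutate(any_hospitalised = sign(cumsum(Hospitalised))) %>%
  group_by(Time) %>%
  summarize(probability = mean(any_hospitalised)) %>%
  ggplot(aes(Time, probability)) +
  geom_step() +
  geom_point() +
  theme_bw()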
Created on 2022-03-06 by the reprex package (v2.0.1)
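To check the numbers behind the plot without drawing anything, the same calculation can be sketched in base R. This is a swapped-in variant rather than the tidyverse pipeline from the answer: it rebuilds the example table from the question, treats Node as the simulation identifier (as the answer's grouping does), and uses `ave()` and `aggregate()` in place of `group_by()`, `mutate()` and `summarize()`. The `probability` object is a name chosen here for illustration.

```r
# Recreate the example table from the question (3 simulations, days 0-5).
hospital <- data.frame(
  Node = rep(1:3, times = 6),
  Time = rep(0:5, each = 3),
  Hospitalised = c(0, 0, 0,
                   0, 1, 0,
                   0, 0, 1,
                   0, 1, 3,
                   1, 1, 0,
                   0, 0, 0)
)

# Sort so the running total within each simulation follows time order.
hospital <- hospital[order(hospital$Node, hospital$Time), ]

# Flag, per simulation, whether anyone has been hospitalised up to each time.
hospital$any_hospitalised <- ave(
  hospital$Hospitalised, hospital$Node,
  FUN = function(x) as.numeric(cumsum(x) > 0)
)

# Share of simulations with at least one hospitalisation so far, per time point.
probability <- aggregate(any_hospitalised ~ Time, data = hospital, FUN = mean)
names(probability)[2] <- "probability"
print(probability)
```

For this example the `probability` column comes out as 0, 1/3, 2/3, 2/3, 1, 1 for days 0 to 5, which is what the line and step plots should show.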
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Allan Cameron |


