How can I plot the cumulative probability of an event occurring from a dataframe over time?

I want a general way to compute the cumulative probability of an event from a dataframe output. For example, say I have a model for which I have run 3 simulations over days 0-5, which outputs this dataframe "hospital":

Node Time Hospitalised
1 0 0
2 0 0
3 0 0
1 1 0
2 1 1
3 1 0
1 2 0
2 2 0
3 2 1
1 3 0
2 3 1
3 3 3
1 4 1
2 4 1
3 4 0
1 5 0
2 5 0
3 5 0

I want to find and plot the cumulative probability that at least 1 person has been hospitalised over time. It is cumulative in that, for each time point, I care about whether there has ever been anyone in hospital in that particular simulation (currently or before the current time). The probability is the number of simulations with >0 hospitalised so far divided by the total number of simulations.

This would be the output for this simplified example



Solution 1:[1]

Here's a tidyverse solution (assuming `data` holds the question's "hospital" dataframe):

library(tidyverse)

data %>% 
  group_by(Node) %>%
  mutate(any_hospitalised = sign(cumsum(Hospitalised))) %>%
  group_by(Time) %>%
  summarize(probability = mean(any_hospitalised)) %>%
  ggplot(aes(Time, probability)) +
  geom_line() +
  geom_point() +
  theme_bw()
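
To check the numbers being plotted by hand, here is a minimal base-R sketch of the same calculation. It assumes `data` holds the table from the question and that rows are in time order within each Node (as they are in the example); the name `probability` is mine, chosen only for illustration.

```r
# Rebuild the example table from the question (assumed to be what `data` holds).
data <- data.frame(
  Node = rep(1:3, times = 6),
  Time = rep(0:5, each = 3),
  Hospitalised = c(0, 0, 0,   # Time 0
                   0, 1, 0,   # Time 1
                   0, 0, 1,   # Time 2
                   0, 1, 3,   # Time 3
                   1, 1, 0,   # Time 4
                   0, 0, 0)   # Time 5
)

# Within each Node, the running total becomes positive at the first
# hospitalisation and stays positive, so sign() turns it into a 0/1 flag
# meaning "someone has been hospitalised by this time".
data$any_hospitalised <- ave(data$Hospitalised, data$Node,
                             FUN = function(x) sign(cumsum(x)))

# Share of Nodes flagged at each time point.
probability <- tapply(data$any_hospitalised, data$Time, mean)
print(probability)
```

For the example table this gives 0, 1/3, 2/3, 2/3, 1, 1 for days 0 to 5.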


Although you may prefer a step plot to a line plot in this scenario:

data %>% 
  group_by(Node) %>%
  mutate(any_hospitalised = sign(cumsum(Hospitalised))) %>%
  group_by(Time) %>%
  summarize(probability = mean(any_hospitalised)) %>%
  ggplot(aes(Time, probability)) +
  geom_step() +
  geom_point() +
  theme_bw()
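
One caveat: `cumsum()` works in row order, so the pipelines above rely on the rows already being sorted by Time within each Node. If that is not guaranteed, a hedged variant is to sort explicitly first. The sketch below assumes the question's column names, rebuilds the example as `hospital`, and swaps in `cummax()` on a 0/1 indicator in place of `sign(cumsum())`; the names `shuffled` and `result` are only for illustration.

```r
library(dplyr)

# Example table from the question.
hospital <- data.frame(
  Node = rep(1:3, times = 6),
  Time = rep(0:5, each = 3),
  Hospitalised = c(0, 0, 0,  0, 1, 0,  0, 0, 1,
                   0, 1, 3,  1, 1, 0,  0, 0, 0)
)

# Reverse the row order to mimic unsorted simulation output.
shuffled <- hospital[nrow(hospital):1, ]

result <- shuffled %>%
  arrange(Node, Time) %>%                 # restore time order within each Node
  group_by(Node) %>%
  # 1 from the first time point with any hospitalisation onwards, else 0
  mutate(any_hospitalised = cummax(as.integer(Hospitalised > 0))) %>%
  group_by(Time) %>%
  summarise(probability = mean(any_hospitalised), .groups = "drop")

print(result)
```

`result` can then be piped into the same `ggplot()` calls as above.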


Created on 2022-03-06 by the reprex package (v2.0.1)

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Allan Cameron