Generate a random date after a given date
I have a dataset like this:
```r
library(dplyr)

set.seed(123)
date_entry <- sample(seq(as.Date('2000-01-01'), as.Date('2010-01-01'), by = "day"), 1000)
df <- data.frame(date_entry)
df <- df %>% mutate(id = row_number())
```
I want to generate a random date_end column for each id that is later than its date_entry. For instance, for the dates below, I want a date_end greater than 2006 for id = 1:3 and greater than 2002 for id = 4.
```
  date_entry id
1 2006-09-28  1
2 2006-11-15  2
3 2006-02-04  3
4 2001-06-09  4
5 2000-07-13  5
```
Solution 1:[1]
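Create a daily sequence from date_entry to today's date (Sys.Date()), then sample one date from it as date_end. Note that seq() includes date_entry itself, so date_end can occasionally equal date_entry; start the sequence at date_entry + 1 if it must be strictly later.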
```r
library(tidyverse)

df %>%
  rowwise() %>%
  # one daily sequence per row, from date_entry up to today; draw one date from it
  mutate(date_end = sample(seq(date_entry, Sys.Date(), by = "day"), 1))
```
Output
```
   date_entry    id date_end  
   <date>     <int> <date>    
 1 2006-09-28     1 2016-01-08
 2 2006-11-15     2 2019-04-27
 3 2006-02-04     3 2016-02-17
 4 2001-06-09     4 2012-12-26
 5 2000-07-13     5 2008-11-12
 6 2008-03-04     6 2011-12-27
 7 2005-01-15     7 2015-01-04
 8 2003-02-15     8 2020-07-28
 9 2009-03-24     9 2014-11-01
10 2003-06-06    10 2004-03-22
# … with 990 more rows
```
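The rowwise() approach builds a full daily sequence for every row before sampling a single date. A vectorized base R sketch of the same idea (an alternative, not part of the original answer) draws one uniform offset per row instead. It assumes, like Solution 1, that Sys.Date() is the upper bound, and it starts the offsets at 1 so that date_end is always strictly after date_entry.

```r
set.seed(123)
date_entry <- sample(seq(as.Date("2000-01-01"), as.Date("2010-01-01"), by = "day"), 1000)
df <- data.frame(date_entry, id = seq_along(date_entry))

upper <- Sys.Date()                          # assumed upper bound, as in Solution 1
span  <- as.integer(upper - df$date_entry)   # days available after each date_entry

# floor(runif() * span) is (up to floating-point rounding) uniform on 0..span-1,
# so the offset is uniform on 1..span and date_end lies in (date_entry, upper]
offset <- 1 + floor(runif(nrow(df)) * span)
df$date_end <- df$date_entry + offset

head(df)
```

This avoids creating 1000 intermediate sequences, which matters more as the data grows.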
Solution 2:[2]
Pick a random number of days to add to each date_entry. Here I sample uniformly between 1 and 100,000 days (roughly 274 years), which is why the example end dates run into the 2100s; pick whatever range or distribution you want.
```r
df %>%
  mutate(date_end = date_entry + sample(1:1e5, size = n(), replace = TRUE))
#   date_entry id   date_end
# 1 2006-09-28  1 2104-02-13
# 2 2006-11-15  2 2199-06-24
# 3 2006-02-04  3 2042-08-30
# 4 2001-06-09  4 2153-04-10
# 5 2000-07-13  5 2140-04-28
# 6 2008-03-04  6 2106-07-06
# 7 2005-01-15  7 2169-06-14
# ...
```
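Solution 2 leaves the offset distribution open. As one hypothetical choice, the base R sketch below draws the gap from a Poisson distribution with a mean of 365 days (the mean is an arbitrary assumption, not from the answer). Because rpois() can return 0, it adds 1 to keep date_end strictly after date_entry.

```r
set.seed(123)
date_entry <- sample(seq(as.Date("2000-01-01"), as.Date("2010-01-01"), by = "day"), 1000)
df <- data.frame(date_entry, id = seq_along(date_entry))

# Assumed choice: a Poisson-distributed gap averaging about one year;
# the + 1 guarantees at least one day between date_entry and date_end
gap_days <- 1 + rpois(nrow(df), lambda = 365)
df$date_end <- df$date_entry + gap_days

head(df)
```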
If you want to make sure date_end falls in a later calendar year than date_entry (perhaps implied in your question), round up to the start of the next year before adding the random days:
```r
df %>%
  mutate(date_end =
    lubridate::ceiling_date(date_entry, unit = "year") +
      sample(0:1e5, size = n(), replace = TRUE)
  )
```
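Without lubridate, the rounding up can be sketched in base R by building January 1st of the year after date_entry from its year; this is an alternative to the answer's code, not a reproduction of ceiling_date(). The final line checks that every date_end lands in a later calendar year.

```r
set.seed(123)
date_entry <- sample(seq(as.Date("2000-01-01"), as.Date("2010-01-01"), by = "day"), 1000)
df <- data.frame(date_entry, id = seq_along(date_entry))

# January 1st of the year following each date_entry
entry_year <- as.integer(format(df$date_entry, "%Y"))
next_jan1  <- as.Date(paste0(entry_year + 1L, "-01-01"))

# Add 0 to 1e5 random days, mirroring sample(0:1e5, ...) above
df$date_end <- next_jan1 + sample(0:1e5, nrow(df), replace = TRUE)

# Sanity check: every end date falls in a later calendar year
stopifnot(all(as.integer(format(df$date_end, "%Y")) > entry_year))
```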
Solution 3:[3]
In a function f, we can convert the dates with as.POSIXlt. Its year element counts years since 1900, so adding 1901 yields the following year, for which we create January 1st using ISOdate. After converting that to Date with as.Date, we add a random integer between zero and a chosen dmax (3652 days, about ten years, by default), so the random date is no earlier than January 1st of the following year.
```r
# `\(x)` is the shorthand lambda syntax, available since R 4.1.0
f <- \(x, dmax = 3652) with(as.POSIXlt(x), as.Date(ISOdate(year + 1901, 1, 1)) +
                              sample(0:dmax, length(x), replace = TRUE))
set.seed(42)
transform(dat, date_end = f(date_entry))
#   date_entry id   date_end
# 1 2006-09-28  1 2014-02-21
# 2 2006-11-15  2 2013-06-26
# 3 2006-02-04  3 2010-03-22
# 4 2001-06-09  4 2005-01-02
# 5 2000-07-13  5 2004-06-05
# 6 2008-03-04  6 2017-10-23
```
Data:
```r
dat <- structure(list(date_entry = structure(c(13419, 13467, 13183,
  11482, 11151, 13942), class = "Date"), id = 1:6),
  class = "data.frame", row.names = c(NA, -6L))
```
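To see why Solution 3 adds 1901, here is the year arithmetic on two of the example dates. The year element of a POSIXlt counts years since 1900, so adding 1900 gives the calendar year and adding 1901 gives the next one.

```r
x  <- as.Date(c("2006-09-28", "2001-06-09"))
lt <- as.POSIXlt(x)

lt$year                                  # years since 1900: 106 101
lt$year + 1901                           # the following year: 2007 2002
as.Date(ISOdate(lt$year + 1901, 1, 1))   # "2007-01-01" "2002-01-01"
```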
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | AndrewGB |
| Solution 2 | Gregor Thomas |
| Solution 3 | jay.sf |
