Is there a way to vectorize seq() and grep() to use in conjunction with dplyr?
Apologies if this is obvious, I don't have much experience with R. I have a function contains_leap_year(date1, date2) that I want to pass in as a condition to dplyr::if_else().
My for loop implementation looks like this:
contains_leap_year <- c()
for (i in 1:nrow(df)) {
  if (df$date1[i] < 0 & !is.na(df$date2[i])) {
    # Build the daily sequence between the two dates for this row
    seq_str <- seq(df$date1[i], df$date2[i], by = "day")
    # TRUE if any date in the sequence is February 29
    res <- (length(grep("-02-29", seq_str)) > 0)
  } else {
    res <- FALSE
  }
  contains_leap_year <- append(contains_leap_year, res)
}
Then I would append this column to my data frame and do something like:
dplyr::mutate(
res = dplyr::if_else(contains_leap_year == TRUE, action1, action2)
)
But this is rather slow. Ideally, I'd like to work within dplyr the whole time, like so:
dplyr::mutate(
res = dplyr::if_else(length(grep("-02-29", seq(date1, date2, by = "day"))) > 0, action1, action2)
)
However, this throws the error "'from' must be of length 1", which I believe is because date1 and date2 are whole column vectors inside mutate(), while seq() expects a single start and end value and so cannot construct the sequence.
If this isn't possible, is there an alternative method that is faster than just a for loop?
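The error described above can be reproduced outside dplyr. Here is a minimal sketch, assuming two made-up Date vectors (the values and the has_leap_day name are only for illustration): seq() fails when given whole vectors, while calling it once per pair of elements works, which is essentially what the for loop does.

```r
# Hypothetical two-row example; the dates are made up for illustration.
date1 <- as.Date(c("2020-02-27", "2021-02-27"))
date2 <- as.Date(c("2020-03-02", "2021-03-02"))

# seq() builds a single sequence, so passing whole vectors is an error.
err <- tryCatch(seq(date1, date2, by = "day"), error = function(e) e)
inherits(err, "error")

# Calling seq() once per row avoids the error, at the cost of a hidden loop.
has_leap_day <- vapply(
  seq_along(date1),
  function(i) {
    days <- seq(date1[i], date2[i], by = "day")
    any(format(days, "%m-%d") == "02-29")
  },
  logical(1)
)
has_leap_day
```

This is still a loop under the hood, so it mainly shows why the error happens rather than making anything faster.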
Solution 1:[1]
While not ideal, I've settled (for now) on just looping over the vector, but using furrr::future_map2 to do so. I don't have any rigorous benchmarks, but on my dataset it's about 2.5x faster than purrr::map2 and something like 10x faster than a for loop.
Example function
contains_leap_day <- function(x, y) {
  # Format each day between x and y as "YYYY-MM-DD"
  date_seqs <- format(seq(x, y, by = "day"))
  # TRUE if any formatted date is February 29
  res <- (length(stringr::str_which(date_seqs, "-02-29")) > 0)
  return(res)
}
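library(dplyr)  # attaches the %>% pipe used below
future::plan(future::multisession)  # run the map in parallel background R sessions
df %>%
  dplyr::mutate(
    # future_map2_lgl() returns a logical vector rather than a list column
    has_leap_day = furrr::future_map2_lgl(date1, date2, contains_leap_day, .progress = TRUE)
  )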
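If the goal is only to know whether a February 29 falls between the two dates, the day-by-day sequence may not be needed at all. Below is a minimal sketch of a vectorized check in base R that counts the leap days up to each endpoint and compares the counts. The helper names (is_leap_year, leap_days_through, contains_leap_day_vec) are made up for illustration, and it assumes both inputs are Date vectors of equal length. Rows with an NA, or with date1 after date2, are treated as FALSE, which is an assumption rather than something the code above specifies.

```r
# Gregorian leap-year rule, vectorized over integer years.
is_leap_year <- function(y) (y %% 4 == 0 & y %% 100 != 0) | (y %% 400 == 0)

# Number of Feb 29s that fall on or before each date in d.
leap_days_through <- function(d) {
  y  <- as.integer(format(d, "%Y"))
  md <- as.integer(format(d, "%m%d"))  # e.g. Feb 29 becomes 229
  # Leap years strictly before year y
  prior <- (y - 1) %/% 4 - (y - 1) %/% 100 + (y - 1) %/% 400
  # Add one if year y is a leap year and d is on or after Feb 29
  prior + (is_leap_year(y) & md >= 229L)
}

# TRUE where the closed interval [date1, date2] contains a Feb 29.
# Assumption: NA inputs and date1 > date2 yield FALSE.
contains_leap_day_vec <- function(date1, date2) {
  ok <- !is.na(date1) & !is.na(date2) & date1 <= date2
  has_any <- leap_days_through(date2) - leap_days_through(date1 - 1) > 0
  ok & has_any
}

# Hypothetical example input
starts <- as.Date(c("2020-02-28", "2021-01-01", NA))
ends   <- as.Date(c("2020-03-01", "2023-12-31", "2020-03-01"))
contains_leap_day_vec(starts, ends)
```

Because it works on whole columns, it could be called directly inside dplyr::mutate(), e.g. has_leap_day = contains_leap_day_vec(date1, date2), without a loop or a parallel backend.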
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | stressed |
