Is there a way to vectorize seq() and grep() to use in conjunction with dplyr?

Apologies if this is obvious, I don't have much experience with R. I have a function contains_leap_year(date1, date2) that I want to pass in as a condition to dplyr::if_else().

My for loop implementation looks like this

contains_leap_year <- c()
for (i in 1:nrow(df)) {
    if (df$date1[i] < 0 & !is.na(df$date2[i])) {
        seq_str <- seq(df$date1[i], df$date2[i], by = "day")
        res <- (length(grep("-02-29", seq_str)) > 0)        
    }
    else {
        res <- FALSE
    }

    contains_leap_year <- append(contains_leap_year, res)
}

Then I would append this column to my dataframe, and do something like

dplyr::mutate(
    res = dplyr::if_else(contains_leap_year == TRUE, action1, action2)
)

But this is rather slow. Ideally, I'd like to work within dplyr the whole time like so

dplyr::mutate(
    res = dplyr::if_else(length(grep("-02-29", seq(date1, date2, by = "day"))) > 0, action1, action2)
)

However, this throws a "'from' must be of length 1" error, which I believe is because date1 and date2 are vectors, so seq() cannot construct the sequence.

If this isn't possible, is there an alternative method that is faster than just a for loop?
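
One way to get closer to the dplyr-only form above, sketched under assumptions: seq() accepts only a single from and to, so the scalar check can be wrapped in a helper that applies it to each pair of dates and returns one logical per row. The helper names (contains_leap_day_one, contains_leap_day_each) are hypothetical, and returning FALSE for missing or reversed dates is an assumption rather than anything seq() does on its own.

```r
# Scalar check: does the range from..to include a February 29?
contains_leap_day_one <- function(from, to) {
  # Assumption: missing or reversed ranges count as "no leap day"
  if (is.na(from) || is.na(to) || from > to) {
    return(FALSE)
  }
  days <- seq(from, to, by = "day")
  any(format(days, "%m-%d") == "02-29")
}

# Vector wrapper: applies the scalar check to each (date1[i], date2[i]) pair
contains_leap_day_each <- function(date1, date2) {
  stopifnot(length(date1) == length(date2))
  vapply(
    seq_along(date1),
    function(i) contains_leap_day_one(date1[i], date2[i]),
    logical(1)
  )
}

# Small illustrative input (made-up dates)
d1 <- as.Date(c("2020-01-01", "2019-03-01", NA))
d2 <- as.Date(c("2020-12-31", "2020-02-28", "2021-01-01"))
result <- contains_leap_day_each(d1, d2)
print(result)
```

Because the wrapper returns a logical vector, it could serve directly as the condition, e.g. dplyr::if_else(contains_leap_day_each(date1, date2), action1, action2) inside mutate(). It still loops internally, so it tidies the code rather than speeding it up.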



Solution 1:[1]

While not ideal, I've settled (for now) on just looping over the vector, but using furrr::future_map2 to do so. I don't have any rigorous benchmarks, but on my dataset it's about 2.5x faster than purrr::map2 and roughly 10x faster than a for loop.

Example function

contains_leap_day <- function(x, y) {
    date_seqs <- format(seq(x, y, by = "day"))
    res <- (length(stringr::str_which(date_seqs, "-02-29")) > 0)
    
    return(res)
}

# Namespace the strategy too, so this works without library(future)
future::plan(future::multisession)
df %>%
    dplyr::mutate(
        # The _lgl variant returns a logical vector rather than a list column
        has_leap_day = furrr::future_map2_lgl(date1, date2, contains_leap_day, .progress = TRUE)
    )
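
If the per-row seq() call is the bottleneck, a sketch that avoids building the sequences at all may help. A range contains a leap day exactly when some February 29 falls between its endpoints, and the first February 29 on or after the start date always lies in the start year or one of the following eight years, since Gregorian leap years are at most eight years apart. The function names below are hypothetical, Date inputs of equal length are assumed, and rows with missing or reversed dates are treated as FALSE by assumption.

```r
# Gregorian leap-year rule, vectorized over a vector of years
is_leap_year <- function(year) {
  (year %% 4 == 0 & year %% 100 != 0) | (year %% 400 == 0)
}

# For each row: is there a February 29 with start <= Feb 29 <= end?
contains_leap_day_vec <- function(start, end) {
  stopifnot(length(start) == length(end))
  start_year <- as.integer(format(start, "%Y"))
  found <- rep(FALSE, length(start))

  # Fixed number of passes (9), each one vectorized over all rows
  for (offset in 0:8) {
    year <- start_year + offset
    leap <- is_leap_year(year)
    feb29 <- as.Date(sprintf("%04d-02-29", year), format = "%Y-%m-%d")
    feb29[!leap | is.na(leap)] <- NA  # only leap years have a Feb 29

    hit <- feb29 >= start & feb29 <= end
    found <- found | (!is.na(hit) & hit)  # NA comparisons count as FALSE
  }
  found
}

# Small illustrative input (made-up dates)
starts <- as.Date(c("2020-01-01", "2019-03-01", "2020-03-01", "1897-01-01", NA))
ends   <- as.Date(c("2020-12-31", "2020-02-28", "2024-02-29", "1903-12-31", "2021-01-01"))
flags <- contains_leap_day_vec(starts, ends)
print(flags)
```

Since the work per row no longer depends on the length of the date range, this kind of check could be used as the if_else() condition inside mutate() without any parallel backend. Whether it is actually faster on a given dataset would need to be measured.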

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 stressed