'Sum values if date falls within range (two dataframes)
I have two data frames that look like below:
id1 <- c("1", "2", "3", "4", "4")
start <- c("2019-07-09", "2021-03-22", "2019-02-25", "2019-01-27", "2019-02-14")
end <- c("2019-07-10", "2021-03-27", "2019-02-27", "2019-01-28", "2019-02-16")
df1 <- data.frame(id1, start, end)
id2 <- c( "1","1", "2", "2" ,"2" ,"2" , "2", "2", "3","3", "3", "4", "4", "4" )
date <- c("2019-07-09","2019-07-10", "2021-03-22", "2021-03-23", "2021-03-24", "2021-03-25", "2021-03-26", "2021-03-27", "2019-02-25", "2019-02-26", "2019-02-27", "2019-01-27", "2019-02-14", "2019-02-15")
value <- c("0.5","0.5","0.5","0.5","0.5","0.5","0.5","0.5","0.5","0.5","0.5","0.5","0.5","0.5")
df2 <- data.frame(id2, date, value)
I am trying to match by id and sum the values in df2 if the dates fall with in the start and end date in df1.
The output should look like below. I've been trying to work this through dplyr and haven't had any luck.
###EXPECTED OUTPUT####
id1 <- c("1", "2", "3", "4", "4")
start <- c("2019-07-09", "2021-03-22", "2019-02-25", "2019-01-27", "2019-02-14")
end <- c("2019-07-10", "2021-03-27", "2019-02-27", "2019-01-28", "2019-02-16")
output_value <- c("1", "3", "1.5", "0.5", "1")
output <- data.frame(id1, start, end, output_value)
Thanks for your help!
Solution 1:[1]
You'll need to first combine the two dataframe using left_join, than make sure the date, start, and end columns are type of "Date", and make sure the column value is numeric. Then remove the date that are not within start and end. Finally group_by the relevant columns and sum up the value.
library(tidyverse)
left_join(df2, df1, by = c("id2" = "id1")) %>%
mutate(across(c(date, start, end), as.Date),
value = as.numeric(value)) %>%
filter(date >= start & date <= end) %>%
group_by(id2, start, end) %>%
summarize(value = sum(value), .groups = "drop")
# A tibble: 5 x 4
id2 start end value
<chr> <date> <date> <dbl>
1 1 2019-07-09 2019-07-10 1
2 2 2021-03-22 2021-03-27 3
3 3 2019-02-25 2019-02-27 1.5
4 4 2019-01-27 2019-01-28 0.5
5 4 2019-02-14 2019-02-16 1
Solution 2:[2]
The lubridate package's %within% operator and interval function is useful for this kind of thing. You could do something like this:
library(dplyr)
library(lubridate)
df1 %>%
inner_join(df2, by = c("id1" = "id2")) %>%
mutate(across(c(date, start, end), as.Date),
value = as.numeric(value)) %>%
group_by(id1, start, end) %>%
summarise(output_value = sum(
ifelse(
date %within% interval(start, end),
value,
0
)
))
# A tibble: 5 x 4
# Groups: id1, start [5]
id1 start end output_value
<chr> <date> <date> <dbl>
1 1 2019-07-09 2019-07-10 1
2 2 2021-03-22 2021-03-27 3
3 3 2019-02-25 2019-02-27 1.5
4 4 2019-01-27 2019-01-28 0.5
5 4 2019-02-14 2019-02-16 1
Solution 3:[3]
Another option could be to use the data.table package. Below is a quick way to get your desired output. You do need to make sure to convert your data.frames to data.tables.
library(data.table)
df1 <- data.table(id1 = c("1", "2", "3", "4", "4"),
start = c("2019-07-09", "2021-03-22", "2019-02-25", "2019-01-27", "2019-02-14"),
end = c("2019-07-10", "2021-03-27", "2019-02-27", "2019-01-28", "2019-02-16"))
df2 <- data.table(id2 = c( "1","1", "2", "2" ,"2" ,"2" , "2", "2", "3","3", "3", "4", "4", "4" ),
date = c("2019-07-09","2019-07-10", "2021-03-22", "2021-03-23", "2021-03-24", "2021-03-25",
"2021-03-26", "2021-03-27", "2019-02-25", "2019-02-26", "2019-02-27", "2019-01-27", "2019-02-14", "2019-02-15"),
value = c("0.5","0.5","0.5","0.5","0.5","0.5","0.5","0.5","0.5","0.5","0.5","0.5","0.5","0.5"))
# convert value from chr to numeric
df2[,value := as.numeric(value)]
# merge tables by ids
new_dt = merge(df1, df2, by.x = "id1", by.y = "id2", all.y = TRUE)
# sum value when date is between start and end and group by id1,start,end
new_dt2 = new_dt[date %between% .(start, end),.(output_value = sum(value)),
by = .(id1,start,end)]
# result
new_dt2
id1 start end output_value
1: 1 2019-07-09 2019-07-10 1.0
2: 2 2021-03-22 2021-03-27 3.0
3: 3 2019-02-25 2019-02-27 1.5
4: 4 2019-01-27 2019-01-28 0.5
5: 4 2019-02-14 2019-02-16 1.0
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | benson23 |
| Solution 2 | henryn |
| Solution 3 | Jamie |
