'Apply cumsum but with conditions
I'm looking for a way to cumsum a column (by group) but with specific conditions.
My conditions are :
- Add a starting value
- The sum is restricted by upper & lower limits
I've done it with an iterative solution but wonder if a more elegant solution could be found :
# Init my data.frame with random values
# Knowing my cumsum on "value" must be applied on the "group" column using the "ordering" column
random_values <- c(-1, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9)
values_probs <- c(0.7, 0.03, 0.03, 0.03, 0.03, 0.03, 0.03, 0.03, 0.03, 0.03, 0.03)
x <- data.frame(
group = rep(c("A", "B"), each = 10),
ordering = rep(1:10, 2),
value = sample(x = random_values, replace = T, prob = values_probs, size = 20)
)
# Create the "previous_conditional_sum" column
# Init the first row by group with the default starting value
starting_value <- 3
x <- x %>%
group_by(group) %>%
mutate(conditional_sum = NA_real_) %>%
mutate(previous_conditional_sum = case_when(
ordering == min(ordering) ~ starting_value,
T ~ NA_real_
)) %>%
as.data.frame()
# Here my "custom" sum function applying lower & upper limits
custom_sum <- function(a,b, lower_limit = 3, upper_limit = 15) {
x <- a+b
x <- min(x, upper_limit)
x <- max(x, lower_limit)
return(x)
}
# Iteratively sum using my custom function
for (i in 1:nrow(x)) {
# Init the new value
new_value <- custom_sum(x[i,"previous_conditional_sum",drop=T], x[i,"value",drop=T])
x[i,"conditional_sum"] <- new_value
# Set the previous_conditional_sum on next line if revelant
if (is.na(x[i+1, "previous_conditional_sum",drop=T]) & i+1<=nrow(x)) {
x[i+1, "previous_conditional_sum"] <- new_value
}
}
Solution 1:[1]
I wrote a helper function that I think captures what you want to do, in a 'vectorized' way (x is a vector, rather than an element of a vector)
f <- function(x, start, min, max) {
x = start + cumsum(x)
pmin(pmax(x, min), max)
}
It is easily tested
set.seed(123)
x <- runif(10)
x
f(x, 0, 2, 4)
and using dplyr easy to apply to grouped data (using the value of x provided in the answer by @zephryl)
x |> group_by(group) |> mutate(condsum = f(value, 3, 3, 15))
The result differs from @zephryl, because in their answer the 'clamp' (restricting the range of the returned value) is applied in each iteration, and the clamped value used for the cumulative sum; in the implementation here the cumulative sum is calculated first, and then the values clamped to the specific range.
It is not clear which result matches what you want...
I was surprised that the answers were different. I thought accumulate() would just be an iterative way of calculating a vectorized cumsum(), but it is not -- this is a subtle difference, and I wonder if it is widely appreciated?
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Martin Morgan |
