'Apply cumsum but with conditions

I'm looking for a way to cumsum a column (by group) but with specific conditions. My conditions are :

  • Add a starting value
  • The sum is restricted by upper & lower limits

I've done it with an iterative solution but wonder if a more elegant solution could be found :

# Init my data.frame with random values
# Knowing my cumsum on "value" must be applied on the "group" column using the "ordering" column
random_values <- c(-1, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9)
values_probs <- c(0.7, 0.03, 0.03, 0.03, 0.03, 0.03, 0.03, 0.03, 0.03, 0.03, 0.03)
x <- data.frame(
  group = rep(c("A", "B"), each = 10),
  ordering = rep(1:10, 2),
  value = sample(x = random_values, replace = T, prob = values_probs, size = 20)
)

# Create the "previous_conditional_sum" column
# Init the first row by group with the default starting value
starting_value <- 3
x <- x %>%
  group_by(group) %>%
  mutate(conditional_sum = NA_real_) %>%
  mutate(previous_conditional_sum = case_when(
    ordering == min(ordering) ~ starting_value,
    T ~ NA_real_
  )) %>%
  as.data.frame()

# Here my "custom" sum function applying lower & upper limits
custom_sum <- function(a,b, lower_limit = 3, upper_limit = 15) {
  x <- a+b
  x <- min(x, upper_limit)
  x <- max(x, lower_limit)
  return(x)
}

# Iteratively sum using my custom function
for (i in 1:nrow(x)) {
  
  # Init the new value
  new_value <- custom_sum(x[i,"previous_conditional_sum",drop=T], x[i,"value",drop=T])
  x[i,"conditional_sum"] <- new_value
  
  # Set the previous_conditional_sum on next line if revelant
  if (is.na(x[i+1, "previous_conditional_sum",drop=T]) & i+1<=nrow(x)) {
    x[i+1, "previous_conditional_sum"] <- new_value
  }

}
r


Solution 1:[1]

I wrote a helper function that I think captures what you want to do, in a 'vectorized' way (x is a vector, rather than an element of a vector)

f <- function(x, start, min, max) { 
    x = start + cumsum(x)
    pmin(pmax(x, min), max)
}

It is easily tested

set.seed(123)
x <- runif(10)
x
f(x, 0, 2, 4)

and using dplyr easy to apply to grouped data (using the value of x provided in the answer by @zephryl)

x |> group_by(group) |> mutate(condsum = f(value, 3, 3, 15))

The result differs from @zephryl, because in their answer the 'clamp' (restricting the range of the returned value) is applied in each iteration, and the clamped value used for the cumulative sum; in the implementation here the cumulative sum is calculated first, and then the values clamped to the specific range.

It is not clear which result matches what you want...

I was surprised that the answers were different. I thought accumulate() would just be an iterative way of calculating a vectorized cumsum(), but it is not -- this is a subtle difference, and I wonder if it is widely appreciated?

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Martin Morgan