R - Using grouping function inside a loop

I have a set of data with group ids. Within each group I need to do a calculation for the first observation, and then calculate each remaining observation in that group sequentially, using the result from the row before it. I have the following data:

library(tidyverse)
df <- tibble(id = c(1:10),
       group_id = c(rep(1,5), rep(6,2),rep(8,3)),
       value1 = c(100,200,300,400,500,250,350,20,25,45),
       value2 = c(50,75,150,175,200,15,25,78,99,101)
       )


df <- df %>%
  group_by(group_id) %>%
  mutate(position = 1:n()) # Creating a position id


# A tibble: 6 x 5
# Groups:   group_id [2]
     id group_id value1 value2 position
  <int>    <dbl>  <dbl>  <dbl>    <int>
1     1        1    100     50        1
2     2        1    200     75        2
3     3        1    300    150        3
4     4        1    400    175        4
5     5        1    500    200        5
6     6        6    250     15        1

I would like to create an "aggregation" column, that uses value1, value2, and most importantly, the previous values in itself. I first calculate the first step:

df <- df %>%
  mutate(aggregation = ifelse(position == 1, value1 * value2, 0))

# A tibble: 10 x 6
# Groups:   group_id [3]
      id group_id value1 value2 position aggregation
   <int>    <dbl>  <dbl>  <dbl>    <int>       <dbl>
 1     1        1    100     50        1        5000
 2     2        1    200     75        2           0
 3     3        1    300    150        3           0
 4     4        1    400    175        4           0
 5     5        1    500    200        5           0
 6     6        6    250     15        1        3750
 7     7        6    350     25        2           0
 8     8        8     20     78        1        1560
 9     9        8     25     99        2           0
10    10        8     45    101        3           0

Then I use a loop: wherever the aggregation column is still 0 (that is, every row not calculated in the previous step), I set it to value1 * value2 divided by the previous row's aggregation value, rounded to the nearest integer:

for (i in seq_len(nrow(df))) {
  # Only fill rows that the first step left at 0
  if (df$aggregation[i] == 0) {
    df$aggregation[i] <- round(df$value1[i] * df$value2[i] / lag(df$aggregation)[i], 0)
  }
}

# A tibble: 10 x 6
# Groups:   group_id [3]
      id group_id value1 value2 position aggregation
   <int>    <dbl>  <dbl>  <dbl>    <int>       <dbl>
 1     1        1    100     50        1        5000
 2     2        1    200     75        2           3
 3     3        1    300    150        3       15000
 4     4        1    400    175        4           5
 5     5        1    500    200        5       20000
 6     6        6    250     15        1        3750
 7     7        6    350     25        2           2
 8     8        8     20     78        1        1560
 9     9        8     25     99        2           2
10    10        8     45    101        3        2272

I was wondering if there was a better way to do this. I like to use dplyr, but so far, due to the necessity to calculate the values one after the other, I've been unable to find a good solution.

Most importantly, however, instead of the condition I use in the last portion, I would have liked to do the following:

df %>%
  group_by(group_id) %>%
  mutate(aggregation = case_when(
    group_id != 1 ~ value1 * value2 / lag(aggregation),
    TRUE ~ aggregation
  ))

However, this doesn't work in a loop. I've generally been unable to use dplyr inside a loop, especially since once group_by() is used, I feel like mutate() would be my only option, but it seems to create a conflict with the functionality of the loop itself.
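A likely reason the single mutate() falls short: lag() is vectorised, so it reads the aggregation column as it stood before the call and never sees values computed in the same pass. Here is a minimal sketch on a three-row toy tibble (toy and one_pass are illustrative names, not part of the original data):

```r
library(dplyr)

# Toy data: the first row is already filled, the rest are still 0.
toy <- tibble(
  value1 = c(100, 200, 300),
  value2 = c(50, 75, 150),
  aggregation = c(5000, 0, 0)
)

# One vectorised pass: lag(aggregation) is c(NA, 5000, 0), i.e. the OLD column.
one_pass <- toy %>%
  mutate(aggregation = if_else(row_number() == 1,
                               aggregation,
                               value1 * value2 / lag(aggregation)))

one_pass$aggregation
# Row 2 is 200 * 75 / 5000 = 3, but row 3 divides by the old 0 and gives Inf
# instead of using the freshly computed 3.
```

Each row needs the result of the row before it, which a single vectorised expression cannot supply.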



Solution 1:[1]

Perhaps you could use accumulate2 from purrr, which I think may be what you're looking for.

The function you pass to accumulate2() takes three arguments. From the purrr documentation:

For accumulate2(), a 3-argument function. The function will be passed the accumulated result as the first argument. The next value in sequence from .x is passed as the second argument. The next value in sequence from .y is passed as the third argument.

So, ..1 would be the accumulated result (the previous row aggregation value), ..2 would be value1, and ..3 would be value2.
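To see that argument order on a small input, here is a minimal sketch using the first three rows of group 1. The helper name next_value is hypothetical. A named function replaces the ..1/..2/..3 shorthand only to make the mapping explicit.

```r
library(purrr)

# Hypothetical helper: acc is the running result, x and y are the next
# elements of the two input vectors (value1 and value2 here).
next_value <- function(acc, x, y) round(x * y / acc)

# Seed with the first row's product (100 * 50), then feed rows 2 and 3.
res <- accumulate2(c(200, 300), c(75, 150), next_value, .init = 100 * 50)

# accumulate2() returns a list, so flatten it to a numeric vector.
unlist(res)
# 5000 3 15000
```

Because .init supplies the first value, the two input vectors only need to hold the remaining rows.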

library(tidyverse)

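df %>%
  group_by(group_id) %>%
  # .init seeds each group with its first row's product, so the first
  # element of value1 and value2 is dropped before accumulating
  mutate(aggregation = accumulate2(value1[-1], value2[-1],
                                   ~ round(..2 * ..3 / ..1),
                                   .init = value1[1] * value2[1]) %>% flatten_dbl())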

Output

      id group_id value1 value2 position aggregation
   <int>    <dbl>  <dbl>  <dbl>    <int>       <dbl>
 1     1        1    100     50        1        5000
 2     2        1    200     75        2           3
 3     3        1    300    150        3       15000
 4     4        1    400    175        4           5
 5     5        1    500    200        5       20000
 6     6        6    250     15        1        3750
 7     7        6    350     25        2           2
 8     8        8     20     78        1        1560
 9     9        8     25     99        2           2
10    10        8     45    101        3        2272
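If you would rather not depend on purrr, the same recurrence can be written as a small helper and called once per group from mutate(). This is only a sketch: running_ratio is a hypothetical name, and it assumes every group has at least one row.

```r
library(dplyr)

# Hypothetical helper: the first element is v1 * v2, and every later element
# is round(v1 * v2 / previous result). Assumes length(v1) >= 1.
running_ratio <- function(v1, v2) {
  out <- numeric(length(v1))
  out[1] <- v1[1] * v2[1]
  for (i in seq_along(v1)[-1]) {
    out[i] <- round(v1[i] * v2[i] / out[i - 1])
  }
  out
}

# Same data as in the question.
df <- tibble(
  group_id = c(rep(1, 5), rep(6, 2), rep(8, 3)),
  value1 = c(100, 200, 300, 400, 500, 250, 350, 20, 25, 45),
  value2 = c(50, 75, 150, 175, 200, 15, 25, 78, 99, 101)
)

# mutate() calls the helper once per group, so the loop never crosses
# a group boundary.
result <- df %>%
  group_by(group_id) %>%
  mutate(aggregation = running_ratio(value1, value2)) %>%
  ungroup()

result$aggregation
```

The loop still runs row by row, but it sits inside a function that only ever sees one group, so group_by() and the sequential calculation no longer conflict.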

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1