'Creating a function with lag variables from same column in a list in R

I have a set of dataframes in a list and have to create an extra column for each dataframe (which I´ve done) and then create a formula for the first row, and a different one from the second row onwards taking lags from the same column:

Let say the list name is "CCNRRF_list"

Creation of the fourth column (X4)

CNRRF_list<- mapply(cbind, CNRRF_list, "X4"=NA,SIMPLIFY=F)

one of the resulting dataframes

        x1  x2  x3  x4   
    1   1   1   1   NA
    2   2   2   2   NA
    3   3   3   3   NA
    4   4   4   4   NA 
    5   5   5   5   NA
    6   6   6   6   NA
    7   7   7   7   NA
    8   8   8   8   NA

First formula first row

for (i in seq_along(CNRRF_list)) {
CNRRF_list[[i]]$X4[1]<-(1+CNRRF_list[[i]]$X3[1])
}

Resulting data

        x1  x2  x3  x4   
    1   1   1   1   2  ===> "formula (1+X3)=(1+1)=2"
    2   2   2   2   NA
    3   3   3   3   NA
    4   4   4   4   NA 
    5   5   5   5   NA
    6   6   6   6   NA
    7   7   7   7   NA
    8   8   8   8   NA

now it gets tricky, from the second row onwards the formula is:

lag(X4)*(1+X3)

so the resulting data should look like this for each dataframe in the list:

        x1  x2  x3  x4   
    1   1   1   1   2  
    2   2   2   2   6 ===> "formula lag(X4)*(1+x3)=2*(1+2)=6"
    3   3   3   3   24 ===> "formula 6*(1+3)"
    4   4   4   4   120 ===> "formula 24*(1+4)"
    5   5   5   5   720 ===> "formula 120*(1+5)"
    6   6   6   6   5040 ===> "formula 720*(1+6)"
    7   7   7   7   40320 ===> "formula 5040*(1+7)"
    8   8   8   8   362880 ===> "formula 40320*(1+8)"

But I haven´t been able to create a good enough formula.

some of my attempts

for (i in seq_along(CNRRF_list)) {
CNRRF_list[[i]] <- mutate(CNRRF_list[[i]], X4 = (ifelse(is.na(CNRRF_list[[i]]$X4),lag(CNRRF_list[[i]]$X4)*(1+CNRRF_list[[i]]$X3), 1*(1+CNRRF_list[[i]]$X3))))
}

Not working...any help will be appreciate.

Thanks



Solution 1:[1]

How about this:

  dat <- tibble::tribble(
  ~x1,  ~x2,  ~x3, ~x4,   
1,   1,   1,   NA,
2,   2,   2,   NA,
3,   3,   3,   NA,
4,   4,   4,   NA, 
5,   5,   5,   NA,
6,   6,   6,   NA,
7,   7,   7,   NA,
8,   8,   8,   NA)

for(i in 1:nrow(dat)){
  dat$x4[i] <- prod(c(NA, lag(dat$x4))[i], (1+dat$x3[i]), na.rm=TRUE)
}

dat
#> # A tibble: 8 × 4
#>      x1    x2    x3     x4
#>   <dbl> <dbl> <dbl>  <dbl>
#> 1     1     1     1      2
#> 2     2     2     2      6
#> 3     3     3     3     24
#> 4     4     4     4    120
#> 5     5     5     5    720
#> 6     6     6     6   5040
#> 7     7     7     7  40320
#> 8     8     8     8 362880

Created on 2022-04-05 by the reprex package (v2.0.1)


Edit: Apply to a list of data frames

Here's how you could apply this to a list of data frames.

  dat <- tibble::tribble(
  ~x1,  ~x2,  ~x3, ~x4,   
1,   1,   1,   NA,
2,   2,   2,   NA,
3,   3,   3,   NA,
4,   4,   4,   NA, 
5,   5,   5,   NA,
6,   6,   6,   NA,
7,   7,   7,   NA,
8,   8,   8,   NA)

  
dat_list <- list(dat, dat, dat)  
res <- lapply(dat_list, function(x){  
for(i in 1:nrow(x)){
  x$x4[i] <- prod(c(NA, lag(x$x4))[i], (1+x$x3[i]), na.rm=TRUE)
}
x
})

res
#> [[1]]
#> # A tibble: 8 × 4
#>      x1    x2    x3     x4
#>   <dbl> <dbl> <dbl>  <dbl>
#> 1     1     1     1      2
#> 2     2     2     2      6
#> 3     3     3     3     24
#> 4     4     4     4    120
#> 5     5     5     5    720
#> 6     6     6     6   5040
#> 7     7     7     7  40320
#> 8     8     8     8 362880
#> 
#> [[2]]
#> # A tibble: 8 × 4
#>      x1    x2    x3     x4
#>   <dbl> <dbl> <dbl>  <dbl>
#> 1     1     1     1      2
#> 2     2     2     2      6
#> 3     3     3     3     24
#> 4     4     4     4    120
#> 5     5     5     5    720
#> 6     6     6     6   5040
#> 7     7     7     7  40320
#> 8     8     8     8 362880
#> 
#> [[3]]
#> # A tibble: 8 × 4
#>      x1    x2    x3     x4
#>   <dbl> <dbl> <dbl>  <dbl>
#> 1     1     1     1      2
#> 2     2     2     2      6
#> 3     3     3     3     24
#> 4     4     4     4    120
#> 5     5     5     5    720
#> 6     6     6     6   5040
#> 7     7     7     7  40320
#> 8     8     8     8 362880

Created on 2022-04-05 by the reprex package (v2.0.1)

Solution 2:[2]

Another option is to use the base function Reduce

using data.table

library(data.table)
setDT(dt) # make it a data.table if it is not already

dt[, x4 := 2 * Reduce(f = function(a, b) { a * (b + 1) }, accumulate = T, x = x3)]

using dplyr

dt %>% 
  mutate(x4 = 2 * Reduce(f = function(a, b) { a * (b + 1) }, accumulate = T, x = x3))

output

#    x1 x2 x3     x4
# 1:  1  1  1      2
# 2:  2  2  2      6
# 3:  3  3  3     24
# 4:  4  4  4    120
# 5:  5  5  5    720
# 6:  6  6  6   5040
# 7:  7  7  7  40320
# 8:  8  8  8 362880

data

dt <- data.frame(x1 = seq(1:8), x2 = seq(1:8), x3 = seq(1:8))

Solution 3:[3]

With accumulate:

library(tidyverse)
dat %>% 
  mutate(x4 = accumulate(seq(nrow(.) + 1), ~ .y * .x)[-1])

# A tibble: 8 x 4
     x1    x2    x3     x4
  <dbl> <dbl> <dbl>  <int>
1     1     1     1      2
2     2     2     2      6
3     3     3     3     24
4     4     4     4    120
5     5     5     5    720
6     6     6     6   5040
7     7     7     7  40320
8     8     8     8 362880

For multiple dataframes:

list <- list(dat, dat, dat)  
dat_list %>% 
  map(~ .x %>% 
        mutate(x4 = purrr::accumulate(seq(nrow(.) + 1), ~ .y * .x)[-1])
  )

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1
Solution 2 Merijn van Tilborg
Solution 3