'Using row index number to calculate values
I'm having trouble using the row number as index. For example I want a new column that will give me the sales taking into account the next 4 days. I want to create column name:sale_next 4
The issue with my code is that I don't know how to make the index of the row_number() variable, since what I'm doing is fetching the actual value of the column.
#heres to create the data
df <- read.table(text = "day price price_change sales High_sales_ind
1 5 0 12 1
2 5 0 6 0
3 5 0 5 0
4 5 0 4 0
5 5 0 10 1
6 5 0 10 1
7 5 0 10 1
8 5 0 12 1
9 5 0 14 1
10 7 2 3 0
11 7 0 2 0", header = TRUE)
#my code
df<- df %>% mutate(sales_next4 = sales[row_number():sales_rownumber()+4)
What I need:
| day | price | price_change | sales | High_sales_ind | sales_next4 |
|---|---|---|---|---|---|
| 1 | 5 | 0 | 12 | 1 | 27 |
| 2 | 5 | 0 | 6 | 0 | 25 |
| 3 | 5 | 0 | 5 | 0 | 29 |
| 4 | 5 | 0 | 4 | 0 | 34 |
| 5 | 5 | 0 | 10 | 1 | 42 |
| 6 | 5 | 0 | 10 | 1 | 46 |
| 7 | 5 | 0 | 10 | 1 | 39 |
| 8 | 5 | 0 | 12 | 1 | 31 |
| 9 | 5 | 0 | 14 | 1 | 19 |
| 10 | 7 | 2 | 3 | 0 | 5 |
| 11 | 7 | 0 | 2 | 0 | 2 |
Any help would be appreciated.
Solution 1:[1]
You can use rollapply from the zoo package for cases like this, assuming that the days are consecutive as in the example data provided.
You'll need to use the partial = and align = arguments to fill the column correctly, see ?rollapply for the details.
library(dplyr)
library(zoo)
df <- df %>%
mutate(sales_next4 = rollapply(sales, 4, sum, partial = TRUE, align = "left"))
Result:
day price price_change sales High_sales_ind sales_next4
1 1 5 0 12 1 27
2 2 5 0 6 0 25
3 3 5 0 5 0 29
4 4 5 0 4 0 34
5 5 5 0 10 1 42
6 6 5 0 10 1 46
7 7 5 0 10 1 39
8 8 5 0 12 1 31
9 9 5 0 14 1 19
10 10 7 2 3 0 5
11 11 7 0 2 0 2
Solution 2:[2]
You can use map() from purrr to do rolling sum depending on the day column.
library(dplyr)
library(purrr)
df %>%
mutate(sales_next4 = map_dbl(day, ~ sum(sales[between(day, .x, .x+3)])))
# day price price_change sales High_sales_ind sales_next4
# 1 1 5 0 12 1 27
# 2 2 5 0 6 0 25
# 3 3 5 0 5 0 29
# 4 4 5 0 4 0 34
# 5 5 5 0 10 1 42
# 6 6 5 0 10 1 46
# 7 7 5 0 10 1 39
# 8 8 5 0 12 1 31
# 9 9 5 0 14 1 19
# 10 10 7 2 3 0 5
# 11 11 7 0 2 0 2
Solution 3:[3]
Using slider
library(dplyr)
library(slider)
df %>%
mutate(sales_next4 = slide_dbl(day, ~ sum(sales[.x]), .after = 3))
day price price_change sales High_sales_ind sales_next4
1 1 5 0 12 1 27
2 2 5 0 6 0 25
3 3 5 0 5 0 29
4 4 5 0 4 0 34
5 5 5 0 10 1 42
6 6 5 0 10 1 46
7 7 5 0 10 1 39
8 8 5 0 12 1 31
9 9 5 0 14 1 19
10 10 7 2 3 0 5
11 11 7 0 2 0 2
Solution 4:[4]
You can use Reduce() and data.table::shift()
library(data.table)
setDT(df)[, n4:=Reduce(`+`,shift(c(sales,0,0,0),-3:0))[1:.N]]
Output:
day price price_change sales High_sales_ind sales_next4
1 1 5 0 12 1 27
2 2 5 0 6 0 25
3 3 5 0 5 0 29
4 4 5 0 4 0 34
5 5 5 0 10 1 42
6 6 5 0 10 1 46
7 7 5 0 10 1 39
8 8 5 0 12 1 31
9 9 5 0 14 1 19
10 10 7 2 3 0 5
11 11 7 0 2 0 2
or, could this as part of dplyr/mutate pipeline
mutate(df, sales_next4 = Reduce(`+`, data.table::shift(c(sales,0,0,0),0:-3))[1:nrow(df)])
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | |
| Solution 2 | |
| Solution 3 | akrun |
| Solution 4 |
