'cumsum using ddply
I need to use group by in levels with ddply or aggregate if that's easier. I am not really sure how to do this as I need to use cumsum as my aggregate function. This is what my data looks like:
level1 level2 hour product
A tea 0 7
A tea 1 2
A tea 2 9
A coffee 17 7
A coffee 18 2
A coffee 20 4
B coffee 0 2
B coffee 1 3
B coffee 2 4
B tea 21 3
B tea 22 1
expected output:
A tea 0 7
A tea 1 9
A tea 2 18
A coffee 17 7
A coffee 18 9
A coffee 20 13
B coffee 0 2
B coffee 1 5
B coffee 2 9
B tea 21 3
B tea 22 4
I tried using
ddply(dd,c("level1","level2","hour"),summarise,cumsum(product))
but that doesn't sum up which I think is because the hour column is being used for group by and its being split by that..I think.. I am not sure I completely understand how aggregate works here. Is there any way I could get the required output using aggregate or ddply?
Solution 1:[1]
you should use transform instead of summarise:
# you should probably order your `level2` first
dd$level2 <- factor(dd$level2, levels=c("tea", "coffee"))
# and transform using level1 and level2 alone, not hour
# if you use hour, the groups will be for each row
ddply(dd, .(level1, level2), transform, product=cumsum(product))
# level1 level2 hour product
# 1 A tea 0 7
# 2 A tea 1 9
# 3 A tea 2 18
# 4 A coffee 17 7
# 5 A coffee 18 9
# 6 A coffee 20 13
# 7 B tea 21 3
# 8 B tea 22 4
# 9 B coffee 0 2
# 10 B coffee 1 5
# 11 B coffee 2 9
Solution 2:[2]
Folks referencing this now may wish to avoid using plyr::ddply. Here is a dplyr solution.
library(dplyr)
output <- dd %>%
group_by(level1, level2) %>%
mutate(product_sum = cumsum(product)) %>%
ungroup()
output
Note that product_sum is where the new column is named. You could use product instead. Then, the original product column would be overwritten as it appears the original question may have preferred to do that and not have both columns in the output.
Data:
level1 <- c(rep("A",6), rep("B",5))
level2 <- c(rep("tea",3), rep("coffee",6), rep("tea",2))
hour <- c(0,1,2,17,18,20,0,1,2,21,22)
product <- c(7,2,9,7,2,4,2,3,4,3,1)
dd <- data.frame(level1, level2, hour, product)
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | |
| Solution 2 | Tanner33 |
