'Calculate changes in totals of subgroups in R
I have the following dataframe:
# A tibble: 8 x 5
Year Group Unit Profit Sales
<dbl> <chr> <chr> <dbl> <dbl>
1 2021 One A 20 70
2 2021 One B 10 40
3 2021 One C 5 25
4 2021 Two D 15 50
5 2022 One A NaN 50
6 2022 One B NaN 55
7 2022 One E NaN 150
8 2022 Two D NaN 60
And I want to see the growth in sales per group. The problem is that the groups do not have the same units. So, in order to get an estimate of the growth / decline I want to calculate the growth on only the units that are still present.
So the desired result is something like this:
# A tibble: 8 x 8
Year Group Unit Profit Sales Sales_group_comparison_next_year Sales_group_compariso~ Group_growth_ra~
<dbl> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
1 2021 One A 20 70 110 NaN NaN
2 2021 One B 10 40 110 NaN NaN
3 2021 One C 5 25 110 NaN NaN
4 2021 Two D 15 50 50 NaN NaN
5 2022 One A NaN 50 NaN 105 0.955
6 2022 One B NaN 55 NaN 105 0.955
7 2022 One E NaN 150 NaN 105 0.955
8 2022 Two D NaN 60 NaN 60 1.2
in order to achieve this I did the code below. This works, but it is a lot of code. So my question is, are there smarter / more elegant ways to do this?
test2 <- test %>%
group_by(
Unit
) %>%
mutate(
Years_present = list(min(Year):max(Year))
) %>%
ungroup() %>%
mutate(
Present_next_year = map2(Year, Years_present, ~ .x %in% .y & (.x + 1) %in% .y),
Present_last_year = map2(Year, Years_present, ~ .x %in% .y & (.x - 1) %in% .y),
Present_next_year = ifelse(
Present_next_year == TRUE,
Sales,
0
),
Present_last_year = ifelse(
Present_last_year == TRUE,
Sales,
0
)
) %>%
group_by(
Group, Year
) %>%
mutate(
Sales_group_comparison_next_year = sum(Present_next_year),
Sales_group_comparison_last_year = sum(Present_last_year),
) %>%
ungroup() %>%
group_by(Unit) %>%
mutate(
Sales_group_comparison_lagged =
dplyr::lag(
x = Sales_group_comparison_next_year,
n = 1,
default = NaN,
order_by = Year
),
) %>%
ungroup() %>%
rowwise() %>%
mutate(
Group_growth_rate = Sales_group_comparison_last_year / Sales_group_comparison_lagged
) %>%
ungroup() %>%
group_by(Group, Year) %>%
mutate(Group_growth_rate = ifelse(sum(is.na(Group_growth_rate)) == length(Group_growth_rate),
NaN,
max(Group_growth_rate, na.rm = TRUE))
) %>%
ungroup()
Thanks!
Solution 1:[1]
Here's an approach using some simpler dplyr math and a self-join to pull in last year's totals for each Group.
library(dplyr)
data_with_2023 %>%
group_by(Group, Unit) %>%
mutate(in_next_yr = lead(Year, default = 0) == Year + 1,
in_last_yr = lag(Year, default = 0) == Year - 1) %>%
group_by(Year, Group) %>%
mutate(total_in_next_yr = sum(Sales*in_next_yr),
total_in_prior_yr = sum(Sales*in_last_yr)) %>%
ungroup() -> data_totals
left_join(data_totals,
data_totals %>%
distinct(Year = Year +1,
Group,
last_yr_total = total_in_next_yr)) %>%
mutate(growth = total_in_prior_yr / last_yr_total)
Result
Joining, by = c("Year", "Group")
# A tibble: 10 × 11
Year Group Unit Profit Sales in_next_yr in_last_yr total_in_next_yr total_in_prior_yr last_yr_total growth
<dbl> <chr> <chr> <chr> <int> <lgl> <lgl> <int> <int> <int> <dbl>
1 2021 One A 20 70 TRUE FALSE 110 0 NA NA
2 2021 One B 10 40 TRUE FALSE 110 0 NA NA
3 2021 One C 5 25 FALSE FALSE 110 0 NA NA
4 2021 Two D 15 50 TRUE FALSE 50 0 NA NA
5 2022 One A NaN 50 FALSE TRUE 205 105 110 0.955
6 2022 One B NaN 55 TRUE TRUE 205 105 110 0.955
7 2022 One E NaN 150 TRUE FALSE 205 105 110 0.955
8 2022 Two D NaN 60 FALSE TRUE 0 60 50 1.2
9 2023 One B NaN 70 FALSE TRUE 0 150 205 0.732
10 2023 One E NaN 80 FALSE TRUE 0 150 205 0.732
Source data
data_with_2023 <- data.frame(
stringsAsFactors = FALSE,
Year = c(2021L, 2021L, 2021L, 2021L, 2022L, 2022L, 2022L, 2022L, 2023L, 2023L),
Group = c("One", "One", "One", "Two", "One", "One", "One", "Two", "One", "One"),
Unit = c("A", "B", "C", "D", "A", "B", "E", "D", "B", "E"),
Profit = c("20", "10", "5", "15", "NaN", "NaN", "NaN", "NaN", "NaN", "NaN"),
Sales = c(70L, 40L, 25L, 50L, 50L, 55L, 150L, 60L, 70L, 80L)
)
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 |
