'Calculate changes in totals of subgroups in R

I have the following dataframe:

# A tibble: 8 x 5
   Year Group Unit  Profit Sales
  <dbl> <chr> <chr>  <dbl> <dbl>
1  2021 One   A         20    70
2  2021 One   B         10    40
3  2021 One   C          5    25
4  2021 Two   D         15    50
5  2022 One   A        NaN    50
6  2022 One   B        NaN    55
7  2022 One   E        NaN   150
8  2022 Two   D        NaN    60

And I want to see the growth in sales per group. The problem is that the groups do not have the same units. So, in order to get an estimate of the growth / decline I want to calculate the growth on only the units that are still present.

So the desired result is something like this:

# A tibble: 8 x 8
   Year Group Unit  Profit Sales Sales_group_comparison_next_year Sales_group_compariso~ Group_growth_ra~
  <dbl> <chr> <chr>  <dbl> <dbl>                            <dbl>                  <dbl>            <dbl>
1  2021 One   A         20    70                              110                    NaN          NaN    
2  2021 One   B         10    40                              110                    NaN          NaN    
3  2021 One   C          5    25                              110                    NaN          NaN    
4  2021 Two   D         15    50                               50                    NaN          NaN    
5  2022 One   A        NaN    50                              NaN                    105            0.955
6  2022 One   B        NaN    55                              NaN                    105            0.955
7  2022 One   E        NaN   150                              NaN                    105            0.955
8  2022 Two   D        NaN    60                              NaN                     60            1.2  

in order to achieve this I did the code below. This works, but it is a lot of code. So my question is, are there smarter / more elegant ways to do this?

test2 <- test %>%
    group_by(
        Unit
    ) %>%
    mutate(
        Years_present = list(min(Year):max(Year))
    ) %>%
    ungroup() %>%
    mutate(
        Present_next_year = map2(Year, Years_present, ~ .x %in% .y & (.x + 1) %in% .y),
        Present_last_year = map2(Year, Years_present, ~ .x %in% .y & (.x - 1) %in% .y),
        Present_next_year = ifelse(
            Present_next_year == TRUE,
            Sales,
            0
        ),
        Present_last_year = ifelse(
            Present_last_year == TRUE,
            Sales,
            0
        )
    ) %>%
    group_by(
        Group, Year
    ) %>%
    mutate(
        Sales_group_comparison_next_year = sum(Present_next_year),
        Sales_group_comparison_last_year = sum(Present_last_year),
    ) %>%
    ungroup() %>%
    group_by(Unit) %>%
    mutate(
        Sales_group_comparison_lagged =
        dplyr::lag(
            x = Sales_group_comparison_next_year,
            n = 1,
            default = NaN,
            order_by = Year
        ),
    ) %>%
    ungroup() %>%
    rowwise() %>%
    mutate(
        Group_growth_rate = Sales_group_comparison_last_year / Sales_group_comparison_lagged
    ) %>%
    ungroup() %>%
    group_by(Group, Year) %>%
    mutate(Group_growth_rate = ifelse(sum(is.na(Group_growth_rate)) == length(Group_growth_rate),
                                      NaN,
                                      max(Group_growth_rate, na.rm = TRUE))
           ) %>%
    ungroup()

Thanks!



Solution 1:[1]

Here's an approach using some simpler dplyr math and a self-join to pull in last year's totals for each Group.

library(dplyr)
data_with_2023 %>%
  group_by(Group, Unit) %>%
  mutate(in_next_yr = lead(Year, default = 0) == Year + 1,
         in_last_yr = lag(Year, default = 0) == Year - 1) %>%
  group_by(Year, Group) %>%
  mutate(total_in_next_yr = sum(Sales*in_next_yr),
         total_in_prior_yr = sum(Sales*in_last_yr)) %>%
  ungroup() -> data_totals

left_join(data_totals,
          data_totals %>% 
            distinct(Year = Year +1, 
                     Group, 
                     last_yr_total = total_in_next_yr)) %>%
  mutate(growth = total_in_prior_yr / last_yr_total)

Result

Joining, by = c("Year", "Group")
# A tibble: 10 × 11
    Year Group Unit  Profit Sales in_next_yr in_last_yr total_in_next_yr total_in_prior_yr last_yr_total growth
   <dbl> <chr> <chr> <chr>  <int> <lgl>      <lgl>                 <int>             <int>         <int>  <dbl>
 1  2021 One   A     20        70 TRUE       FALSE                   110                 0            NA NA    
 2  2021 One   B     10        40 TRUE       FALSE                   110                 0            NA NA    
 3  2021 One   C     5         25 FALSE      FALSE                   110                 0            NA NA    
 4  2021 Two   D     15        50 TRUE       FALSE                    50                 0            NA NA    
 5  2022 One   A     NaN       50 FALSE      TRUE                    205               105           110  0.955
 6  2022 One   B     NaN       55 TRUE       TRUE                    205               105           110  0.955
 7  2022 One   E     NaN      150 TRUE       FALSE                   205               105           110  0.955
 8  2022 Two   D     NaN       60 FALSE      TRUE                      0                60            50  1.2  
 9  2023 One   B     NaN       70 FALSE      TRUE                      0               150           205  0.732
10  2023 One   E     NaN       80 FALSE      TRUE                      0               150           205  0.732

Source data

data_with_2023 <- data.frame(
  stringsAsFactors = FALSE,
              Year = c(2021L, 2021L, 2021L, 2021L, 2022L, 2022L, 2022L, 2022L, 2023L, 2023L),
             Group = c("One", "One", "One", "Two", "One", "One", "One", "Two", "One", "One"),
              Unit = c("A", "B", "C", "D", "A", "B", "E", "D", "B", "E"),
            Profit = c("20", "10", "5", "15", "NaN", "NaN", "NaN", "NaN", "NaN", "NaN"),
             Sales = c(70L, 40L, 25L, 50L, 50L, 55L, 150L, 60L, 70L, 80L)
)

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1