'R dplyr - Same column, getting the sum of the two following rows of a dataframe

I have the following table:

date <- c("2021-12-30", "2021-12-30", "2021-12-30", "2022-01-06", "2022-01-06", "2022-01-06", "2022-01-13", "2022-01-13", "2022-01-13")
age_group <- c("ALL", "<60", "60+", "ALL", "<60", "60+", "ALL", "<60", "60+")
number <- c(583, 2207, 56, 662, 2852, 71, 432, 1243, 60)
number2 <- c(852, 900, 50, 1040, 1103, 76, 456, 461, 29)

data.frame(date, age_group, number, number2)
        date age_group number number2
1 2021-12-30       ALL    583     852
2 2021-12-30       <60   2207     900
3 2021-12-30       60+     56      50
4 2022-01-06       ALL    662    1040
5 2022-01-06       <60   2852    1103
6 2022-01-06       60+     71      76
7 2022-01-13       ALL    432     456
8 2022-01-13       <60   1243     461
9 2022-01-13       60+     60      29

As you can see, the values of "<60" and "60+" do not correspond to the value of "ALL" (i.e., for "2021-12-30" and the "number" column -> 2207 (from age group "<60") + 56 (from age group "60+") != 583 (from age group "ALL").

I want to modify the dataframe so that the values from "ALL" are equal to the sum of "<60" and "60+" for each date and each column, i.e.:

        date age_group number number2
1 2021-12-30       ALL   2263     950
2 2021-12-30       <60   2207     900
3 2021-12-30       60+     56      50
4 2022-01-06       ALL   2923    1179
5 2022-01-06       <60   2852    1103
6 2022-01-06       60+     71      76
7 2022-01-13       ALL   1303     480
8 2022-01-13       <60   1243     461
9 2022-01-13       60+     60      29

Is there a straightforward solution using dplyr or does this require multiple pivot_longer and pivot_wider back and forth?



Solution 1:[1]

A good way to do this is to transpose the dataframe by pivoting, so that the 3 age groupings are separate columns instead of rows:

df2 <- df %>%
    group_by(date) %>%
    pivot_longer(-c('date', 'age_group')) %>%
    pivot_wider(names_from = 'age_group') %>%
    mutate(ALL = `<60` + `60+`)

df2
  date       name      ALL `<60` `60+`
  <chr>      <chr>   <dbl> <dbl> <dbl>
1 2021-12-30 number   2263  2207    56
2 2021-12-30 number2   950   900    50
3 2022-01-06 number   2923  2852    71
4 2022-01-06 number2  1179  1103    76
5 2022-01-13 number   1303  1243    60
6 2022-01-13 number2   490   461    29

Note the required backticks (`) around the variable names in mutate. They are needed because variable names cannot otherwise start with a number or an illegal character (here the less than operator <). The backtick tells R that these are variable names, despite what they look like.

Then we can just transpose it back if you'd prefer:

df2 %>%
    pivot_longer(-c('date', 'name'), names_to = 'age_group') %>%
    pivot_wider()

  date       age_group number number2
  <chr>      <chr>      <dbl>   <dbl>
1 2021-12-30 ALL         2263     950
2 2021-12-30 <60         2207     900
3 2021-12-30 60+           56      50
4 2022-01-06 ALL         2923    1179
5 2022-01-06 <60         2852    1103
6 2022-01-06 60+           71      76
7 2022-01-13 ALL         1303     490
8 2022-01-13 <60         1243     461
9 2022-01-13 60+           60      29

Solution 2:[2]

Here is another tidyverse option, where you do not have to pivot the data. Here, I filter out the ALL rows, then get the sum for each number, which is grouped by date. Then, I bind ALL back to the original dataframe (minus the ALL rows).

library(tidyverse)

df %>%
  filter(age_group != "ALL") %>%
  group_by(date) %>%
  summarise(age_group = "ALL", across(c(number, number2), sum)) %>%
  bind_rows(., df %>% filter(age_group != "ALL")) %>%
  arrange(date)

Output

  date       age_group number number2
  <chr>      <chr>      <dbl>   <dbl>
1 2021-12-30 ALL         2263     950
2 2021-12-30 <60         2207     900
3 2021-12-30 60+           56      50
4 2022-01-06 ALL         2923    1179
5 2022-01-06 <60         2852    1103
6 2022-01-06 60+           71      76
7 2022-01-13 ALL         1303     490
8 2022-01-13 <60         1243     461
9 2022-01-13 60+           60      29

Data

df <-
  structure(
    list(
      date = c(
        "2021-12-30", "2021-12-30", "2021-12-30", "2022-01-06", "2022-01-06",
        "2022-01-06", "2022-01-13", "2022-01-13", "2022-01-13"
      ),
      age_group = c("ALL", "<60", "60+", "ALL", "<60",
                    "60+", "ALL", "<60", "60+"),
      number = c(583, 2207, 56, 662, 2852,
                 71, 432, 1243, 60),
      number2 = c(852, 900, 50, 1040, 1103, 76,
                  456, 461, 29)
    ),
    class = "data.frame",
    row.names = c(NA,-9L)
  )

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1
Solution 2