'Addition corresponding to all possible combination of two columns based on a criteria
I have the following input
| ID | Column1 | Column2 | Value |
|:-- |:-------:|:-------:| -----:|
| A | x | a | 4 |
| A | x | b | 1 |
| A | y | a | 5 |
| A | y | b | 6 |
| B | u | a | 9 |
| B | u | b | 2 |
| B | v | a | 7 |
| B | v | b | 4 |
| B | w | a | 1 |
| B | w | b | 3 |
I am looking to get the following output where values in output would be addition of possible combinations of column 1 and column 2 in input for every ID, similar to below
| ID | Column1 | Value |
|:-- |:--------------:| -----------:|
| A | xa+ya | 4+5 |
| A | xa+yb | 4+6 |
| A | xb+ya | 1+5 |
| A | xb+yb | 1+6 |
| B | ua+va+wa | 9+7+1 |
| B | ua+va+wb | 9+7+3 |
| B | ua+vb+wa | 9+4+1 |
| B | ua+vb+wb | 9+4+3 |
| B | ub+va+wa | 2+7+1 |
| B | ub+va+wb | 2+7+3 |
| B | ub+vb+wa | 2+4+1 |
| B | ub+vb+wb | 2+4+3 |
Solution 1:[1]
The logic for combinations is not very clear. Maybe this helps - paste (str_c from stringr), the 'Column1', 'Column2' to a single column, then grouped by 'ID', summarise the 'ColumnA' with combnations to paste the elements, while getting the sum from Value column, then filter out the combinations that are not needed based on the unique values in 'Column1'
library(dplyr)
library(stringr)
df1 %>%
mutate(ColumnA = str_c(Column1, Column2)) %>%
group_by(ID) %>%
summarise(ColumnA = combn(ColumnA, n_distinct(Column1),
FUN = str_c, collapse = "+"),
Value = combn(Value, n_distinct(Column1), FUN = sum), .groups = 'drop') %>%
mutate(new1 = str_remove_all(ColumnA, ".(?=(\\+|$))|\\+"),
new2 = str_replace_all(new1, "(.)\\1+", "\\1")) %>%
filter(nchar(new2) == nchar(new1)) %>%
select(ID, Column1 = ColumnA, Value)
-output
# A tibble: 12 × 3
ID Column1 Value
<chr> <chr> <int>
1 A xa+ya 9
2 A xa+yb 10
3 A xb+ya 6
4 A xb+yb 7
5 B ua+va+wa 17
6 B ua+va+wb 19
7 B ua+vb+wa 14
8 B ua+vb+wb 16
9 B ub+va+wa 10
10 B ub+va+wb 12
11 B ub+vb+wa 7
12 B ub+vb+wb 9
Or another option is
library(tidyr)
df1 %>%
mutate(ColumnA = str_c(Column1, Column2)) %>%
group_by(ID) %>%
group_modify(~ bind_cols(
split(.x$ColumnA, .x$Column1) %>%
expand.grid %>%
unite(ColumnA, everything(), sep = "+"),
Value = split(.x$Value, .x$Column1) %>%
expand.grid %>%
rowSums(.)) ) %>%
ungroup
-output
# A tibble: 12 × 3
ID ColumnA Value
<chr> <chr> <dbl>
1 A xa+ya 9
2 A xb+ya 6
3 A xa+yb 10
4 A xb+yb 7
5 B ua+va+wa 17
6 B ub+va+wa 10
7 B ua+vb+wa 14
8 B ub+vb+wa 7
9 B ua+va+wb 19
10 B ub+va+wb 12
11 B ua+vb+wb 16
12 B ub+vb+wb 9
data
df1 <- structure(list(ID = c("A", "A", "A", "A", "B", "B", "B", "B",
"B", "B"), Column1 = c("x", "x", "y", "y", "u", "u", "v", "v",
"w", "w"), Column2 = c("a", "b", "a", "b", "a", "b", "a", "b",
"a", "b"), Value = c(4L, 1L, 5L, 6L, 9L, 2L, 7L, 4L, 1L, 3L)),
row.names = c(NA,
-10L), class = "data.frame")
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 |
