'Identify matched pairs and create a common key
I have
df<-data.frame(record_id=c("A", "B", "C", "D", "E", "F"), var1=1:6, matched.with=rev(c("A", "B", "C", "D", "E", "F")))
> df
record_id var1 matched.with
1 A 1 F
2 B 2 E
3 C 3 D
4 D 4 C
5 E 5 B
6 F 6 A
so, id A has been matched with F, B with E, C with D
I would like to create a common.key variable for the pairs like
df.common.key
record_id var1 matched.with common.key
1 A 1 F 1
2 B 2 E 2
3 C 3 D 3
4 D 4 C 3
5 E 5 B 2
6 F 6 A 1
Which shows that A has been matched with F (row 1) with key 1 and F has been matched with A (row 6) with also key 1. Common key does not have to be numeric, it can also be string or factor.
How can I accomplish this 1) if my data frame includes only matched pairs, 2) if my DF also includes observations without matched pairs 3) is there a tidyverse solution?
Solution 1:[1]
We can get the unique combination of letters (in the same order) using pmin and pmax. Then, we can ust the grp column to create the common.key.
library(tidyverse)
df %>%
group_by(grp = paste0(pmin(record_id, matched.with), pmax(record_id, matched.with))) %>%
mutate(common.key = cur_group_id()) %>%
select(-grp)
Output
grp record_id var1 matched.with common.key
<chr> <chr> <int> <chr> <int>
1 AF A 1 F 1
2 BE B 2 E 2
3 CD C 3 D 3
4 CD D 4 C 3
5 BE E 5 B 2
6 AF F 6 A 1
Solution 2:[2]
You could do it with a temporary variable that pastes the strings from record_id and matched.with in alphabetical order (row-wise), then groups by that variable and selects the first var1 to put in the common.key
df %>%
rowwise() %>%
mutate(common = paste0(sort(c(record_id, matched.with)), collapse = '')) %>%
group_by(common) %>%
mutate(common.key = first(var1)) %>%
ungroup() %>%
select(-common)
#> # A tibble: 6 x 4
#> record_id var1 matched.with common.key
#> <chr> <int> <chr> <int>
#> 1 A 1 F 1
#> 2 B 2 E 2
#> 3 C 3 D 3
#> 4 D 4 C 3
#> 5 E 5 B 2
#> 6 F 6 A 1
Solution 3:[3]
We may also use
library(dplyr)
library(stringr)
df %>%
group_by(grp = str_c(pmin(record_id, matched.with),
pmax(record_id, matched.with))) %>%
mutate(common.key = first(var1)) %>%
ungroup %>%
select(-grp)
-output
# A tibble: 6 × 4
record_id var1 matched.with common.key
<chr> <int> <chr> <int>
1 A 1 F 1
2 B 2 E 2
3 C 3 D 3
4 D 4 C 3
5 E 5 B 2
6 F 6 A 1
Solution 4:[4]
With a join:
library(tidyverse)
df %>%
mutate(id = pmin(matched.with, record_id)) %>%
left_join(select(df, record_id, var1), by = c("id" = "record_id")) %>%
select(record_id, var1 = var1.x, matched.with, common.key = var1.y)
# A tibble: 6 x 4
record_id var1 matched.with common.key
<chr> <int> <chr> <int>
1 A 1 F 1
2 B 2 E 2
3 C 3 D 3
4 D 4 C 3
5 E 5 B 2
6 F 6 A 1
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | AndrewGB |
| Solution 2 | Allan Cameron |
| Solution 3 | akrun |
| Solution 4 |
