How to split bigrams into column and row pairs for n-columns
Supposing a dataframe like this:
# example dataset
df <- data.frame(
  rowid = 1:3,
  a = c("ax","cz","by"),
  b = c("cy","ax","bz"),
  c = c("bz","ay","cx")
)
What would be an efficient approach to achieving the following transformation?
#> # A tibble: 3 x 4
#> rowid a b c
#> <int> <chr> <chr> <chr>
#> 1 x z y
#> 2 x y z
#> 3 y z x
The goal is, for each row, to take the second character of each bigram and place it in the column picked out by the first character.
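To make the rule concrete, here is a minimal sketch applied to the first row only. It assumes every entry is exactly two characters long; the names `row1`, `first`, `second` and `row1_out` are illustrative and not part of the original question.

```r
# First row of the example data, one bigram per column
row1 <- c(a = "ax", b = "cy", c = "bz")

# Split each bigram into its two characters
first  <- unname(substr(row1, 1, 1))  # decides the ordering
second <- unname(substr(row1, 2, 2))  # the value that is kept

# Reorder the kept characters by the first character,
# then label them with the original column names
row1_out <- setNames(second[order(first)], names(row1))
row1_out
```

For this row the result is x, z, y under columns a, b, c, which is the first row of the target output.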
If possible, it would be useful to compare base R and Tidyverse solutions.
Solution 1:[1]
A Tidyverse solution partially inspired by this recent post using rotate_df() from the sjmisc package: https://stackoverflow.com/a/70682560/8068516
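df <- data.frame(
  rowid = 1:3,
  a = c("ax","cz","by"),
  b = c("cy","ax","bz"),
  c = c("bz","ay","cx")
)
library(dplyr)    # %>%, mutate(), across(), everything()
library(stringr)  # str_sub()
library(tibble)   # rownames_to_column()
library(sjmisc)   # rotate_df()
df %>%
  # transpose the dataframe, using the first column (`rowid`) as column names
  rotate_df(cn = TRUE) %>%
  # sort each column (one original row) alphabetically, i.e. by first character
  mutate(across(everything(), sort)) %>%
  # transpose back
  rotate_df() %>%
  # remove first character from each string
  mutate(across(everything(), ~ str_sub(., 2, -1))) %>%
  # make `rowid` column from the row names
  rownames_to_column(var = "rowid")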
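The data frame can optionally be converted to a tibble with as_tibble(); because rownames_to_column() returns rowid as character, it also needs converting with as.integer() to exactly match the target output, giving: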
#> # A tibble: 3 x 4
#> rowid a b c
#> <int> <chr> <chr> <chr>
#> 1 x z y
#> 2 x y z
#> 3 y z x
This solution generalises to n columns and is compatible with the %>% pipe.
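Since the question also asks for a base R comparison, the following is one possible sketch rather than a definitive answer. It mirrors the approach above: sort the bigrams within each row, then drop the first character. It assumes there are at least two bigram columns, no missing values, and that every column other than `rowid` holds bigrams; `bigram_cols`, `sorted`, `chars` and `out` are illustrative names.

```r
df <- data.frame(
  rowid = 1:3,
  a = c("ax", "cz", "by"),
  b = c("cy", "ax", "bz"),
  c = c("bz", "ay", "cx")
)

# Assumption: every column except `rowid` holds bigrams
bigram_cols <- setdiff(names(df), "rowid")

# Sort the bigrams within each row; apply() returns one column per
# original row, so transpose back and drop any dimnames
sorted <- unname(t(apply(as.matrix(df[bigram_cols]), 1,
                         function(r) sort(unname(r)))))

# Keep only the second character of each sorted bigram
chars <- substr(sorted, 2, 2)

# Write the result back column by column, leaving `rowid` untouched
out <- df
for (j in seq_along(bigram_cols)) {
  out[[bigram_cols[j]]] <- chars[, j]
}
out
```

Here `rowid` stays an integer column because it is never moved into the row names. The trade-off is that apply() coerces the bigram columns to a character matrix, which is fine here because they are all strings.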
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | louisdesu |
