How to split bigrams into column and row pairs for n-columns

Supposing a dataframe like this:

# example dataset
df <- data.frame(
         rowid = 1:3,
         a = c("ax","cz","by"),
         b = c("cy","ax","bz"),
         c = c("bz","ay","cx")
      )

What would be an efficient way to achieve the following transformation?

#> # A tibble: 3 x 4
#>  rowid      a       b       c
#>  <int>  <chr>   <chr>   <chr>
#>      1      x       z       y
#>      2      x       y       z
#>      3      y       z       x

The goal is, for each row, to take the second character of each bigram and sort it into the column picked out by the first character.

If possible, it would be useful to compare base R and Tidyverse solutions.



Solution 1:[1]

A Tidyverse solution partially inspired by this recent post using rotate_df() from the sjmisc package: https://stackoverflow.com/a/70682560/8068516

df <- data.frame(
     rowid = 1:3,
     a = c("ax","cz","by"),
     b = c("cy","ax","bz"),
     c = c("bz","ay","cx")
  )

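library(sjmisc)   # rotate_df()
library(dplyr)    # %>%, mutate(), across()
library(stringr)  # str_sub()
library(tibble)   # rownames_to_column()

df %>%
  # transpose the dataframe, using the first column (rowid) as column names
  rotate_df(cn = TRUE) %>%
  # sort each column alphabetically, which orders the bigrams by first character
  mutate(across(everything(), sort)) %>%
  # transpose back
  rotate_df() %>%
  # remove first character from each string
  mutate(across(everything(), ~ str_sub(., 2, -1))) %>%
  # make `rowid` column from the row names
  rownames_to_column(var = "rowid")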

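The dataframe can optionally be turned into a tibble with as_tibble() to match the target output. Note that rownames_to_column() returns rowid as character, so it also needs as.integer() to match the <int> column exactly. This gives: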

#> # A tibble: 3 x 4
#>  rowid       a        b        c
#>  <int>   <chr>    <chr>    <chr>
#>      1       x        z        y
#>      2       x        y        z
#>      3       y        z        x

This solution generalises to any number of columns and works within a %>% pipeline.
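
Since the question also asks for a base R comparison, here is a minimal sketch of the same idea without any packages. It assumes the transformation is exactly what the pipeline above does: sort the bigrams within each row alphabetically, then keep only the second character. The names cols, res and out are just illustrative.

```r
df <- data.frame(
  rowid = 1:3,
  a = c("ax", "cz", "by"),
  b = c("cy", "ax", "bz"),
  c = c("bz", "ay", "cx"),
  stringsAsFactors = FALSE
)

# every column except the id holds bigrams
cols <- setdiff(names(df), "rowid")

# for each row: sort the bigrams (this orders them by first character),
# then keep only the second character; apply() returns the rows as
# columns, so transpose back with t()
res <- t(apply(df[cols], 1, function(r) substr(sort(unname(r)), 2, 2)))
colnames(res) <- cols

# reattach the id column (assumption: rowid is kept unchanged)
out <- data.frame(rowid = df$rowid, res, stringsAsFactors = FALSE)
out
```

For the example data this should give the same values as the target output, with rowid kept as integer. Because apply() coerces the selected columns to a character matrix, the sketch assumes all bigram columns are character.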

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 louisdesu