Extracting a letter and putting it in a separate column in R

I have a data set like this:

df<-data.frame(ID=(1:5), column1=c("AA","GG","AG","AA","AT"), column2=c("AA","GG","AG","AA","AT"), stringsAsFactors=FALSE)
df

ID column1 column2
 1      AA      AA
 2      GG      GG
 3      AG      AG
 4      AA      AA
 5      AT      AT

I want to split each column into its two letters, so that the output looks something like this:

ID column1.A column1.B column2.A column2.B
 1         A         A         A         A
 2         G         G         G         G
 3         A         G         A         G
 4         A         A         A         A
 5         A         T         A         T

Can you help me please?



Solution 1:[1]

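Using strsplit:

# strsplit(x, '') splits each string into single characters;
# do.call(rbind, ...) stacks them into one n x 2 character matrix per column
cbind(df[1], do.call(cbind.data.frame, lapply(df[-1], function(x)
  do.call(rbind, strsplit(x, '')))))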
#   ID column1.1 column1.2 column2.1 column2.2
# 1  1         A         A         A         A
# 2  2         G         G         G         G
# 3  3         A         G         A         G
# 4  4         A         A         A         A
# 5  5         A         T         A         T
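Since every value in this example is exactly two characters long, a base-R sketch can also take the characters with substr() and produce the column1.A / column1.B names from the question directly. This assumes fixed-width, two-character strings; split_two() is a hypothetical helper defined here, not a library function.

```r
df <- data.frame(ID = 1:5,
                 column1 = c("AA", "GG", "AG", "AA", "AT"),
                 column2 = c("AA", "GG", "AG", "AA", "AT"),
                 stringsAsFactors = FALSE)

# Hypothetical helper: split a vector of two-character strings into its
# first (A) and second (B) character; assumes every value has exactly 2 chars
split_two <- function(x) {
  data.frame(A = substr(x, 1, 1), B = substr(x, 2, 2),
             stringsAsFactors = FALSE)
}

# Apply the helper to every column except ID; data.frame() joins each list
# name (column1, column2) with the helper's names (A, B), giving column1.A, ...
out <- data.frame(df["ID"], lapply(df[-1], split_two))
out
```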

Solution 2:[2]

Yet another solution, tidyverse-based:

library(tidyverse)

df<-data.frame(ID=(1:5), column1=c("AA","GG","AG","AA","AT"), column2=c("AA","GG","AG","AA","AT"), stringsAsFactors=FALSE)

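df %>% 
  mutate(
    across(
      starts_with("column"),
      # split each string between its two letters; simplify = TRUE
      # returns a character matrix with one column per letter
      ~ str_split(.x, "(?<=[A-Z])(?=[A-Z])", simplify = TRUE),
      .names = "{.col}_sep"),
    # drop the original columns
    column1 = NULL, column2 = NULL)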

#>   ID column1_sep.1 column1_sep.2 column2_sep.1 column2_sep.2
#> 1  1             A             A             A             A
#> 2  2             G             G             G             G
#> 3  3             A             G             A             G
#> 4  4             A             A             A             A
#> 5  5             A             T             A             T
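The mutate()/across() version stores each split as a matrix column, so names() of the result is ID, column1_sep, column2_sep; the .1/.2 suffixes only appear when the data frame is printed. If ordinary columns are wanted, one alternative (assuming tidyr 1.3.0 or later, where separate_wider_position() was added) is to cut the fixed-width strings by position. The widths names A and B together with names_sep = "." should reproduce the column1.A naming from the question.

```r
library(tidyr)  # separate_wider_position() requires tidyr >= 1.3.0

df <- data.frame(ID = 1:5,
                 column1 = c("AA", "GG", "AG", "AA", "AT"),
                 column2 = c("AA", "GG", "AG", "AA", "AT"),
                 stringsAsFactors = FALSE)

# Cut every "column*" string into a 1-character piece named A and a
# 1-character piece named B; names_sep = "." prefixes them with the
# source column name, and the originals are dropped (cols_remove = TRUE)
out <- separate_wider_position(df, starts_with("column"),
                               widths = c(A = 1, B = 1),
                               names_sep = ".")
out
```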

Another possibility, based on a pivot_longer followed by a pivot_wider:

library(tidyverse)

df<-data.frame(ID=(1:5), column1=c("AA","GG","AG","AA","AT"), column2=c("AA","GG","AG","AA","AT"), stringsAsFactors=FALSE)


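df %>% 
  pivot_longer(-ID) %>%                      # one row per ID/column pair
  separate(value, into = LETTERS[1:2],       # split "AG" into A = "A", B = "G"
           sep = "(?<=[A-Z])(?=[A-Z])") %>% 
  pivot_wider(id_cols = ID, names_from = "name", values_from = c(A, B),
              names_glue = "{name}.{.value}") %>% 
  relocate(column1.B, .before = column2.A)   # put column1.B next to column1.A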

#> # A tibble: 5 × 5
#>      ID column1.A column1.B column2.A column2.B
#>   <int> <chr>     <chr>     <chr>     <chr>    
#> 1     1 A         A         A         A        
#> 2     2 G         G         G         G        
#> 3     3 A         G         A         G        
#> 4     4 A         A         A         A        
#> 5     5 A         T         A         T
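The relocate() step is needed because pivot_wider() by default varies the names_from values fastest, giving column1.A, column2.A, column1.B, column2.B. Assuming tidyr 1.2.0 or later, names_vary = "slowest" should produce the column1.A, column1.B, column2.A, column2.B order directly, which also scales to more than two source columns without listing them by hand.

```r
library(dplyr)
library(tidyr)  # names_vary requires tidyr >= 1.2.0

df <- data.frame(ID = 1:5,
                 column1 = c("AA", "GG", "AG", "AA", "AT"),
                 column2 = c("AA", "GG", "AG", "AA", "AT"),
                 stringsAsFactors = FALSE)

out <- df %>%
  pivot_longer(-ID) %>%
  separate(value, into = c("A", "B"), sep = "(?<=[A-Z])(?=[A-Z])") %>%
  # names_vary = "slowest" keeps all values_from columns for one
  # source column together, so no relocate() is needed afterwards
  pivot_wider(id_cols = ID, names_from = name, values_from = c(A, B),
              names_glue = "{name}.{.value}", names_vary = "slowest")
out
```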

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 marc_s
Solution 2