How to get the first n characters from a string in R

I would like to extract the first three letters of each word in the string in every row of df, as shown below.

Example:

df <- data.frame(name = c('Jame Bond', "Maria Taylor", "Micheal Balack"))
df
            name
1      Jame Bond
2   Maria Taylor
3 Micheal Balack

Desired output:

df_new 
        name
1      Jam_Bon
2      Mar_Tay
3      Mic_Bal

Any suggestions for doing this with the tidyverse?



Solution 1:[1]

library(stringr)
library(dplyr)
library(purrr)  # provides map_chr(), used below

df$name %>% 
  str_extract_all("(?<=(^|[:space:]))[:alpha:]{3}") %>% 
  map_chr(~ str_c(.x, collapse = "_"))

The stringr cheatsheet is very useful for working through these types of problems. https://www.rstudio.com/resources/cheatsheets/

Created on 2022-03-26 by the reprex package (v2.0.1)

Solution 2:[2]

You can try this with dplyr::rowwise(), stringr::str_split() and stringr::str_sub():

library(dplyr)
library(stringr)

df_new <- df %>% 
  rowwise() %>% 
  mutate(name = paste(
    unlist(
      lapply(str_split(name, ' '), function(x){
        str_sub(x, 1, 3)
      })
    ), 
    collapse = "_"
  ))

This gives the expected result:

> df_new
# A tibble: 3 x 1
# Rowwise: 
  name   
  <chr>  
1 Jam_Bon
2 Mar_Tay
3 Mic_Bal
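
The same split, truncate and join idea can probably be written without rowwise() by mapping over the list that str_split() returns. This is only a sketch of that variant, not part of the original answer; it assumes purrr is installed and rebuilds df from the question so that it runs on its own:

```r
library(dplyr)
library(stringr)
library(purrr)

df <- data.frame(name = c("Jame Bond", "Maria Taylor", "Micheal Balack"))

df_new <- df %>%
  mutate(name = map_chr(
    str_split(name, " "),                       # list of word vectors, one per row
    ~ str_c(str_sub(.x, 1, 3), collapse = "_")  # truncate each word, then join
  ))

df_new$name
# [1] "Jam_Bon" "Mar_Tay" "Mic_Bal"
```

Because nothing is grouped row by row here, the result stays an ordinary data frame rather than a rowwise tibble.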

Solution 3:[3]

An alternative method using tidyr functions:

library(tidyr)

df |> 
  extract(name, c("x1","x2"), "(\\w{3}).*\\s(\\w{3})") |> 
  unite(col = "name",x1,x2, sep = "_")

Giving:

     name
1 Jam_Bon
2 Mar_Tay
3 Mic_Bal

Note that this assumes all first names and surnames have at least 3 characters. Otherwise, replace the extract regex with "(\\w{1,3}).*\\s(\\w{1,3})".
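
To illustrate the note above, here is a small sketch using the relaxed {1,3} pattern. The data frame short_names is made up for this example and does not come from the question; one of its names has only two letters per word, so the strict \\w{3} pattern would not match it.

```r
library(tidyr)

# Illustrative data (an assumption, not from the question):
# the second name has fewer than three letters in each word.
short_names <- data.frame(name = c("Jame Bond", "Al Bo"))

result <- short_names |>
  extract(name, c("x1", "x2"), "(\\w{1,3}).*\\s(\\w{1,3})") |>
  unite(col = "name", x1, x2, sep = "_")

result$name
# [1] "Jam_Bon" "Al_Bo"
```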

Solution 4:[4]

In base R, we can use sub(). The pattern captures ((...)) the first three non-whitespace characters (\\S{3}) from the start of the string (^), skips the rest of the first word and the whitespace character that follows it (\\S*\\s), and then captures the first three non-whitespace characters of the second word. The trailing .* consumes the rest of the string. In the replacement, the backreferences (\\1, \\2) refer to the captured groups, with an underscore (_) inserted between them.

df$name <- sub("^(\\S{3})\\S*\\s(\\S{3}).*", "\\1_\\2", df$name)
df$name
[1] "Jam_Bon" "Mar_Tay" "Mic_Bal"
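
The regex above is written for two-word names and a fixed length of three. If the number of words varies, or n should be configurable as the title suggests, a split-based helper may be easier to adapt. The function below is a hypothetical sketch in base R; its name first_n_per_word and its arguments are assumptions made for this example, not part of any answer.

```r
# Hypothetical helper (not from the original answers): keep the first `n`
# characters of every whitespace-separated word and join them with `sep`.
first_n_per_word <- function(x, n = 3, sep = "_") {
  # as.character() guards against factor columns in older R versions
  words <- strsplit(as.character(x), "\\s+")  # one character vector per string
  vapply(
    words,
    function(w) paste(substr(w, 1, n), collapse = sep),
    character(1)
  )
}

df <- data.frame(name = c("Jame Bond", "Maria Taylor", "Micheal Balack"))
df$name <- first_n_per_word(df$name)
df$name
# [1] "Jam_Bon" "Mar_Tay" "Mic_Bal"
```

Words shorter than n are kept whole, since substr() simply stops at the end of the string.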

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 acammack1234
Solution 2
Solution 3
Solution 4