How to get the first n characters from a string in R
I would like to extract the first three letters of each word in every row of df, as shown below.
Example:
df <- data.frame(name = c('Jame Bond', "Maria Taylor", "Micheal Balack"))
df
name
1 Jame Bond
2 Maria Taylor
3 Micheal Balack
Desired output:
df_new
name
1 Jam_Bon
2 Mar_Tay
3 Mic_Bal
Any suggestions for doing this with the tidyverse?
Solution 1:[1]
library(stringr)
library(dplyr)
library(purrr) # provides map_chr()
df$name %>%
  str_extract_all("(?<=(^|[:space:]))[:alpha:]{3}") %>%
  map_chr(~ str_c(.x, collapse = "_"))
The stringr cheatsheet is very useful for working through these types of problems.
https://www.rstudio.com/resources/cheatsheets/
Created on 2022-03-26 by the reprex package (v2.0.1)
Solution 2:[2]
You can try this with dplyr::rowwise(), stringr::str_split() and stringr::str_sub():
library(dplyr)
library(stringr)
df_new <- df %>%
  rowwise() %>%
  mutate(name = paste(
    unlist(
      lapply(str_split(name, ' '), function(x) {
        str_sub(x, 1, 3)
      })
    ),
    collapse = "_"
  ))
This gives the result you expected:
> df_new
# A tibble: 3 x 1
# Rowwise:
name
<chr>
1 Jam_Bon
2 Mar_Tay
3 Mic_Bal
Solution 3:[3]
An alternative method using tidyr functions:
library(tidyr)
df |>
  extract(name, c("x1", "x2"), "(\\w{3}).*\\s(\\w{3})") |>
  unite(col = "name", x1, x2, sep = "_")
Giving:
name
1 Jam_Bon
2 Mar_Tay
3 Mic_Bal
Note that this assumes all first names and surnames have at least 3 characters; otherwise, replace the extract regex with "(\\w{1,3}).*\\s(\\w{1,3})".
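The note above can be tried on names shorter than three characters. This is only a sketch: the data frame short and the result short_new are made-up illustrations, not part of the original answer.

```r
library(tidyr)

# Hypothetical example data with names shorter than three characters
short <- data.frame(name = c("Jo Bond", "Maria Li"))

# Same pipeline as above, but each group accepts 1 to 3 word characters
short_new <- short |>
  extract(name, c("x1", "x2"), "(\\w{1,3}).*\\s(\\w{1,3})") |>
  unite(col = "name", x1, x2, sep = "_")

short_new$name
# [1] "Jo_Bon" "Mar_Li"
```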
Solution 4:[4]
In base R, we can use sub(). Capture (with (...)) the first three non-whitespace characters (\\S{3}) from the start of the string (^), skip zero or more non-whitespace characters followed by one whitespace character (\\S*\\s), then capture the next three non-whitespace characters. In the replacement, refer to the captured groups with the backreferences \\1 and \\2 and insert an underscore (_) between them:
df$name <- sub("^(\\S{3})\\S*\\s(\\S{3}).*", "\\1_\\2", df$name)
df$name
[1] "Jam_Bon" "Mar_Tay" "Mic_Bal"
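The question title asks about the first n characters in general, while the sub() call above is tied to exactly two words and three characters. A minimal base R sketch of a more general version might look like the following. The helper name first_n_chars and its arguments n and sep are hypothetical (not from the original answers), and it assumes that words are separated by whitespace.

```r
# Hypothetical helper (not from the original answers): keep the first n
# characters of every whitespace-separated word and join them with sep.
first_n_chars <- function(x, n = 3, sep = "_") {
  words <- strsplit(x, "\\s+")  # one character vector of words per string
  vapply(
    words,
    function(w) paste(substr(w, 1, n), collapse = sep),
    character(1)
  )
}

first_n_chars(c("Jame Bond", "Maria Taylor", "Micheal Balack"))
# [1] "Jam_Bon" "Mar_Tay" "Mic_Bal"
```

Because substr() returns the whole word when it is shorter than n, short names such as "Jo Bond" need no special handling in this sketch.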
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | acammack1234 |
| Solution 2 | |
| Solution 3 | |
| Solution 4 | |
