'split cell at special character if comma found after first word
hi i've got some budget data with names and titles that read "Last, First - Title" and other rows in same column position that read "anything really - ,asd;flkajsd". I'd like to split the column IF first word ends in a "," at the "-" position that follows it.
ive tried this:
C22$ITEM2 <- ifelse(grepl(",", C22$ITEM), C22$ITEM, NA)
test <- str_split_fixed(C22$ITEM2, "-", 2)
C22 <- cbind(C22, test)
but i'm getting other cells with commas elsewhere, need to limit to just "if first word ends in comma"
Solution 1:[1]
library(tidyverse)
data <- tibble(data = c("Doe, John - Mr", "Anna, Anna - Ms", " ,asd;flkajsd"))
data
data %>%
# first word must ed with a
filter(data %>% str_detect("^[A-z]+a")) %>%
separate(data, into = c("Last", "First", "Title"), sep = "[,-]") %>%
mutate_all(str_trim)
# A tibble: 1 × 3
# Last First Title
# <chr> <chr> <chr>
#1 Anna Anna Ms
Solution 2:[2]
We may use extract to do this - capture the regex pattern as two groups ((...)) where the first group would return word (\\w+) from the start (^) of the string followed by a ,, zero or more space (\\s*), another word (\\w+), then the - (preceding or succeeding zero or more space and the second capture group with the word (\\w+) before the end ($) of the string
library(tidyr)
library(dplyr)
extract(C22, ITEM, into = c("Name", "Title"),
"^(\\w+,\\s*\\w+)\\s*-\\s*(\\w+)$") %>%
mutate(Name = coalesce(Name, ITEM), .keep = 'unused')
NOTE: The mutate is added in case the regex didn't match and return NA elements, we coalesce with the original column to return the value that corresponds to NA
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | danlooo |
| Solution 2 |
