'split cell at special character if comma found after first word

hi i've got some budget data with names and titles that read "Last, First - Title" and other rows in same column position that read "anything really - ,asd;flkajsd". I'd like to split the column IF first word ends in a "," at the "-" position that follows it.

ive tried this:

C22$ITEM2 <- ifelse(grepl(",", C22$ITEM), C22$ITEM,  NA)
test <- str_split_fixed(C22$ITEM2, "-", 2)
C22 <- cbind(C22, test)

but i'm getting other cells with commas elsewhere, need to limit to just "if first word ends in comma"

r


Solution 1:[1]

library(tidyverse)

data <- tibble(data = c("Doe, John - Mr", "Anna, Anna - Ms", " ,asd;flkajsd"))
data
data %>%
  # first word must ed with a
  filter(data %>% str_detect("^[A-z]+a")) %>%
  separate(data, into = c("Last", "First", "Title"), sep = "[,-]") %>%
  mutate_all(str_trim)
# A tibble: 1 × 3
#  Last  First Title
#  <chr> <chr> <chr>
#1 Anna  Anna  Ms   

Solution 2:[2]

We may use extract to do this - capture the regex pattern as two groups ((...)) where the first group would return word (\\w+) from the start (^) of the string followed by a ,, zero or more space (\\s*), another word (\\w+), then the - (preceding or succeeding zero or more space and the second capture group with the word (\\w+) before the end ($) of the string

library(tidyr)
library(dplyr)
extract(C22, ITEM, into = c("Name", "Title"), 
      "^(\\w+,\\s*\\w+)\\s*-\\s*(\\w+)$") %>%
   mutate(Name = coalesce(Name, ITEM), .keep = 'unused')

NOTE: The mutate is added in case the regex didn't match and return NA elements, we coalesce with the original column to return the value that corresponds to NA

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 danlooo
Solution 2