Splitting a comma- and semicolon-delimited string in R

I'm trying to split a string containing two entries, each of which has a specific format:

Category (e.g. "active site" or "region"), which is followed by a :
Term (e.g. "His, Glu" or "nucleotide-binding motif A"), which is followed by a ,

Here's the string that I want to split:

string <- "active site: His, Glu,region: nucleotide-binding motif A,"

This is what I have tried so far. Except for the two empty substrings, it produces the desired output.

library(stringr)
unlist(str_extract_all(string, ".*?(?=,(?:\\w+|$))"))

[1] "active site: His, Glu"              ""                                   "region: nucleotide-binding motif A"
[4] ""

How do I get rid of the empty substrings?

Solution 1:^[1]

You get the empty strings because .*? can also match an empty string at any position where the assertion (?=,(?:\\w+|$)) is true, i.e. directly in front of each delimiter comma.
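
As a quick illustration (a sketch, not part of the original answer): there is one empty match per delimiter comma, and the empty matches can also be filtered out after the fact with nzchar(). The variable names n_delims, raw and nonempty are only for illustration.

```r
library(stringr)

string <- "active site: His, Glu,region: nucleotide-binding motif A,"

# Number of commas that satisfy the lookahead (followed by a word char or the end)
n_delims <- str_count(string, ",(?=\\w|$)")

# The original pattern returns one extra empty match per delimiter comma
raw <- unlist(str_extract_all(string, ".*?(?=,(?:\\w+|$))"))

# Workaround: keep only the non-empty matches
nonempty <- raw[nzchar(raw)]
nonempty
```

Filtering works for this input, but it only hides the empty matches; tightening the pattern so they never occur is usually the cleaner fix.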

You can avoid the empty matches by first matching one or more characters other than a colon or comma with a negated character class, and then matching the colon:

[^:,\n]+:.*?(?=,(?:\w|$))

Explanation

[^:,\n]+ Match 1+ chars other than : , or a newline
: Match the colon
.*? Match any char, as few as possible (non-greedy)
(?= Positive lookahead, assert that what is directly to the right of the current position is:
- , Match literally
- (?:\w|$) Match either a single word char, or assert the end of the string
) Close the lookahead

R demo

library(stringr)

string <- "active site: His, Glu,region: nucleotide-binding motif A,"
unlist(str_extract_all(string, "[^:,\\n]+:.*?(?=,(?:\\w|$))"))

Output

[1] "active site: His, Glu"              "region: nucleotide-binding motif A"
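
If you would rather not depend on stringr, the same pattern should also work in base R. This is only a sketch of an equivalent, not part of the original answer; the names pattern, matches and entries are illustrative.

```r
string <- "active site: His, Glu,region: nucleotide-binding motif A,"
pattern <- "[^:,\\n]+:.*?(?=,(?:\\w|$))"

# perl = TRUE is required because the pattern uses a lookahead
matches <- gregexpr(pattern, string, perl = TRUE)

# regmatches() returns a list with one element per input string
entries <- regmatches(string, matches)[[1]]
entries
```

Note that stringr uses the ICU regex engine while perl = TRUE uses PCRE; for this pattern and input the two agree, but that is not guaranteed for every pattern.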

Solution 2:^[2]

Much longer and not as elegant as @The fourth bird's answer (+1), but it works:

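library(stringr)

# Everything after the first "chunk,chunk," is the second entry
# (assumes the first entry contains exactly one inner comma)
string2 <- strsplit(string, "([^,]+,[^,]+),", perl = TRUE)[[1]][2]
# Remove the second entry from the original string to get the first one
string1 <- str_replace(string, string2, "")
# Strip the trailing comma from both entries
string <- str_replace_all(c(string1, string2), '\\,$', '')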

> string
[1] "active site: His, Glu"             
[2] "region: nucleotide-binding motif A"
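
For comparison, a shorter split-based variant (a sketch, not taken from either answer) is to split directly on the delimiter commas, i.e. a comma followed by a word character or the end of the string. It assumes that commas inside a term are always followed by a space, as in "His, Glu".

```r
string <- "active site: His, Glu,region: nucleotide-binding motif A,"

# Split on a comma only when it is followed by a word character
# or by the end of the string; perl = TRUE enables the lookahead
entries <- strsplit(string, ",(?=\\w|$)", perl = TRUE)[[1]]
entries
```

strsplit() does not return a trailing empty string when the input ends with a delimiter, so no extra cleanup of the final comma is needed here.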

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution	Source
Solution 1
Solution 2	TarJae
