Split string based on capitalized word using R
I have a string that I would like to split into several strings.
library(stringr)
testString <- "SMITH, Klaus, text, text, SMITH, Samantha, text, text, MUELLER, Klaus, text, text, MUELLER, Klara, text, text"
Whenever a new word is completely capitalised (followed by a comma) it should start a new string. At the end it should look like this:
[1] "SMITH, Klaus, text, text,"
[2] "SMITH, Samantha, text, text,"
[3] "MUELLER, Klaus, text, text,"
[4] "MUELLER, Klara, text, text,"
I have tried different approaches with strsplit, but I can't get R to match a complete capitalised word (which can have a varying number of letters) rather than a single letter, and then split the string there.
strsplit(testString, "(?!^)(?<=[[:upper:]]{2})", perl=T)
Solution 1:[1]
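Use a regex lookahead: split on one or more whitespace characters (\\s+) that are immediately followed by one or more uppercase letters and a comma ((?=[A-Z]+,)). Because the lookahead is zero-width, only the whitespace is consumed and the uppercase word stays at the start of the next piece.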
strsplit(testString, "\\s+(?=[A-Z]+,)", perl = TRUE)[[1]]
-output
[1] "SMITH, Klaus, text, text,"
[2] "SMITH, Samantha, text, text,"
[3] "MUELLER, Klaus, text, text,"
[4] "MUELLER, Klara, text, text"
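As a minimal sketch of how the pattern above could be reused, the call can be wrapped in a small helper. The function name split_at_caps is hypothetical and not part of the original answer. Note the assumption baked into [A-Z]: it only covers ASCII uppercase letters, so a surname containing a non-ASCII capital (for example an umlaut) would not trigger a split with this pattern.

```r
testString <- "SMITH, Klaus, text, text, SMITH, Samantha, text, text, MUELLER, Klaus, text, text, MUELLER, Klara, text, text"

# Hypothetical helper: split before every all-uppercase (ASCII) word
# that is directly followed by a comma.
split_at_caps <- function(x) {
  # \\s+ is consumed by the split; the lookahead (?=[A-Z]+,) is zero-width,
  # so the uppercase word is kept at the start of the next piece.
  strsplit(x, "\\s+(?=[A-Z]+,)", perl = TRUE)[[1]]
}

parts <- split_at_caps(testString)
length(parts)  # 4

# "Klaus," does not trigger a split: after the "K" the lookahead needs
# either more uppercase letters or a comma, and finds a lowercase "l" instead.
startsWith(parts, c("SMITH", "SMITH", "MUELLER", "MUELLER"))
```

The last piece has no trailing comma because the input string does not end with one; the pattern only removes the whitespace between records and leaves everything else untouched.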
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | akrun |
