How do I write a RegEx that starts reading from behind?
I have a series of words I am trying to capture, and I have the following problem:
- The part I want to capture ends with one of a fixed set of words.
- The number of words in that part is not fixed. However, it should include all words that start with an uppercase letter (German language). Therefore, reading backwards from the end, the left anchor should be the first word starting with a lowercase letter.
Example (bold is what I am trying to capture):
I like **Apple Bananas And Cars**.
building houses **Might Be Salty + Hard** said Jessica.
This is the RegEx I have tried so far. It only works if the "non-capture" part of the string does not include any uppercase words:
/(?:[a-zäöü]*)([\p{L} +().&]+[Cars|Hard])/gu
Solution 1:[1]
Use \p{Lu} for uppercase letters:
(?:[\p{Lu}+()&][\p{L}+()&]* )+(?:Cars|Hard)
See live demo (showing matching umlauted letters and ß).
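As a rough illustration, the pattern above can be tried in JavaScript, the flavour suggested by the /gu flags in the question. The helper name matchSolution1 and the third, German sample sentence are made up for this sketch; the u flag is required for \p{Lu} and \p{L} to be recognised.

```javascript
// Solution 1 pattern, unchanged. The `u` flag enables \p{...} property escapes.
const solution1 = /(?:[\p{Lu}+()&][\p{L}+()&]* )+(?:Cars|Hard)/gu;

// Hypothetical helper: returns every match in `text` (empty array if none).
function matchSolution1(text) {
  return text.match(solution1) || [];
}

console.log(matchSolution1("I like Apple Bananas And Cars."));
// [ 'Apple Bananas And Cars' ]
console.log(matchSolution1("building houses Might Be Salty + Hard said Jessica."));
// [ 'Might Be Salty + Hard' ]
console.log(matchSolution1("ich mag Äpfel Öl Und Cars."));
// [ 'Äpfel Öl Und Cars' ]
```

In the second sentence the repeated group first consumes "Hard " as an ordinary capitalised word; the engine then backtracks by one word so that the closing (?:Cars|Hard) can match.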
Solution 2:[2]
You might start the match with an uppercase character, allowing German uppercase chars as well, and then optionally repeat matching either a word that starts with an uppercase character or a "special" character.
Then end the match with an alternation matching either Hard or Cars.
(?<!\S)[A-ZÄÖÜß][a-zA-ZäöüßÄÖÜ?]*(?:\s+(?:[A-ZÄÖÜß][a-zA-ZäöüßÄÖÜ?]*|[+()&]))*\s+(?:Hard|Cars)\b
Explanation
- `(?<!\S)` Assert a whitespace boundary to the left to prevent starting the match after a non-whitespace char
- `[A-ZÄÖÜß][a-zA-ZäöüßÄÖÜ?]*` Match a word that starts with an uppercase char
- `(?:` Non capture group to match as a whole part
  - `\s+` Match 1+ whitespace chars
  - `(?:` Non capture group
    - `[A-ZÄÖÜß][a-zA-ZäöüßÄÖÜ?]*` Match a word that starts with uppercase
    - `|` Or
    - `[+()&]` Match one of the "special" chars
  - `)` Close the non capture group
- `)*` Close the non capture group and optionally repeat it
- `\s+` Match 1+ whitespace chars
- `(?:Hard|Cars)` Match one of the alternatives
- `\b` A word boundary to prevent a partial word match
See a regex demo.
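A minimal sketch of trying this pattern in JavaScript, assuming an engine with lookbehind support (for example a recent Node.js or browser). The helper name matchSolution2 is hypothetical; the pattern is copied verbatim from the answer.

```javascript
// Solution 2 pattern, unchanged. (?<!\S) is a lookbehind assertion.
const solution2 =
  /(?<!\S)[A-ZÄÖÜß][a-zA-ZäöüßÄÖÜ?]*(?:\s+(?:[A-ZÄÖÜß][a-zA-ZäöüßÄÖÜ?]*|[+()&]))*\s+(?:Hard|Cars)\b/g;

// Hypothetical helper: returns every match in `text` (empty array if none).
function matchSolution2(text) {
  return text.match(solution2) || [];
}

console.log(matchSolution2("I like Apple Bananas And Cars."));
// [ 'Apple Bananas And Cars' ]
console.log(matchSolution2("building houses Might Be Salty + Hard said Jessica."));
// [ 'Might Be Salty + Hard' ]
```

Because the words are separated by \s+ rather than a literal space, this pattern also tolerates tabs or repeated spaces between the capitalised words.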
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | |
| Solution 2 | The fourth bird |
