How can I use regex to extract all words that are written in camel case?
I am trying to extract all consecutive capitalized words in a given string and write them with no spacing in between.
E.g. The University Of Sydney => TheUniversityOfSydney, Regular Expression => RegularExpression, and This Is A Simple Variable => ThisIsASimpleVariable.
I started with this code, but it returns a list of separate words:
import re
string = "I write a syntax of Regular Expression"
result = re.findall(r"\b[A-Z][a-z]*\b", string)
print(result)
I expect to get RegularExpression here.
Solution 1:[1]
You need to use:
import re
text = "I write a syntax of Regular Expression"
rx = r"\b[A-Z]\w*(?:\s+[A-Z]\w*)+"
result = ["".join(x.split()) for x in re.findall(rx, text)]
print(result) # => ['RegularExpression']
See the Python demo.
The regex is explained in "How can I use Regex to abbreviate words that all start with a capital letter".
In this case, the regex is used in re.findall to extract matches, and "".join(x.split()) is a post-processing step that removes all whitespace from each match.
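To make the pattern easier to read, here is a sketch of the same regex written with re.VERBOSE so each part can carry a comment. The variable names matches and joined are only illustrative; the pattern itself is the one from the answer above.

```python
import re

# Same pattern as in the answer, spread over several lines with re.VERBOSE.
rx = re.compile(r"""
    \b[A-Z]\w*        # a word that starts with an uppercase ASCII letter
    (?:               # non-capturing group for each following word
        \s+           #   one or more whitespace characters
        [A-Z]\w*      #   another word starting with an uppercase letter
    )+                # at least one following word, so lone words are skipped
""", re.VERBOSE)

text = "I write a syntax of Regular Expression"
matches = rx.findall(text)                       # no capturing groups, so full matches are returned
joined = ["".join(m.split()) for m in matches]   # drop the whitespace inside each match
print(matches)  # => ['Regular Expression']
print(joined)   # => ['RegularExpression']
```

Note that "I" is not matched on its own: the trailing `+` requires at least two capitalized words in a row.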
If you only expect one single match in each string, use re.search:
import re
text = "I write a syntax of Regular Expression"
rx = r"\b[A-Z]\w*(?:\s+[A-Z]\w*)+"
result = re.search(rx, text)
if result:
    print("".join(result.group().split()))  # => RegularExpression
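As a quick check against the examples from the question, the findall approach can be wrapped in a small helper. This is only a sketch: the function name join_capitalized_runs and the samples list are made up for illustration.

```python
import re

# Pattern from the answer: two or more whitespace-separated capitalized words.
RX = re.compile(r"\b[A-Z]\w*(?:\s+[A-Z]\w*)+")

def join_capitalized_runs(text):
    """Return every run of capitalized words in text, with whitespace removed."""
    return ["".join(m.split()) for m in RX.findall(text)]

# Inputs taken from the question.
samples = [
    "The University Of Sydney",
    "Regular Expression",
    "This Is A Simple Variable",
    "I write a syntax of Regular Expression",
]
for s in samples:
    print(join_capitalized_runs(s))
# => ['TheUniversityOfSydney']
# => ['RegularExpression']
# => ['ThisIsASimpleVariable']
# => ['RegularExpression']
```

Because \w also matches digits and underscores, a word such as Var_1 would count as capitalized too; tighten the character class if that is not wanted.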
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Wiktor Stribiżew |
