Regex pattern to find all digits that are not immediately followed by a dot character
Can anyone please help me write a regex pattern for the requirements below?
- Match section tags that don't contain a number.
- Match section tags whose number is not followed by a dot character.
- Only the number closest to the section tag (at the start of the heading) should be considered.
Test strings:
<sectionb>2.3. Optimized test sentence<op>(</op>1,1<cp>)</cp></sectionb>
*<sectiona>2 Surface Model: ONGV<op>(</op>1,1<cp>)</cp></sectiona>*
<sectiona>3. Verification of MKJU<op>(</op>1,1<cp>)</cp> Entity</sectiona>
*<sectionc>3. 2. 1 <txt>Case 1</txt> Annual charges to SGX</sectionc>*
*<sectiona>Compound Interest<role>back</role></sectiona>*
Pattern:
<section[a-z]>[\d]*[^\.]*<\/section[a-z]
The regex pattern should match the strings below:
<sectiona>2 Surface Model: ONGV<op>(</op>1,1<cp>)</cp></sectiona>
<sectionc>3. 2. 1 <txt>Case 1</txt> Annual charges to SGX</sectionc>
<sectiona>Compound Interest<role>back</role></sectiona>
Solution 1:[1]
This matches the updated requirements:
<section\w+>(((\d+\.\s*)*(\d+[^\.]))|[^\d]).*?<\/section\w>
<section\w+>: \w is broadly similar to [a-z] (it also matches uppercase letters, digits, and underscores). The + means one or more, so this matches <sectiona> and <sectionabc> but not a bare <section>; remove the + for exactly one character. Note that the closing tag in the pattern uses \w without +, so add + there as well if tag names can be longer than one letter.
(\d+\.\s*)*: zero or more groups of digits, a dot, and any amount of whitespace. This handles the updated test string that now reads 3. 2. 1, with spaces after the dots.
(\d+[^\.]): one or more digits followed by a character that is not a dot.
((...)|[^\d]): alternatively, the section content does not start with a digit (this matches row 5).
.*?: any characters, as few as possible, up to the following </section. This could likely be simplified with a lookahead, but, for me, this keeps the "no digits" clause separate.
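A quick way to check the pattern is to run it against the five test strings from the question. The sketch below uses Python's re module, which is an assumption, since the question does not name a language. SECTION_RE is the pattern from this answer copied verbatim; STRICT_RE and matching_sections are illustrative names that are not part of the original answer. STRICT_RE is a hypothetical variant that swaps \d+[^\.] for \d+(?![\d.]), offered only as one possible adjustment.

```python
import re

# Pattern copied verbatim from the answer.
SECTION_RE = re.compile(
    r"<section\w+>(((\d+\.\s*)*(\d+[^\.]))|[^\d]).*?<\/section\w>"
)

# Hypothetical variant (not from the answer): the negative lookahead
# (?![\d.]) requires the digit run to end before a non-digit, non-dot
# character, so \d+ cannot backtrack into a shorter run.
STRICT_RE = re.compile(
    r"<section\w+>(((\d+\.\s*)*(\d+(?![\d.])))|[^\d]).*?<\/section\w>"
)

# The five test strings from the question (emphasis asterisks removed).
TEST_STRINGS = [
    "<sectionb>2.3. Optimized test sentence<op>(</op>1,1<cp>)</cp></sectionb>",
    "<sectiona>2 Surface Model: ONGV<op>(</op>1,1<cp>)</cp></sectiona>",
    "<sectiona>3. Verification of MKJU<op>(</op>1,1<cp>)</cp> Entity</sectiona>",
    "<sectionc>3. 2. 1 <txt>Case 1</txt> Annual charges to SGX</sectionc>",
    "<sectiona>Compound Interest<role>back</role></sectiona>",
]


def matching_sections(strings, pattern=SECTION_RE):
    """Return the strings in which the pattern finds a match."""
    return [s for s in strings if pattern.search(s)]


if __name__ == "__main__":
    for s in matching_sections(TEST_STRINGS):
        print(s)
```

With the test strings above, both patterns return rows 2, 4, and 5. The two differ on multi-digit numbers: for a heading such as <sectiona>12. Foo</sectiona>, the original pattern still matches because \d+ backtracks to 1 and [^\.] consumes the 2, whereas the lookahead variant rejects it.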
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | freedomn-m |
