Improving the performance of a PCRE Regex Pattern
I have the regex below, which is written to support the PCRE/PCRE2 format. However, it throws the following error: “Evaluation takes too long. Please check your regular expression.” Is there any way to improve the performance of this regex by simplifying it?
The regex also throws a "catastrophic backtracking" error.
(\border\D*\W*)\d+(*SKIP)(*F)|(\border\D*number\W*)\d+(*SKIP)(*F)|(?<!x)(?=(?:[._ –-]*\d){9})(?!9|66\D*6|00\D*0|(?:\d\D*){3}0\D*0|(?:\d\D*){5}0(?:\D*0){3})\d(?:[._ –-]*\d){4}
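One redundancy in the first two alternatives can be checked outside PCRE. Digits are word characters, so every `\W` character is also a `\D` character: `\D*\W*` accepts exactly the same strings as `\D*`, only with more ways to split the text while backtracking. Likewise, any text that `\border\D*number\W*\d+` matches is already matched by the first alternative at the same start, so the second alternative is never reached. A small sketch that checks this with Python's `re` module on a few made-up sample strings (the samples and the helper name `spans` are illustrative, not from the question); it does not show that this redundancy is the main cause of the timeout:

```python
import re

# The two "skip" alternatives from the question, without their capture groups
# (the groups do not change which text is matched).
FIRST_BRANCH = re.compile(r"\border\D*\W*\d+")
SECOND_BRANCH = re.compile(r"\border\D*number\W*\d+")
# \W is a subset of \D (digits are word characters), so \W* adds nothing here.
SIMPLIFIED = re.compile(r"\border\D*\d+")


def spans(pattern, text):
    """Return the (start, end) span of every non-overlapping match."""
    return [m.span() for m in pattern.finditer(text)]


samples = [
    "order 123456789",
    "order number: 123-45-6789",
    "order #: 123.45.6789",
    "my order was 12345 and then 987654321",
]

for s in samples:
    # Dropping \W* does not change what the first alternative matches.
    assert spans(FIRST_BRANCH, s) == spans(SIMPLIFIED, s)
    # Wherever the "order ... number" alternative matches, the first
    # alternative already matches at the same start, so it is never tried.
    for m in SECOND_BRANCH.finditer(s):
        assert FIRST_BRANCH.match(s, m.start()) is not None

print("all checks passed")
```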
The pattern encodes a set of rules; these are its requirements:
- Only the first 5 digits of a 9-digit number should be masked.
- No digits should be masked if 'x' or 'X' precedes the 9-digit number.
- If the string "order" or "order number" precedes the 9-digit number, it should not be matched.
- The list of use cases, along with the rules, is available at this link: Usecases with requirements
Regex101 gives exactly the expected output, but the pattern has a performance issue, so I need to simplify it.
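The rules above can also be applied in two steps instead of one large pattern: a simple regex finds each candidate number, and ordinary code decides whether to mask it. The Python sketch below illustrates that design under my own assumptions: a "9-digit number" means exactly nine digits, optionally separated by `.`, `_`, space, en dash or hyphen; "order"/"order number" may be followed only by non-word characters before the number (as in Solution 1's `[\W_]*`); the mask character is `*`; and the extra exclusions (leading 9, a 666 or 000 prefix, 00 in digits 4 and 5, 0000 at the end) are read off the question's negative lookahead. The names `CANDIDATE`, `ORDER_PREFIX`, `is_excluded` and `mask_first_five` are hypothetical.

```python
import re

# Hypothetical sketch: exactly nine digits, optionally separated by
# . _ space, en dash (U+2013) or hyphen, not touching further digits.
CANDIDATE = re.compile(r"(?<!\d)\d(?:[._ \u2013-]*\d){8}(?!\d)")
# "order" or "order number" followed only by separators, right before the number.
ORDER_PREFIX = re.compile(r"\border(?:[\W_]*number)?[\W_]*$", re.IGNORECASE)


def is_excluded(digits):
    """Exclusions read off the question's negative lookahead."""
    return (
        digits.startswith("9")
        or digits[:3] in ("666", "000")
        or digits[3:5] == "00"
        or digits[5:] == "0000"
    )


def mask_first_five(text, mask="*"):
    """Mask the first five digits of each qualifying nine-digit number."""

    def replace(m):
        number = m.group()
        prefix = text[: m.start()]
        digits = re.sub(r"\D", "", number)
        if (
            prefix[-1:].lower() == "x"        # x or X directly before the number
            or ORDER_PREFIX.search(prefix)    # "order" / "order number" before it
            or is_excluded(digits)
        ):
            return number  # leave the number untouched
        out, masked = [], 0
        for ch in number:
            if ch.isdigit() and masked < 5:
                out.append(mask)
                masked += 1
            else:
                out.append(ch)  # keep separators and the last four digits
        return "".join(out)

    return CANDIDATE.sub(replace, text)


print(mask_first_five("id 123-45-6789, x123456789, order number: 123 45 6789"))
# id ***-**-6789, x123456789, order number: 123 45 6789
```

Because the context check re-reads the text before each candidate, this is fine for short strings; on long inputs you would probably limit the prefix to a small window.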
Solution 1:[1]
You may try this refactored regex:
\b(?>x|order(?>[\W_]*number)?[\W_]*)\d+(*SKIP)(*F)|(?=(?>[._ –-]*\d){9})(?>(?>9|6{3}|0{3}|(?>\d\D*){3}00|(?>\d\D*){5}0{4})(*SKIP)(*F)|\d(?>\D*\d){4})
Compared to your existing demo, it takes almost half the number of steps.
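Python's built-in `re` module has no `(*SKIP)(*F)` verbs, so if the same logic has to run outside PCRE, a common substitute is to let the unwanted context match in one branch, capture the wanted text in another, and keep only matches where the capture group took part. A minimal Python sketch of the idea; the pattern is deliberately simplified (it is not Solution 1's full regex) and the function name is hypothetical:

```python
import re

# PCRE idiom:        \border\W*\d+(*SKIP)(*F)|\d+
# Python substitute: consume the unwanted context in branch 1, capture the
# wanted text in branch 2, and ignore matches where group 1 is None.
PATTERN = re.compile(r"\border\W*\d+|(\d+)", re.IGNORECASE)


def numbers_not_after_order(text):
    """Return digit runs that are not directly preceded by 'order'."""
    return [m.group(1) for m in PATTERN.finditer(text) if m.group(1) is not None]


print(numbers_not_after_order("order 111 and 222, Order #333 then 444"))
# ['222', '444']
```

Because `finditer` resumes after the end of each match, consuming the skipped context has roughly the same effect as `(*SKIP)`: the digits after "order" are never revisited by the second branch.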
Solution 2:[2]
Another approach: expand the pattern!
~(*UTF)
x (?<! \w x ) [\d._ –-]* (*SKIP) (*F)
|
order (?<! \w order ) [\W_]* (?:number)? [\W_]* [\d._ –-]* (*SKIP) (*F)
|
(?<res>
[1-578] (?: [._ –-]{0,3}+ \d ){2} (?: 0{2} [\d._ –-]* (*SKIP) (*F) )?
(?: [._ –-]{0,3}+ \d ){2}
)
(?: 0{4} [\d._ –-]* (*SKIP) (*F) )? (?: [._ –-]{0,3}+ \d ){4}
|
(?<res>
0 (?: 0{2} [\d._ –-]* (*SKIP) (*F) )? (?: [._ –-]{0,3}+ \d ){2}
(?: 0{2} [\d._ –-]* (*SKIP) (*F) )? (?: [._ –-]{0,3}+ \d ){2}
)
(?: 0{4} [\d._ –-]* (*SKIP) (*F) )? (?: [._ –-]{0,3}+ \d ){4}
|
(?<res>
6 (?: 6{2} [\d._ –-]* (*SKIP) (*F) )? (?: [._ –-]{0,3}+ \d ){2}
(?: 0{2} [\d._ –-]* (*SKIP) (*F) )? (?: [._ –-]{0,3}+ \d ){2}
)
(?: 0{4} [\d._ –-]* (*SKIP) (*F) )? (?: [._ –-]{0,3}+ \d ){4}
|
9 [\d._ –-]* (*SKIP) (*F)
~iJx
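Two of the trailing modifiers matter for reading this pattern. With `x` (extended), whitespace is ignored outside character classes but kept inside them, so `[._ –-]` still contains a literal space; `J` allows the group name `res` to be reused in each alternative. Python's `re.VERBOSE` follows the same whitespace rule, while Python's `re` refuses duplicate group names. A small illustrative check, using Python only as an analogy for PCRE:

```python
import re

# In verbose mode the spaces around tokens are ignored, but the space
# inside the character class is still a literal separator.
VERBOSE = re.compile(r" \d  [._ \u2013-]  \d ", re.VERBOSE)
print(bool(VERBOSE.fullmatch("1 2")))   # True: the space matched the class
print(bool(VERBOSE.fullmatch("12")))    # False: the class needs one character

# PCRE's J flag allows the same name in several alternatives;
# Python's re rejects it, so each alternative needs its own name there.
try:
    re.compile(r"(?P<res>a)|(?P<res>b)")
except re.error as exc:
    print("re.error:", exc)
```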
The pattern is indeed longer, but it is 2 times faster and takes 8 times fewer steps.
Note that I use a capture group (res) to extract the first 5 digits, and the 4 remaining digits are also consumed. If you prefer, you can remove this capture group and put the 4 remaining digits in a lookahead instead (more steps, but more efficient).
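The difference between the two variants can be sketched in Python with a simplified pattern that covers only the digits and separators (none of the skip rules). Variant A consumes all nine digits and captures the first five in `res`; variant B consumes only the first five and checks the remaining four in a lookahead, so the match is exactly the text to mask. The helper names are hypothetical:

```python
import re

SEP = r"[._ \u2013-]"                      # separators used in the answer
FIRST_FIVE = rf"\d(?:{SEP}*\d){{4}}"
LAST_FOUR = rf"(?:{SEP}*\d){{4}}"

# Variant A: consume all nine digits, capture the first five in "res".
CONSUME = re.compile(rf"(?P<res>{FIRST_FIVE}){LAST_FOUR}")
# Variant B: consume only the first five, assert the last four in a lookahead.
LOOKAHEAD = re.compile(rf"{FIRST_FIVE}(?={LAST_FOUR})")


def stars(s):
    """Replace every digit with '*', keeping separators."""
    return re.sub(r"\d", "*", s)


def mask_consume(text):
    # The last four digits were part of the match, so they are put back.
    return CONSUME.sub(lambda m: stars(m.group("res")) + text[m.end("res"):m.end()], text)


def mask_lookahead(text):
    # The match is exactly the part to hide.
    return LOOKAHEAD.sub(lambda m: stars(m.group()), text)


print(mask_consume("id 123-45-6789"))    # id ***-**-6789
print(mask_lookahead("id 123-45-6789"))  # id ***-**-6789
```

On digit runs longer than nine the two variants diverge: for "123456789012345678", variant A masks two consecutive nine-digit blocks ("*****6789*****5678"), while variant B masks the first ten digits ("**********12345678"), because each lookahead match consumes only five digits.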
I started the pattern with (*UTF) since it contains a dash (the en dash –) outside the ASCII range.
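The effect of leaving out (*UTF) can be approximated in Python, where a bytes pattern works on raw bytes (roughly like PCRE without UTF mode) and a str pattern works on code points (like (*UTF)). The en dash U+2013 is three bytes in UTF-8 (E2 80 93), so a byte-level class built from `[._ –-]` treats each of those bytes as a separate separator and even matches the first byte of unrelated characters such as the euro sign (E2 82 AC). This is a Python analogy, not PCRE itself:

```python
import re

SEPARATORS = "[._ \u2013-]"   # same class as in the answer, with the en dash U+2013

text_pattern = re.compile(SEPARATORS)                    # str pattern: code points
byte_pattern = re.compile(SEPARATORS.encode("utf-8"))    # bytes pattern: raw UTF-8 bytes

en_dash = "\u2013"
euro = "\u20ac"  # encoded as E2 82 AC, sharing its first byte with the en dash

print(len(text_pattern.findall(en_dash)))                   # 1: one character
print(len(byte_pattern.findall(en_dash.encode("utf-8"))))   # 3: each byte matches alone
print(len(byte_pattern.findall(euro.encode("utf-8"))))      # 1: the E2 lead byte matches
print(len(text_pattern.findall(euro)))                      # 0
```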
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | anubhava |
Solution 2 | |