Extract text up to the n-th character in a string, but return the whole string if the character isn't present
I was looking at this question, and the accepted answer gives this as a solution for the case when there are fewer than n occurrences of the character in the string:
^(([^>]*>){4}|.*)
However, I have set up a fiddle here, and it shows that this regex simply returns the entire string every time.
This code:
SELECT
  SUBSTRING(a FROM '^(([^>]*>){4}|.*)'),
  a,
  LENGTH(SUBSTRING(a FROM '^(([^>]*>){4}|.*)')),
  LENGTH(a),
  LENGTH(SUBSTRING(a FROM '^(([^>]*>){4}|.*)')) = LENGTH(a)
FROM s
WHERE LENGTH(SUBSTRING(a FROM '^(([^>]*>){4}|.*)')) = LENGTH(a) IS false;
returns no records after several runs, which means the extracted substring always has the same length as the original string; in other words, the regex is doing nothing.
Question:
I would like a regex which returns up to the fourth > character (not including it) OR the entire string if the string only contains 3 or fewer > characters. RTRIM() can always be used to trim the final > if not including it is too tricky - having an answer which gives both possibilities would help me to deepen my understanding of regexes!
This is not a duplicate - it's certainly related, but I'd like to correct the error in the original answer - and provide a correct answer of my own.
Solution 1:[1]
You can match zero to three segments that each end with a >, followed by any remaining characters other than >, using:
^(?:[^>]*>){0,3}[^>]*
- ^ Start of string
- (?:[^>]*>){0,3} Repeat 0 to 3 times: match any characters except > and then match >
- [^>]* Optionally match any characters except >
See a regex demo.
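To make the behaviour concrete, here is a minimal sketch that runs the pattern against a few made-up strings. It uses Python's `re` module purely as a convenient stand-in for checking the regex, and the helper name `up_to_fourth_gt` and the sample strings are hypothetical. Python tries alternatives in order, while PostgreSQL, as far as I understand, prefers the longest overall match for an alternation, so the two engines can disagree on patterns that contain `|`. This pattern has no alternation, so it should give the same result inside SUBSTRING(a FROM ...), but that is an assumption worth confirming in the fiddle.

```python
import re

# Pattern from the answer: up to three "text followed by '>'" segments,
# then a final run of characters that are not '>'.
PATTERN = re.compile(r'^(?:[^>]*>){0,3}[^>]*')


def up_to_fourth_gt(s):
    """Return the text before the fourth '>', or all of s if it has fewer than four."""
    # The pattern can match the empty string, so match() never returns None here.
    return PATTERN.match(s).group(0)


# Hypothetical sample data, chosen to cover both branches of the requirement.
for sample in ["a>b>c>d>e>f", "a>b>c", "no delimiter", ">>>>"]:
    print(repr(sample), "->", repr(up_to_fourth_gt(sample)))
```

The first sample is cut just before its fourth >, while the strings with three or fewer > characters come back unchanged.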
If the string must contain at least one >, the quantifier can be {1,3} instead.
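The effect of the stricter {1,3} quantifier can be sketched the same way. Python's `re` is again only a stand-in, and the helper name `up_to_fourth_gt_required` is hypothetical. With this variant, a string that contains no > at all does not match; the sketch returns `None` in that case, and in PostgreSQL I would expect SUBSTRING to return NULL instead.

```python
import re

# Same idea as the {0,3} pattern, but at least one "text followed by '>'"
# segment is now required for the pattern to match at all.
AT_LEAST_ONE = re.compile(r'^(?:[^>]*>){1,3}[^>]*')


def up_to_fourth_gt_required(s):
    """Return the text before the fourth '>', or None if s contains no '>'."""
    m = AT_LEAST_ONE.match(s)
    return m.group(0) if m else None


# Hypothetical sample data: one long string, one short, one without any '>'.
for sample in ["a>b>c>d>e", "a>b", "no delimiter"]:
    print(repr(sample), "->", repr(up_to_fourth_gt_required(sample)))
```

Which variant fits depends on whether rows without any > should be returned whole or filtered out.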
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 |
