Regex to match string between %
I'm trying to match substrings that are enclosed in %'s, but preg_match_all seems to lump several of them into a single match when they appear on the same line.
Code looks like this:
preg_match_all("/%.*%/", "%hey%_thereyou're_a%rockstar%\nyo%there%", $matches);
print_r($matches);
Which produces the following output.
Array
(
    [0] => Array
        (
            [0] => %hey%_thereyou're_a%rockstar%
            [1] => %there%
        )
)
However I'd like it to produce the following array instead:
[0] => %hey%
[1] => %rockstar%
[2] => %there%
What am I missing?
Solution 1:[1]
You're doing a greedy match. Add ? after the * to make it ungreedy (lazy):
/%.*?%/
If a newline can occur inside the match, add the s (DOTALL) modifier:
/%.*?%/s
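To see the difference, here is a minimal sketch that runs the greedy and lazy patterns against the string from the question. The $multiline input is a made-up example, added only to show what the s modifier changes.

```php
<?php
// Sample input taken from the question.
$str = "%hey%_thereyou're_a%rockstar%\nyo%there%";

// Greedy: .* runs to the last % it can reach on each line.
preg_match_all('/%.*%/', $str, $greedy);

// Lazy: .*? stops at the first % it can reach.
preg_match_all('/%.*?%/', $str, $lazy);

print_r($greedy[0]); // two matches: the whole first line, then %there%
print_r($lazy[0]);   // three matches: %hey%, %rockstar%, %there%

// Hypothetical input with a newline between the % signs.
$multiline = "%two\nlines%";

// Without s, the dot does not match a newline, so nothing matches.
$withoutS = preg_match('/%.*?%/', $multiline);

// With s (DOTALL), the dot matches a newline too.
$withS = preg_match('/%.*?%/s', $multiline);

var_dump($withoutS, $withS); // int(0), int(1)
```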
Solution 2:[2]
Add a ? after the *:
preg_match_all("/%.*?%/", "%hey%_thereyou're_a%rockstar%\nyo%there%", $matches);
Solution 3:[3]
The reason is that the star is greedy. That is, the star causes the regex engine to repeat the preceding token as often as possible. You should try .*? instead.
Solution 4:[4]
You could try /%[^%]+%/ - this means in between the percent signs you only want to match characters which are not percent signs.
You could also make the pattern ungreedy with the U modifier, e.g. /%.+%/U. U inverts the greediness of the quantifiers, so .+ captures as little as possible.
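As a rough comparison, the sketch below runs both patterns from this answer on the question's string. The $multiline input is a hypothetical addition that shows one place where the two differ: a negated character class matches a newline, while the dot does not unless the s modifier is set.

```php
<?php
// Sample input taken from the question.
$str = "%hey%_thereyou're_a%rockstar%\nyo%there%";

// Negated character class: anything except % between the delimiters.
preg_match_all('/%[^%]+%/', $str, $negated);

// U modifier: quantifiers become lazy by default.
preg_match_all('/%.+%/U', $str, $ungreedy);

// On this input both patterns find the same three matches.
var_dump($negated[0] === $ungreedy[0]); // bool(true)

// Hypothetical input with a newline between the % signs.
$multiline = "%a\nb%";

$negatedHit  = preg_match('/%[^%]+%/', $multiline); // [^%] also matches \n
$ungreedyHit = preg_match('/%.+%/U', $multiline);   // . stops at \n

var_dump($negatedHit, $ungreedyHit); // int(1), int(0)
```

Note that both patterns use +, so an empty pair such as %% is not matched as a whole by either of them.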
Solution 5:[5]
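|%(\w+)%| This will do exactly what you want, as long as the text between the % signs consists only of word characters (letters, digits and underscores).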
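A short sketch of how this pattern behaves, assuming the question's string plus one hypothetical input ($spaced) that contains a space between the % signs:

```php
<?php
// Sample input taken from the question.
$str = "%hey%_thereyou're_a%rockstar%\nyo%there%";

// | is used as the pattern delimiter here; group 1 holds the text without the % signs.
preg_match_all('|%(\w+)%|', $str, $m);
print_r($m[1]); // hey, rockstar, there

// Hypothetical input: \w does not match a space, so nothing is found.
$spaced = "%hello world%";
$count = preg_match_all('|%(\w+)%|', $spaced, $none);
var_dump($count); // int(0)
```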
Solution 6:[6]
While the solution is to turn a greedy .* into a lazy .*? (or replace .* with [^%]*), you might also want to actually get rid of % symbols in the output.
In that case, you will need to use a capturing group and get $matches[1] if a match occurred:
$str = "%hey%_thereyou're_a%rockstar%\nyo%there%";
if (preg_match_all("/%([^%]*)%/", $str, $matches)) {
    print_r($matches[1]);
}
// => Array( [0] => hey [1] => rockstar [2] => there )
Note that print_r($matches[0]); outputs the full matches instead: Array( [0] => %hey% [1] => %rockstar% [2] => %there% ). The [^%] pattern is a negated character class that matches any char other than %.
Variations
If you need to make sure there are only letters, digits or underscores between % chars, you can use
"/%(\w*)%/"
If you want to match any chars other than % and whitespace between two % chars use
"/%([^\s%]*)%/"
The [^\s%]* pattern matches zero or more chars that are neither whitespace (\s) nor %.
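The following sketch compares the three capturing patterns side by side. The $input string is a made-up example (not from the question), chosen so that each pattern accepts a different subset of the %...% pairs.

```php
<?php
// Hypothetical input: a hyphen, a space, word characters, and an empty pair.
$input = "%a-b% %c d% %e_1% %%";

// Anything except % between the delimiters.
preg_match_all('/%([^%]*)%/', $input, $any);

// Only letters, digits or underscores between the delimiters.
preg_match_all('/%(\w*)%/', $input, $word);

// Anything except % and whitespace between the delimiters.
preg_match_all('/%([^\s%]*)%/', $input, $noSpace);

print_r($any[1]);     // a-b, "c d", e_1, and an empty string
print_r($word[1]);    // e_1 and an empty string
print_r($noSpace[1]); // a-b, e_1, and an empty string
```

Because all three patterns use *, the empty pair %% is matched with an empty capture; swapping * for + would skip it.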
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Greg |
| Solution 2 | Alix Axel |
| Solution 3 | fresskoma |
| Solution 4 | Tom Haigh |
| Solution 5 | |
| Solution 6 | Wiktor Stribiżew |
