How to use PHP preg_match_all() with overlapping matches
I am trying (without success) to get all possible matches with preg_match_all().
None of the related questions and answers clearly shows a way to do that. Any help would be greatly appreciated. Thank you in advance.
Here is a typical example.
The code is:
$str = "manger des pâtes à la carbonara dans un restaurant de pâtes";
$pattern = "/(.*) (son |sa |ses |un |une |des |du |le |les |la )(.*) dans (son |sa |ses |un |une |de la |des |du |la |le |les |l')(.*)/";
if (preg_match_all($pattern, $str, $matches, PREG_SET_ORDER)) {
    print_r($matches);
}
The result (correct, but incomplete for what I want) is:
Array (
    [0] => Array (
        [0] => manger des pâtes à la carbonara dans un restaurant de pâtes
        [1] => manger des pâtes à
        [2] => la
        [3] => carbonara
        [4] => un
        [5] => restaurant de pâtes
    )
)
What is missing is the following match:
Array (
    [0] => Array (
        [0] => manger des pâtes à la carbonara dans un restaurant de pâtes
        [1] => manger
        [2] => des
        [3] => pâtes à la carbonara
        [4] => un
        [5] => restaurant de pâtes
    )
)
Overall, I would like to get:
Array (
    [0] => Array (
        [0] => manger des pâtes à la carbonara dans un restaurant de pâtes
        [1] => manger des pâtes à
        [2] => la
        [3] => carbonara
        [4] => un
        [5] => restaurant de pâtes
    )
    [1] => Array (
        [0] => manger des pâtes à la carbonara dans un restaurant de pâtes
        [1] => manger
        [2] => des
        [3] => pâtes à la carbonara
        [4] => un
        [5] => restaurant de pâtes
    )
)
Solution 1:[1]
I'm not sure that building a more complicated pattern to get overlapping matches is a good idea in this case (as suggested by the duplicate link used to close this question).
Here, all you need is a small change to your original pattern, which you then use twice: once with all quantifiers greedy and once with all quantifiers non-greedy. This is easy to do with the U modifier, which inverts the greediness of the quantifiers.
$str = "manger des pâtes à la carbonara dans un restaurant de pâtes";
$pattern = "/(.*) (son |sa |ses |un |une |des |du |le |les |la )(.*) dans (son |sa |ses |un |une |de la |des |du |la |le |les |l')(.*)\z/";
if (preg_match($pattern, $str, $matches1) && preg_match($pattern.'U', $str, $matches2)) {
    $result = [$matches1, $matches2];
    print_r($result);
}
I added the end-of-string assertion \z to force the last quantifier to reach the end of the string in non-greedy mode. Without it, the last (.*) would be satisfied with an empty match when it is non-greedy.
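The two-pass idea can also be wrapped in a small helper. This is only a sketch: the function name greedyAndLazyMatches is hypothetical (it is not part of the original answer), and it assumes the pattern ends with its closing delimiter and carries no modifiers yet, so that appending 'U' is valid. It also drops the second result when the greedy and non-greedy passes return the same match.

```php
<?php
// Hypothetical helper: run the same pattern once as written (greedy) and once
// with the U modifier appended (non-greedy), and return the distinct matches.
// Assumes $pattern ends with its closing delimiter and has no modifiers yet.
function greedyAndLazyMatches(string $pattern, string $subject): array
{
    $results = [];
    foreach ([$pattern, $pattern . 'U'] as $p) {
        // preg_match() returns 1 on a match, 0 on no match, false on error.
        if (preg_match($p, $subject, $m) === 1 && !in_array($m, $results, true)) {
            $results[] = $m;
        }
    }
    return $results;
}

$str = "manger des pâtes à la carbonara dans un restaurant de pâtes";
$pattern = "/(.*) (son |sa |ses |un |une |des |du |le |les |la )(.*) dans (son |sa |ses |un |une |de la |des |du |la |le |les |l')(.*)\z/";

$result = greedyAndLazyMatches($pattern, $str);
print_r($result);
```

Note that this returns at most two matches, the greedy one and the non-greedy one. That is enough for the example above, but it would not enumerate every intermediate split if a sentence contained more than two candidate articles before "dans".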
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Casimir et Hippolyte |
