Find substring with special characters
I have the pattern 'šalotka 29%'
and I need to know whether the string 'something something šalotka 29% something' contains it,
but not when the pattern is part of a longer word, as in 'something something šalotka 29%something'.
I have this: mb_eregi('\b' . $pattern . '\b', $string)
but it's not working, because regex word boundaries don't work with special characters.
Any suggestions?
Solution 1:[1]
A word boundary matches only between a word character (a character from the \w character class) and a non-word character or the limit of the string.
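If your search string starts or ends with a non-word character, you can't use a word boundary at that end: next to a non-word character such as %, \b only matches when the character on the other side is a word character, which is the opposite of what you want.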
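To make the boundary rule concrete, here is a small sketch using the two strings from the question. It uses preg_match with the u modifier instead of mb_eregi (an assumption made for portability; the boundary logic is the same), and the variable names are only illustrative.

```php
<?php
// Illustration only: shows where \b can and cannot match around "29%".
$pattern = 'šalotka 29%';
$regex   = '~\b' . preg_quote($pattern, '~') . '\b~ui';

// "%" followed by a space: both are non-word characters, so there is no boundary.
$separate = preg_match($regex, 'something something šalotka 29% something');

// "%" followed by "s": a non-word character next to a word character, so \b matches.
$glued = preg_match($regex, 'something something šalotka 29%something');

var_dump($separate); // int(0)
var_dump($glued);    // int(1)
```

The result is the reverse of what the question asks for: the standalone occurrence is rejected and the glued one is accepted.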
The difficulty is to define precisely what separates the desired string from the rest. In other words, the choice is yours.
Whatever your choice is, you can use the same technique: using lookarounds before and after your string to define what you don't want around your string: a negative lookbehind (?<!...) and a negative lookahead (?!...).
Example:
- to forbid all that isn't a whitespace around the string:
mb_eregi('(?<!\S)' . $item . '(?!\S)', $string, $match);
- to forbid word characters around the string:
mb_eregi('(?<!\w)' . $item . '(?!\w)', $string, $match);
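The two choices behave differently when punctuation touches the string. The sketch below compares them with preg_match (used here in place of mb_eregi as an assumption, so it runs without the mbstring extension); containsItem is a hypothetical helper, not part of any library.

```php
<?php
// Hypothetical helper for illustration; $before and $after are lookaround sub-patterns.
function containsItem(string $item, string $string, string $before, string $after): bool
{
    $regex = '~' . $before . preg_quote($item, '~') . $after . '~ui';
    return preg_match($regex, $string) === 1;
}

$item      = 'šalotka 29%';
$withComma = 'something šalotka 29%, something';

// Whitespace rule: the comma is a non-whitespace character, so the match is rejected.
var_dump(containsItem($item, $withComma, '(?<!\S)', '(?!\S)')); // bool(false)

// Word-character rule: the comma is not a word character, so the match is accepted.
var_dump(containsItem($item, $withComma, '(?<!\w)', '(?!\w)')); // bool(true)
```

Which rule fits depends on whether punctuation directly after the string should count as a separator.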
Full example:
$item = 'šalotka 29%';
$string = 'something something šalotka 29% something';
mb_regex_encoding('UTF-8'); // be sure to use the correct encoding
// if needed escape regex special characters
$item = mb_eregi_replace('[\[\](){}.\\\\|$^?+*#-]', '\\\0', $item);
mb_eregi('(?<!\S)' . $item . '(?!\S)', $string, $matches);
print_r($matches);
Notices:
- Although ereg functions are obsolete and were removed in PHP 7.0, mb_ereg functions, based on the Oniguruma regex engine, still exist and offer features not available in preg_ functions (PCRE). Obviously, for this question, you can do the same with preg_match:
preg_match('~(?<!\S)' . $item . '(?!\S)~ui', $string, $match);
- If you don't control the search string (user input, for example), make sure the regex special characters it contains are escaped. With preg_ functions you can use preg_quote to escape them, but it's also possible to "do it yourself" with $item = mb_ereg_replace('[\[\](){}.\\\\|$^?+*#-]', '\\\0', $item); which suffices for most of the syntaxes available in mb_ereg functions (note that escaping all non-word characters does the job too). Feel free to write your own if you want to deal with Emacs or BRE syntaxes.
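As a rough illustration of why escaping matters, the sketch below runs the same lookaround pattern with and without preg_quote. The item and the test strings are made up for the example; passing '~' as the second argument to preg_quote also escapes the delimiter used here.

```php
<?php
// Hypothetical item that contains regex metacharacters.
$item = 'price (1+1) $5.00';

// preg_quote escapes the metacharacters; the second argument covers the "~" delimiter.
$quoted = preg_quote($item, '~');
$regex  = '~(?<!\S)' . $quoted . '(?!\S)~ui';

var_dump(preg_match($regex, 'sale: price (1+1) $5.00 today')); // int(1)
var_dump(preg_match($regex, 'sale: price (1+1) $5.00today'));  // int(0)

// Without escaping, "(1+1)" is read as a group with a quantifier and "$" as an anchor,
// so the literal text is not found.
$unquoted = '~(?<!\S)' . $item . '(?!\S)~ui';
var_dump(preg_match($unquoted, 'sale: price (1+1) $5.00 today')); // int(0)
```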
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 |
