'Finding "hidden patterns" in string in MSSQL data using regex (Positive Lookahead)
I'm working on finding obfuscated data in string fields. I have a regex that works in a Python script, but I realized that what I was doing would likely be done more efficiently directly in the database. BUT MSSQL doesn't support lookaround (Positive Lookahead.)
Eventually I want to dynamically feed an sql statement a list of targets to search for in candidates, but for now I am just trying to figure out how to get the match working.
Target:
- "4X5G"
Candidates:
- "Ipsum 47 loreix 5-g blue scuba rock." 4X5G are 13 characters apart, I would want to return this a potential candidate.
- "Ipsum 47 loreix blue scuba 5-g rock." 4X5G are 24 characters apart, I would NOT want to return this a potential candidate.
Traditional REGEX:
(?=.{9,14}g)4.*?x.*?5.*?g
What DOESN'T work:
WHERE [field] LIKE '%(?=.{9,14}g)4.*?x.*?5.*?g%'
Solution 1:[1]
By using the information here: LINK I was able to add a CLR and create a scalar function in my database that works great. I can feed it complicated regex statements and get back results.
Thanks!
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Steve |
