'Finding "hidden patterns" in string in MSSQL data using regex (Positive Lookahead)

I'm working on finding obfuscated data in string fields. I have a regex that works in a Python script, but I realized that what I was doing would likely be done more efficiently directly in the database. BUT MSSQL doesn't support lookaround (Positive Lookahead.)

Eventually I want to dynamically feed an sql statement a list of targets to search for in candidates, but for now I am just trying to figure out how to get the match working.

Target:

  1. "4X5G"

Candidates:

  1. "Ipsum 47 loreix 5-g blue scuba rock." 4X5G are 13 characters apart, I would want to return this a potential candidate.
  2. "Ipsum 47 loreix blue scuba 5-g rock." 4X5G are 24 characters apart, I would NOT want to return this a potential candidate.

Traditional REGEX:

(?=.{9,14}g)4.*?x.*?5.*?g

What DOESN'T work:

WHERE [field] LIKE '%(?=.{9,14}g)4.*?x.*?5.*?g%'


Solution 1:[1]

By using the information here: LINK I was able to add a CLR and create a scalar function in my database that works great. I can feed it complicated regex statements and get back results.

Thanks!

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Steve