'Regex to generate dynamic sql

I want to generate dynamic sql on Notepad++ based on some rules. These rules include everything, so no sql knowledge is needed, and are the following:

  1. Dynamic sql must have each single quote escaped by another single quote ( 'hello' becomes ''hello'')
  2. Each line should begin with "+@lin"
  3. If a line has only whitespace, nothing should be following the "+@lin", despite following rules
  4. Replace each \t directly following "+@lin" with "+@tab"
  5. Add " +' " after the @lin/@tab sequence
  6. Add a single quote at the end of line

So, as an example, this input:

select 1,'hello'
from        --two tabs exist after from

    table1

should become:

+@lin+'select 1,''hello'''
+@lin+'from     --two tabs exist after from'
+@lin
+@lin+@tab+'table1'

What I have for now is the following 4 steps:

  • Replace single quote with double quotes to cover rule 1
  • Replace ^(\t*)(.*)$ with \+@lin\1\+'\2' to cover rules 2,5,6
  • Replace \t with \+@tab to cover rule 4
  • Replace (\+@tab)*\+''$ with nothing to cover rule 3

Notice that this mostly works, except for the third replacement, which replaces all tabs, and not only the ones at the beginning. I tried (?<=^\t*)\t with no success- it matches nothing.

I'm looking for a solution which satisfies the rules in as few replacement steps as possible.



Solution 1:[1]

You can use three substitutions here, it is not quite possible (without additional assumptions) to reduce the number of steps here since you need to replace at the same positions.

Step 1: Replace single quotes with double - ' with ''. No regex so far, but you can have the regex checkbox on.

Step 2: Add +@lin+ at the start of the line and only wrap its contents with ' if there is any non-whitespace char on the line (while keeping all TABs before the first '):

Find What: ^(\t*+)(\h*\S)?+(.*)
Replace With: +@lin+$1(?2'$2$3':)

Details:

  • ^ - start of a line
  • (\t*+) - Group 1 ($1): zero or more TABs
  • (\h*\S)?+ - Group 2 ($2): an optional sequence of any zero or more horizontal whitespace chars and then a non-whitespace char
  • (.*) - Group 3 ($3): the rest of the line
  • +@lin+$1(?2'$2$3':) - replaces the match with +@lin+ + Group 1 value (i.e. tabs found), and then - only if Group 2 matches - ' + Group 2 + Group 3 values + '

Step 3: Replace each TAB after +@lin+ with @tab+:

Find What: (\G(?!^)|^\+@lin\+)\t
Replace With: $1@tab+

Details:

  • (\G(?!^)|^\+@lin\+) - Group 1: either
    • \G(?!^) - end of the previous match
    • | - or
    • ^\+@lin\+ - start of a line and +@lin+ string
  • \t - a TAB char.

The replacement is the concatenation of Group 1 value and @tab+ string.

See this regex online demo.

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1