'Grep how do I check if a token comes before another in a fil?

I am trying to find among a bunch of files who hold SQL statements whether we ever SELECT from a table before we INSERT into it. It seems it should be a one-liner with Grep.

I've come up with grep -zl "FROM (\S*).*INSERT INTO \0

The -z treats the input as one line, and then the back reference does the rest.

However testing with echo "SELECT a FROM x INSERT INTO x VALUES(1);" | grep -zl "FROM (\S*).*INSERT INTO \0" produces no result.

In fact even echo "aa aa" | grep "(\S*) \0" returns nothing.

What am I missing?



Solution 1:[1]

First, let's solve it for x:

echo "SELECT a FROM x INSERT INTO x VALUES(1);" | grep -E "FROM (\S)*x.*INSERT INTO (\S)*x"

However, you may have many tables and you are interested about all of them. So, this is how you can list the table names:

select TABLE_NAME
from information_schema.tables;

Now, let's generate the grep for each table:

select CONCAT('sudo bash foo.sh "your script" ', TABLE_NAME)
from information_schema.tables;

and implement foo.sh as follows:

echo "$1" | grep -E "FROM (\S)*$2.*INSERT INTO (\S)*$2"

The query generates the grep for each table. Naturally, you can filter your query to a selection of tables instead and you might also need to handle cases like

select ... from yourschema.yourtable

or

select ... from `yourtable`

but start with the proof-of-concept I have given and see whether that's enough for you.

Solution 2:[2]

grep solution:

Use -P option to use RegExp with Perl notation

grep -zPl "FROM ([[:alnum:]]+) INSERT INTO \1 VALUES"

Matching SELECT statement before INSERT, solution:

The reported problem is more complicated than described.

Assuming SELECT statements and corresponding INSERT statement are not in sequence.

For instance:

SELECT a FROM x1 
INSERT INTO x1 VALUES(1);
SELECT a FROM x2 
SELECT a FROM x3 
SELECT a FROM y1
SELECT a FROM x2 
INSERT INTO x3 VALUES(1);
INSERT INTO x2 VALUES(1);
INSERT INTO y1 VALUES(1);
INSERT INTO x3 VALUES(2);
SELECT a FROM y2
INSERT INTO y1 VALUES(1);

Only x3 and y1 are not matched. And there are nesting and duplicates.

We do not know ahead all table-names.

We need a stack data structure. Push-in every table-name in select select statement (no duplicates), pull-out every table-name in insert.

Implemented using gawk (standard awk in most Linux machines) associative array. Screening input SQL file once.

gawk script: script.awk

/SELECT .* FROM / { # for each line matching RegExp "SELECT .* FROM"
  # read table name from current line
  tableName = gensub(/.*FROM[[:space:]]+([[:alnum:]]+).*/,"\\1",1);
  # push-in tableName into associative array (used as stack)
  tableNamesStack[tableName] = 1; 
}
/INSERT INTO / { # for each line matching RegExp "INSERT INTO "
  # read table name from current line
  tableName = gensub(/.*INTO[[:space:]]+([[:alnum:]]+).*/,"\\1",1);
  # if current tableName is in stack
  if (tableName in tableNamesStack) {
    # pull-out current tableName from stack
    delete tableNamesStack[tableName];
  } else {
    # current tableName is missing from stack, report and continue.
    printf ("Unmatched INSERT statement in line %d, for table %s\n", NR, tableName);
  }
}

running script.awk

gawk -f script.awk input.sql 

Unmatched INSERT statement in line 9, for table x3
Unmatched INSERT statement in line 11, for table y1

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Lajos Arpad
Solution 2 Dudi Boy