'Grep how do I check if a token comes before another in a fil?
I am trying to find among a bunch of files who hold SQL statements whether we ever SELECT from a table before we INSERT into it. It seems it should be a one-liner with Grep.
I've come up with
grep -zl "FROM (\S*).*INSERT INTO \0
The -z treats the input as one line, and then the back reference does the rest.
However testing with
echo "SELECT a FROM x INSERT INTO x VALUES(1);" | grep -zl "FROM (\S*).*INSERT INTO \0"
produces no result.
In fact even echo "aa aa" | grep "(\S*) \0" returns nothing.
What am I missing?
Solution 1:[1]
First, let's solve it for x:
echo "SELECT a FROM x INSERT INTO x VALUES(1);" | grep -E "FROM (\S)*x.*INSERT INTO (\S)*x"
However, you may have many tables and you are interested about all of them. So, this is how you can list the table names:
select TABLE_NAME
from information_schema.tables;
Now, let's generate the grep for each table:
select CONCAT('sudo bash foo.sh "your script" ', TABLE_NAME)
from information_schema.tables;
and implement foo.sh as follows:
echo "$1" | grep -E "FROM (\S)*$2.*INSERT INTO (\S)*$2"
The query generates the grep for each table. Naturally, you can filter your query to a selection of tables instead and you might also need to handle cases like
select ... from yourschema.yourtable
or
select ... from `yourtable`
but start with the proof-of-concept I have given and see whether that's enough for you.
Solution 2:[2]
grep solution:
Use -P option to use RegExp with Perl notation
grep -zPl "FROM ([[:alnum:]]+) INSERT INTO \1 VALUES"
Matching SELECT statement before INSERT, solution:
The reported problem is more complicated than described.
Assuming SELECT statements and corresponding INSERT statement are not in sequence.
For instance:
SELECT a FROM x1
INSERT INTO x1 VALUES(1);
SELECT a FROM x2
SELECT a FROM x3
SELECT a FROM y1
SELECT a FROM x2
INSERT INTO x3 VALUES(1);
INSERT INTO x2 VALUES(1);
INSERT INTO y1 VALUES(1);
INSERT INTO x3 VALUES(2);
SELECT a FROM y2
INSERT INTO y1 VALUES(1);
Only x3 and y1 are not matched. And there are nesting and duplicates.
We do not know ahead all table-names.
We need a stack data structure.
Push-in every table-name in select select statement (no duplicates), pull-out every table-name in insert.
Implemented using gawk (standard awk in most Linux machines) associative array. Screening input SQL file once.
gawk script: script.awk
/SELECT .* FROM / { # for each line matching RegExp "SELECT .* FROM"
# read table name from current line
tableName = gensub(/.*FROM[[:space:]]+([[:alnum:]]+).*/,"\\1",1);
# push-in tableName into associative array (used as stack)
tableNamesStack[tableName] = 1;
}
/INSERT INTO / { # for each line matching RegExp "INSERT INTO "
# read table name from current line
tableName = gensub(/.*INTO[[:space:]]+([[:alnum:]]+).*/,"\\1",1);
# if current tableName is in stack
if (tableName in tableNamesStack) {
# pull-out current tableName from stack
delete tableNamesStack[tableName];
} else {
# current tableName is missing from stack, report and continue.
printf ("Unmatched INSERT statement in line %d, for table %s\n", NR, tableName);
}
}
running script.awk
gawk -f script.awk input.sql
Unmatched INSERT statement in line 9, for table x3
Unmatched INSERT statement in line 11, for table y1
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Lajos Arpad |
| Solution 2 | Dudi Boy |
