'Parsing SQL with Python
I want to create a SQL interface on top of a non-relational data store. Non-relational data store, but it makes sense to access the data in a relational manner.
I am looking into using ANTLR to produce an AST that represents the SQL as a relational algebra expression. Then return data by evaluating/walking the tree.
I have never implemented a parser before, and I would therefore like some advice on how to best implement a SQL parser and evaluator.
- Does the approach described above sound about right?
- Are there other tools/libraries I should look into? Like PLY or Pyparsing.
- Pointers to articles, books or source code that will help me is appreciated.
Update:
I implemented a simple SQL parser using pyparsing. Combined with Python code that implement the relational operations against my data store, this was fairly simple.
As I said in one of the comments, the point of the exercise was to make the data available to reporting engines. To do this, I probably will need to implement an ODBC driver. This is probably a lot of work.
Solution 1:[1]
This reddit post suggests python-sqlparse as an existing implementation, among a couple other links.
Solution 2:[2]
TwoLaid's Python SQL Parser works very well for my purposes. It's written in C and needs to be compiled. It is robust. It parses out individual elements of each clause.
https://github.com/TwoLaid/python-sqlparser
I'm using it to parse out queries column names to use in report headers. Here is an example.
import sqlparser
def get_query_columns(sql):
'''Return a list of column headers from given sqls select clause'''
columns = []
parser = sqlparser.Parser()
# Parser does not like new lines
sql2 = sql.replace('\n', ' ')
# Check for syntax errors
if parser.check_syntax(sql2) != 0:
raise Exception('get_query_columns: SQL invalid.')
stmt = parser.get_statement(0)
root = stmt.get_root()
qcolumns = root.__dict__['resultColumnList']
for qcolumn in qcolumns.list:
if qcolumn.aliasClause:
alias = qcolumn.aliasClause.get_text()
columns.append(alias)
else:
name = qcolumn.get_text()
name = name.split('.')[-1] # remove table alias
columns.append(name)
return columns
sql = '''
SELECT
a.a,
replace(coalesce(a.b, 'x'), 'x', 'y') as jim,
a.bla as sally -- some comment
FROM
table_a as a
WHERE
c > 20
'''
print get_query_columns(sql)
# output: ['a', 'jim', 'sally']
Solution 3:[3]
Of course, it may be best to leverage python-sqlparse on Google Code
UPDATE: Now I see that this has been suggested - I concur that this is worthwhile:
Solution 4:[4]
I am using python-sqlparse with great success.
In my case I am working with queries that are already validated, my AST-walking code can make some sane assumptions about the structure.
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Dylan Hogg |
| Solution 2 | |
| Solution 3 | Mathieu Longtin |
| Solution 4 | Overbryd |
