'How to adjust variables in MS SQL Server through Python
I have several sql queries written in MS SQL Server and I used the following code to import them into Python using the pyodbc package.
import pyodbc
import pandas as pd
def conn_sql_server(file_path):
'''Function to connect to SQL Server and save query result to a dataframe
input:
file_path - query file path
output:
df - dataframe from the query result
'''
# Connect to SQL Server
conn = pyodbc.connect('Driver= {SQL Server Native Client 11.0};'
'Server= servername;'
'Database = databasename;'
'Trusted_Connection=yes;')
# run query and ouput the result to df
query = open(file_path, 'r')
df = pd.read_sql_query(query.read(), conn)
query.close()
return df
df1 = conn_sql_server('C:/Users/JJ/SQL script1')
df2 = conn_sql_server('C:/Users/JJ/SQL script2')
df3 = conn_sql_server('C:/Users/JJ/SQL script3')
In each SQL query, I have used DECLARE and SET to set the variables (variables are different in each SQL query). Here, I just copied a random query from online as an example. What I want to do is to update the Year variable directly in Python. My actual query is pretty long, so I don't want to copy over the SQL scripts in python, I just want to adjust the variables. Anyway to do it?
DECLARE @Year INT = 2022;
SELECT YEAR(date) @Year,
SUM(list_price * quantity) gross_sales
FROM sales.orders o
INNER JOIN sales.order_items i ON i.order_id = o.order_id
GROUP BY YEAR(date)
order by @Year
My other question is, is there anyway to add a WHERE statement, like WHERE itemNumber = 1002345 after importing the above query into Python. I'm asking this because df2 is a subset of df1. The restriction column isn't selected to show in the output, so I cannot do filterings in python after reading in df1. I could add that column in the df1 output and do more aggregations in Python, but that would largely increase the orginal data size and running time, so I prefer not to do it.
Solution 1:[1]
Here's a sample of how your script will look like. We are doing 2 modifications:
conn_sql_server now takes these parameters:
- year: you can pass the year you want to replace
declare @year... - where_clause: a
whereclause of your choice - before_clause_starts_with: the clause before which the
whereclause should be placed
- year: you can pass the year you want to replace
modify_query method that reads the contents of the file and changes the content based on the year you provided. If you provide the where clause, it'll put it before the clause you provide in before_clause_starts_with
import pyodbc
import pandas as pd
def modify_query(lines, year, where_clause, before_clause_starts_with):
new_lines = []
for line in lines:
if year is not None:
if line.lower().startswith('declare @year int ='):
new_lines.append(f"DECLARE @Year INT = {year}\n")
continue
if where_clause is not None:
if line.lower().startswith(before_clause_starts_with.lower()):
new_lines.append(where_clause + "\n")
new_lines.append(line)
continue
new_lines.append(line)
new_query = ''.join(new_lines)
return new_query
def conn_sql_server(file_path, year=None, where_clause=None, before_clause_starts_with=None):
'''Function to connect to SQL Server and save query result to a dataframe
input:
file_path - query file path
output:
df - dataframe from the query result
'''
# Connect to SQL Server
conn = pyodbc.connect('Driver= {SQL Server Native Client 11.0};'
'Server= servername;'
'Database = databasename;'
'Trusted_Connection=yes;')
# run query and ouput the result to df
query = open(file_path, 'r')
lines = query.readlines()
query.close()
new_query = modify_query(lines, year, where_clause, before_clause_starts_with)
df = pd.read_sql_query(new_query, conn)
return df
df1 = conn_sql_server('C:/Users/JJ/SQL script1',
year=1999,
where_clause='WHERE itemNumber = 1002345',
before_clause_starts_with='group by')
df2 = conn_sql_server('C:/Users/JJ/SQL script2')
df3 = conn_sql_server('C:/Users/JJ/SQL script3',
year = 1500)
Simulation
Let's run an example.
script1.sql
DECLARE @Year INT = 2022;
SELECT YEAR(date) @Year,
SUM(list_price * quantity) gross_sales
FROM sales.orders o
INNER JOIN sales.order_items i ON i.order_id = o.order_id
GROUP BY YEAR(date)
order by @Year
script2.sql
DECLARE @Year INT = 2022;
SELECT gross_sales
FROM sales.orders
order by @Year
script3.sql
DECLARE @Year INT = 2022;
SELECT GETDATE()
Using a script similar to the above, we'll try to see how each script looks like after it gets modified.
Simulation script
#import pyodbc
#import pandas as pd
def modify_query(lines, year, where_clause, before_clause_starts_with):
new_lines = []
print('-------')
print('ORIGINAL')
print('-------')
print(lines)
for line in lines:
if year is not None:
if line.lower().startswith('declare @year int ='):
new_lines.append(f"DECLARE @Year INT = {year}\n")
continue
if where_clause is not None:
if line.lower().startswith(before_clause_starts_with.lower()):
new_lines.append(where_clause + "\n")
new_lines.append(line)
continue
new_lines.append(line)
print('-------')
print('NEW')
print('-------')
new_query = ''.join(new_lines)
print(new_query)
return new_query
def conn_sql_server(file_path, year=None, where_clause=None, before_clause_starts_with=None):
'''Function to connect to SQL Server and save query result to a dataframe
input:
file_path - query file path
output:
df - dataframe from the query result
'''
# Connect to SQL Server
#conn = pyodbc.connect('Driver= {SQL Server Native Client 11.0};'
# 'Server= servername;'
# 'Database = databasename;'
# 'Trusted_Connection=yes;')
# run query and ouput the result to df
query = open(file_path, 'r')
lines = query.readlines()
query.close()
new_query = modify_query(lines, year, where_clause, before_clause_starts_with)
#df = pd.read_sql_query(new_query, conn)
#return df
#df1 = conn_sql_server('C:/Users/JJ/SQL script1')
#df2 = conn_sql_server('C:/Users/JJ/SQL script2')
#df3 = conn_sql_server('C:/Users/JJ/SQL script3')
df1 = conn_sql_server('script1.sql', year=1999, where_clause='WHERE itemNumber = 1002345', before_clause_starts_with='group by')
df2 = conn_sql_server('script2.sql')
df3 = conn_sql_server('script3.sql', year=1500)
Original query 1 was like this in script1.sql
['DECLARE @Year INT = 2022;\n', 'SELECT YEAR(date) @Year, \n', ' SUM(list_price * quantity) gross_sales\n', 'FROM sales.orders o\n', ' INNER JOIN sales.order_items i ON i.order_id = o.order_id\n', 'GROUP BY YEAR(date)\n', 'order by @Year']
After running the script, the query will become
DECLARE @Year INT = 1999
SELECT YEAR(date) @Year,
SUM(list_price * quantity) gross_sales
FROM sales.orders o
INNER JOIN sales.order_items i ON i.order_id = o.order_id
WHERE itemNumber = 1002345
GROUP BY YEAR(date)
order by @Year
Query 3 used to look like this:
['DECLARE @Year INT = 2022;\n', 'SELECT GETDATE()']
It becomes
DECLARE @Year INT = 1500
SELECT GETDATE()
Give it a shot by changing the python script as you deem fit.
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | zedfoxus |
