'How to adjust variables in MS SQL Server through Python

I have several sql queries written in MS SQL Server and I used the following code to import them into Python using the pyodbc package.

import pyodbc
import pandas as pd 

def conn_sql_server(file_path):
    '''Function to connect to SQL Server and save query result to a dataframe
        input:
            file_path - query file path
        output:
        df - dataframe from the query result
    '''

    # Connect to SQL Server
    conn = pyodbc.connect('Driver= {SQL Server Native Client 11.0};'
                      'Server= servername;'
                      'Database = databasename;'
                      'Trusted_Connection=yes;')

    # run query and ouput the result to df
    query = open(file_path, 'r') 
    df = pd.read_sql_query(query.read(), conn)
    query.close() 

    return df   

df1 = conn_sql_server('C:/Users/JJ/SQL script1')
df2 = conn_sql_server('C:/Users/JJ/SQL script2')
df3 = conn_sql_server('C:/Users/JJ/SQL script3')

In each SQL query, I have used DECLARE and SET to set the variables (variables are different in each SQL query). Here, I just copied a random query from online as an example. What I want to do is to update the Year variable directly in Python. My actual query is pretty long, so I don't want to copy over the SQL scripts in python, I just want to adjust the variables. Anyway to do it?

DECLARE @Year INT = 2022;
SELECT YEAR(date) @Year, 
       SUM(list_price * quantity) gross_sales
FROM sales.orders o
     INNER JOIN sales.order_items i ON i.order_id = o.order_id
GROUP BY YEAR(date)
order by @Year

My other question is, is there anyway to add a WHERE statement, like WHERE itemNumber = 1002345 after importing the above query into Python. I'm asking this because df2 is a subset of df1. The restriction column isn't selected to show in the output, so I cannot do filterings in python after reading in df1. I could add that column in the df1 output and do more aggregations in Python, but that would largely increase the orginal data size and running time, so I prefer not to do it.



Solution 1:[1]

Here's a sample of how your script will look like. We are doing 2 modifications:

  • conn_sql_server now takes these parameters:

    • year: you can pass the year you want to replace declare @year...
    • where_clause: a where clause of your choice
    • before_clause_starts_with: the clause before which the where clause should be placed
  • modify_query method that reads the contents of the file and changes the content based on the year you provided. If you provide the where clause, it'll put it before the clause you provide in before_clause_starts_with

import pyodbc
import pandas as pd 

def modify_query(lines, year, where_clause, before_clause_starts_with):
    new_lines = []

    for line in lines:

        if year is not None:
            if line.lower().startswith('declare @year int ='):
                new_lines.append(f"DECLARE @Year INT = {year}\n")
                continue

        if where_clause is not None:
            if line.lower().startswith(before_clause_starts_with.lower()):
                new_lines.append(where_clause + "\n")
                new_lines.append(line)
                continue

        new_lines.append(line)

    new_query = ''.join(new_lines)
    return new_query


def conn_sql_server(file_path, year=None, where_clause=None, before_clause_starts_with=None):
    '''Function to connect to SQL Server and save query result to a dataframe
        input:
            file_path - query file path
        output:
        df - dataframe from the query result
    '''

    # Connect to SQL Server
    conn = pyodbc.connect('Driver= {SQL Server Native Client 11.0};'
                      'Server= servername;'
                      'Database = databasename;'
                      'Trusted_Connection=yes;')

    # run query and ouput the result to df
    query = open(file_path, 'r')
    lines = query.readlines()
    query.close()

    new_query = modify_query(lines, year, where_clause, before_clause_starts_with)

    df = pd.read_sql_query(new_query, conn)
    return df   

df1 = conn_sql_server('C:/Users/JJ/SQL script1', 
          year=1999,
          where_clause='WHERE itemNumber = 1002345', 
          before_clause_starts_with='group by')

df2 = conn_sql_server('C:/Users/JJ/SQL script2')

df3 = conn_sql_server('C:/Users/JJ/SQL script3',
          year = 1500)

Simulation

Let's run an example.

script1.sql

DECLARE @Year INT = 2022;
SELECT YEAR(date) @Year, 
       SUM(list_price * quantity) gross_sales
FROM sales.orders o
     INNER JOIN sales.order_items i ON i.order_id = o.order_id
GROUP BY YEAR(date)
order by @Year

script2.sql

DECLARE @Year INT = 2022;
SELECT gross_sales
FROM sales.orders
order by @Year

script3.sql

DECLARE @Year INT = 2022;
SELECT GETDATE()

Using a script similar to the above, we'll try to see how each script looks like after it gets modified.

Simulation script

#import pyodbc
#import pandas as pd 

def modify_query(lines, year, where_clause, before_clause_starts_with):
    new_lines = []

    print('-------')
    print('ORIGINAL')
    print('-------')
    print(lines)

    for line in lines:

        if year is not None:
            if line.lower().startswith('declare @year int ='):
                new_lines.append(f"DECLARE @Year INT = {year}\n")
                continue

        if where_clause is not None:
            if line.lower().startswith(before_clause_starts_with.lower()):
                new_lines.append(where_clause + "\n")
                new_lines.append(line)
                continue

        new_lines.append(line)


    print('-------')
    print('NEW')
    print('-------')
    new_query = ''.join(new_lines)
    print(new_query)

    return new_query


def conn_sql_server(file_path, year=None, where_clause=None, before_clause_starts_with=None):
    '''Function to connect to SQL Server and save query result to a dataframe
        input:
            file_path - query file path
        output:
        df - dataframe from the query result
    '''

    # Connect to SQL Server
    #conn = pyodbc.connect('Driver= {SQL Server Native Client 11.0};'
    #                  'Server= servername;'
    #                  'Database = databasename;'
    #                  'Trusted_Connection=yes;')

    # run query and ouput the result to df
    query = open(file_path, 'r')
    lines = query.readlines()
    query.close()

    new_query = modify_query(lines, year, where_clause, before_clause_starts_with)

    #df = pd.read_sql_query(new_query, conn)
    #return df   

#df1 = conn_sql_server('C:/Users/JJ/SQL script1')
#df2 = conn_sql_server('C:/Users/JJ/SQL script2')
#df3 = conn_sql_server('C:/Users/JJ/SQL script3')

df1 = conn_sql_server('script1.sql', year=1999, where_clause='WHERE itemNumber = 1002345', before_clause_starts_with='group by')
df2 = conn_sql_server('script2.sql')
df3 = conn_sql_server('script3.sql', year=1500)

Original query 1 was like this in script1.sql

['DECLARE @Year INT = 2022;\n', 'SELECT YEAR(date) @Year, \n', '       SUM(list_price * quantity) gross_sales\n', 'FROM sales.orders o\n', '     INNER JOIN sales.order_items i ON i.order_id = o.order_id\n', 'GROUP BY YEAR(date)\n', 'order by @Year']

After running the script, the query will become

DECLARE @Year INT = 1999
SELECT YEAR(date) @Year, 
       SUM(list_price * quantity) gross_sales
FROM sales.orders o
     INNER JOIN sales.order_items i ON i.order_id = o.order_id
WHERE itemNumber = 1002345
GROUP BY YEAR(date)
order by @Year

Query 3 used to look like this:

['DECLARE @Year INT = 2022;\n', 'SELECT GETDATE()']

It becomes

DECLARE @Year INT = 1500
SELECT GETDATE()

Give it a shot by changing the python script as you deem fit.

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 zedfoxus