'how to insert multiple python parameter into sql sting to create a dataframe

I've been trying to create a dataframe based on a SQL string, but I want to query the same things (3 counts) over different periods (in this example, monthly).

For context, I've been working in a python notebook on Civis.

I came up with this :

START_DATE_LIST = ["2021-01-01","2021-02-01","2021-03-01","2021-04-01","2021-05-01","2021-06-01","2021-07-01","2021-08-01","2021-09-01","2021-10-01","2021-11-01","2021-12-01"]
END_DATE_LIST = ["2021-01-31","2021-02-28","2021-03-31","2021-04-30","2021-05-31","2021-06-30","2021-07-31","2021-08-31","2021-09-31","2021-10-31","2021-11-30","2021-12-31"]

for start_date, end_date in zip(START_DATE_LIST,END_DATE_LIST) :
    SQL = f"select count (distinct case when (CON.firstdonationdate__c< {start_date} and CON.last_gift_date__c>  {start_date} ) then CON.id else null end) as Donors_START_DATE, \
count (distinct case when (CON.firstdonationdate__c< {end_date} and CON.c_last_gift_date__c> {end_date}) then CON.id else null end) as Donors_END_DATE, \
count (distinct case when (CON.firstdonationdate__c> {start_date} and CON.firstdonationdate__c<{end_date}) then CON.id else null end) as New_Donors \
from staging.contact CON;"
    df2 =civis.io.read_civis_sql(SQL, "database", use_pandas=True)
    df2['START_DATE']=start_date
    df2['END_DATE']= end_date

It runs but then the output is only :

 donors_start_date  donors_end_date new_donors  START_DATE  END_DATE
0      47458                  0           0       2021-12-01     2021-12-31

I'm thinking I have two problems : 1/ it reruns the df each time and, I need to find a way to stack up the outputs for each month. 2/ why doesn't it compute the last two counts for the last month.

Any feedback is greatly appreciated!



Solution 1:[1]

I think you have correctly identified the problem yourself:

  • In each iteration, you perform an SQL query and assign the result to DataFrame object called df2 (thus overriding its previous value)
  • Instead, you want create a DataFrame object outside the loop, then append data to it:
import pandas as pd

START_DATE_LIST = ...
END_DATE_LIST = ...


df = pd.DataFrame()

for start_date, end_date in zip(START_DATE_LIST, END_DATE_LIST) :
    SQL = ...
    row = civis.io.read_civis_sql(SQL, "database", use_pandas=True)
    row['START_DATE'] = start_date
    row['END_DATE'] = end_date

    df = df.append(row)

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Anton Yang-Wälder