'how to insert multiple python parameter into sql sting to create a dataframe
I've been trying to create a dataframe based on a SQL string, but I want to query the same things (3 counts) over different periods (in this example, monthly).
For context, I've been working in a python notebook on Civis.
I came up with this :
START_DATE_LIST = ["2021-01-01","2021-02-01","2021-03-01","2021-04-01","2021-05-01","2021-06-01","2021-07-01","2021-08-01","2021-09-01","2021-10-01","2021-11-01","2021-12-01"]
END_DATE_LIST = ["2021-01-31","2021-02-28","2021-03-31","2021-04-30","2021-05-31","2021-06-30","2021-07-31","2021-08-31","2021-09-31","2021-10-31","2021-11-30","2021-12-31"]
for start_date, end_date in zip(START_DATE_LIST,END_DATE_LIST) :
SQL = f"select count (distinct case when (CON.firstdonationdate__c< {start_date} and CON.last_gift_date__c> {start_date} ) then CON.id else null end) as Donors_START_DATE, \
count (distinct case when (CON.firstdonationdate__c< {end_date} and CON.c_last_gift_date__c> {end_date}) then CON.id else null end) as Donors_END_DATE, \
count (distinct case when (CON.firstdonationdate__c> {start_date} and CON.firstdonationdate__c<{end_date}) then CON.id else null end) as New_Donors \
from staging.contact CON;"
df2 =civis.io.read_civis_sql(SQL, "database", use_pandas=True)
df2['START_DATE']=start_date
df2['END_DATE']= end_date
It runs but then the output is only :
donors_start_date donors_end_date new_donors START_DATE END_DATE
0 47458 0 0 2021-12-01 2021-12-31
I'm thinking I have two problems : 1/ it reruns the df each time and, I need to find a way to stack up the outputs for each month. 2/ why doesn't it compute the last two counts for the last month.
Any feedback is greatly appreciated!
Solution 1:[1]
I think you have correctly identified the problem yourself:
- In each iteration, you perform an SQL query and assign the result to
DataFrameobject calleddf2(thus overriding its previous value) - Instead, you want create a
DataFrameobject outside the loop, then append data to it:
import pandas as pd
START_DATE_LIST = ...
END_DATE_LIST = ...
df = pd.DataFrame()
for start_date, end_date in zip(START_DATE_LIST, END_DATE_LIST) :
SQL = ...
row = civis.io.read_civis_sql(SQL, "database", use_pandas=True)
row['START_DATE'] = start_date
row['END_DATE'] = end_date
df = df.append(row)
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Anton Yang-Wälder |
