'Look for and transfer new data between two tables using a timestamp id
What the following script is supposed to do,
- Connect to postgreSQL database
- Grab last id entry in the database final table
- Compare that entry to data uploaded into a staging table from a .csv using the [id] column (trying to avoid duplicates)
- Insert data from the staging table to the final table (only entries where timestamp id is greater than last entry from previous data)
- Truncate staging table
The code as written below works, but is unfinished. I am to the point where I have to compare the timestamp1 or t1 to the id column in the staging table. I'm unsure of how to go about that though.
This spot in the code,
#insert new entries into final db table
cursor.execute("INSERT INTO test SELECT * FROM stagingtable WHERE ####
I am hoping for a bit of assistance or guidance with what needs to be done. My python skills are new and it has taken me a good deal to get this far in. I'm sure a for loop is required, but I'm not sure how to incorporate the timetable format to the id column "%Y/%m/%d %H:%M:%S.%f". When applied correctly, the difference for new entries to the timestamp id should be positive and entries that already exist, either zero or negative. Some may suggest that a Merge Into would work, but at the moment the final table will continually collect data without truncating any earlier uploads. So it'll eventually take longer and longer to compare data using the Merge method (to my understanding).
import csv
import pyodbc
import time
from datetime import datetime
#connect to database
#DB connection string
print("Establishing Database connection...")
con = pyodbc.connect('DSN=sqldatabase')
cursor = con.cursor()
print("...Connected to database.")
#recall last timestamp entry in final db table
timestamp1 = cursor.execute("select max(id) from test;").fetchval()
#read file and copy data into staging table
print("Reading file contents and copying into staging table...")
with open('C:\\Users\\user\\Desktop\\test2.csv') as csvfile:
readCSV = csv.reader(csvfile, delimiter=',')
columns = next(readCSV) #skips the header row
query = 'insert into stagingtable({0}) values ({1})'
query = query.format(','.join(columns), ','.join('?' * len(columns)))
for data in readCSV:
cursor.execute(query, data)
con.commit()
timestamp2 = cursor.execute("select max(id) from stagingtable;").fetchval()
t1 = datetime.strptime(timestamp1, "%Y/%m/%d %H:%M:%S.%f")
# t2 = datetime.strptime(timestamp2, "%Y/%m/%d %H:%M:%S.%f")
# difference = t2 - t1
# print(difference)
#insert new entries into final db table
cursor.execute("INSERT INTO test SELECT * FROM stagingtable WHERE ####
#clear staging table
print("Clearing previous data download...")
cursor.execute("TRUNCATE TABLE stagingtable")
con.commit()
con.close()
print("...Completed clearing staging table.")
Solution 1:[1]
import csv
import pyodbc
import time
from datetime import datetime
#connect to database
#DB connection string
print("Establishing Database connection...")
con = pyodbc.connect('DSN=SQLdatabase')
cursor = con.cursor()
print("...Connected to database.")
#recall last timestamp entry in db table
t1 = datetime.strptime(cursor.execute("SELECT MAX(id) FROM test;").fetchval(), "%Y/%m/%d %H:%M:%S.%f")
#read file and copy data into table
print("Reading file contents and copying into table...")
with open('C:\\Users\\user\\Desktop\\test2.csv') as csvfile:
readCSV = csv.reader(csvfile, delimiter=',')
columns = next(readCSV) #skips the header row
t2 = datetime.strptime(next(readCSV)[0], "%Y/%m/%d %H:%M:%S.%f")
while t2 < t1:
t2 = datetime.strptime(next(readCSV)[0], "%Y/%m/%d %H:%M:%S.%f")
query = 'insert into test({0}) values ({1})'
query = query.format(','.join(columns), ','.join('?' * len(columns)))
for data in readCSV:
cursor.execute(query, data)
con.commit()
print("Data posted to table")
I did away with the staging table. This was the final outcome.
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | BabaZuri |
