Conversion of many files in a directory to BLOBs in DB2
I'm rather new to this. I need to take all files (.csv and .xlsx), e.g. Supermarket.xlsx, sales.csv, marketing.xlsx, which are uploaded into a directory, and convert each into BLOB data in DB2, table name SB_DATA_BLOB_TEST with field names "data_column", "ingestion_date_time", "ingestion_file_name", "row_id".
I have only managed to insert one file, setting its timestamp, filename and row_id. How can I apply the same function to a list of files uploaded to that directory, recording the timestamp and filename for each, and assigning the row_id automatically instead of inserting it manually?
The code:
import os
import pandas as pd
import jaydebeapi
from project_lib import Project

constants = {
    'INPUT_DIR': '/project_data/data_asset/'
}
file_names = {
    'Supermart': 'Supermart.xlsx'
}
schema_name = 'ABC.'
table_prefix = 'SB_'

timestamp = pd.Timestamp.now("Asia/Singapore").strftime("%Y%m%d %H%M%S")
file = constants['INPUT_DIR'] + file_names['Supermart']      ## data
filename = constants['INPUT_DIR'] + file_names['Supermart']  ## ingestion_file_name

def convertToBinaryData(filename):
    # Read the file contents as binary data
    with open(filename, 'rb') as file:
        binaryData = file.read()
    return binaryData

def insertBLOB(data, ingestion_datetime, ingestion_filename, row_id):
    print("Inserting BLOB into ABC SB_Data_Blob table")
    try:
        project = Project.access()
        abc_sb_credentials = project.get_connection(name="abc_sb")
        abc_sb_connection = jaydebeapi.connect(
            'com.ibm.db2.jcc.DB2Driver',
            'jdbc:db2://{}:{}/{}:user={};password={};'.format(
                abc_sb_credentials['host'],
                abc_sb_credentials['port'],
                abc_sb_credentials['database'],
                abc_sb_credentials['username'],
                abc_sb_credentials['password']))
        curs = abc_sb_connection.cursor()
        sql_insert_blob_query = """INSERT INTO ABC.SB_DATA_BLOB_TEST
            (data_column, ingestion_date_time, ingestion_file_name, row_id) VALUES (?,?,?,?)"""
        file = convertToBinaryData(data)
        # Pack the parameters into a tuple for the prepared statement
        insert_blob_tuple = (jaydebeapi.Binary(file), ingestion_datetime, ingestion_filename, row_id)
        result = curs.execute(sql_insert_blob_query, insert_blob_tuple)
        abc_sb_connection.commit()
        print("File is inserted successfully as a BLOB into SB_DATA_BLOB table", result)
    except Exception as error:
        print("Failed inserting BLOB data into DB2 table SB_DATA_BLOB:", error)
    finally:
        curs.close()
        abc_sb_connection.close()
        print("DB2 connection is closed")

insertBLOB(file, timestamp, filename, '2')
Solution 1:[1]
I need to take all files (.csv and .xlsx), e.g. Supermarket.xlsx, sales.csv, marketing.xlsx, which are uploaded into a directory
List all those files in your directory.
Suppose the directory is given by constants['INPUT_DIR'] and the filenames should be filtered by the extensions .xlsx and .csv; then use Python's glob module and its glob function.
import glob
import os

# list of full paths matching the given extension in the given directory
csv_filenames = glob.glob(constants['INPUT_DIR'] + '*.csv')
xlsx_filenames = glob.glob(constants['INPUT_DIR'] + '*.xlsx')
all_filenames = csv_filenames + xlsx_filenames

row_id = 0  # incremented before each insert, so the first row gets id 1

# for each of those files insert the blob, for example
for path in all_filenames:  # glob already returns the full path
    ingestion_datetime = pd.Timestamp.now("Asia/Singapore").strftime("%Y%m%d %H%M%S")  # or timestamp of the file
    ingestion_filename = os.path.basename(path)  # only the filename, no directory
    row_id += 1  # increase by one
    insertBLOB(path, ingestion_datetime, ingestion_filename, row_id)
Suppose your directory contains:
Supermart.xlsx
Supermart.csv
then glob returns the full paths, e.g. ['/project_data/data_asset/Supermart.csv', '/project_data/data_asset/Supermart.xlsx'], and os.path.basename strips the directory so that only the names like 'Supermart.csv' are stored in ingestion_file_name.
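As an alternative sketch (the directory path and the insertBLOB call are taken from the question, everything else is an assumption), pathlib can collect both extensions in one pass and enumerate can assign the row ids automatically, which addresses the "without inserting this row_id manually" part:

```python
from pathlib import Path

# same directory as constants['INPUT_DIR'] in the question
input_dir = Path('/project_data/data_asset/')

# collect .csv and .xlsx files; sorted() makes the row id assignment reproducible
paths = sorted(p for p in input_dir.glob('*') if p.suffix in ('.csv', '.xlsx'))

# enumerate yields row_id = 1, 2, 3, ... without manual bookkeeping
for row_id, path in enumerate(paths, start=1):
    print(row_id, path.name)  # path.name is the bare filename, str(path) the full path
    # insertBLOB(str(path), timestamp, path.name, row_id)
```

Note that path.suffix includes the leading dot, and path.name gives the filename without the directory, so no separate os.path.basename call is needed.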
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | hc_dev |
