Querying DB in Python3 throws ascii codec can't decode byte error
I am running Python 3.8.6 (on AWS Lambda) and connecting to a Postgres database using psycopg2. The use case is to connect to the DB, run a query and save the results to a CSV file. The database is quite big and appears to contain non-ASCII characters, which causes problems after I have processed about 3/4 of the files, when I get the following error message:
"errorMessage": "'ascii' codec can't decode byte 0x93 in position 0: ordinal not in range(128)",
"errorType": "UnicodeDecodeError",
Based on the error I have reviewed the docs and taken on some of the suggestions, but I am still encountering the issue. The code is quite lengthy; the latest version of the database-related parts, including the previously followed suggestions, is as follows:
import psycopg2
import psycopg2.extensions

# Return TEXT / TEXT[] columns as Python str (one of the suggestions already tried)
psycopg2.extensions.register_type(psycopg2.extensions.UNICODE)
psycopg2.extensions.register_type(psycopg2.extensions.UNICODEARRAY)
.
.
try:
    conn = psycopg2.connect(host=db_host, user=db_username, password=db_password, database=db_name)
.
.
def run_sql():
    with conn.cursor() as cur:
        cur.execute(open(sql_file_location, 'r', encoding="utf-8", errors="ignore").read())
        with open(csv_filename_location, mode='w', encoding="utf-8") as csv_file:
            csv_writer = csv.writer(csv_file, delimiter=',', quotechar='"', quoting=csv.QUOTE_MINIMAL)
            for row in cur:
                csv_writer.writerow(list(row))
    conn.commit()
The error occurs when looping through the returned results:
"line 167, in run_sql\n    for row in cur:\n"
Some further information I noted: the Lambda environment is running utf-8, while conn.encoding on the connection returns SQLASCII and fetching the encoding through a cursor (conn.cursor() ... fetchone()[0]) returns SQL_ASCII.
I think the symbol in the database may be a '€' symbol or something similar, so if anyone has any suggestions on changes to make to account for this, I would be very appreciative.
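As a side note, a minimal way to confirm which encodings are in play looks like the sketch below; it assumes the same conn object as in the question, and the SHOW query is an assumption rather than something taken from the original post. (Byte 0x93 happens to be a left curly quote in Windows-1252, which would be consistent with non-UTF-8 bytes stored in a SQL_ASCII database.)

# Hedged diagnostic sketch: report the client-side and server-side encodings
print(conn.encoding)                 # what psycopg2 will use to decode results, e.g. SQLASCII
with conn.cursor() as cur:
    cur.execute("SHOW server_encoding;")
    print(cur.fetchone()[0])         # the database encoding, e.g. SQL_ASCII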
Solution 1:[1]
To those in the future stuck on the same issue I had: the database I was connecting to was in fact using SQL_ASCII, and I had missed one step when declaring my connection in order to use utf-8 in my environment. Below is the one line added after the connection is made:
try:
    conn = psycopg2.connect(host=db_host, user=db_username, password=db_password, database=db_name)
    conn.set_client_encoding("utf-8")
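The same effect can also be requested at connect time, since libpq accepts client_encoding as a connection parameter; below is a minimal sketch, assuming the same credential variables as above (if in doubt, the documented psycopg2 call is the set_client_encoding() line shown in the solution):

# Hedged alternative: ask for UTF8 as part of the connection itself
conn = psycopg2.connect(
    host=db_host,
    user=db_username,
    password=db_password,
    database=db_name,
    client_encoding="UTF8",  # passed through to libpq; equivalent to conn.set_client_encoding("UTF8")
)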
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | pyzor |
