Can't find element in set (Psycopg, PostgreSQL)

This should be easy, so maybe I've fried my brain, but I can't get this to work. I have a database with over a million rows. I want to add about 15k more rows, but some of the data might already exist in the database.

My basic process is this:

async def process_data(self, object_list):
    # Load pre-existing primary keys from the db:
    pks = await self.copy_pks()
    # Remove objects whose ids already exist:
    filtered_data = [obj for obj in object_list if obj.id not in pks]
    # Add the remaining, presumably unique, values:
    await self.copy_in(filtered_data)
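
For context, the objects and the SQL look roughly like this (the class, field names, and table are simplified stand-ins, not my real schema):

from dataclasses import dataclass

@dataclass
class Record:
    id: int       # primary key in the target table
    payload: str  # stand-in for the remaining columns

# Placeholder COPY statements:
COPY_OUT_SQL = "COPY my_table (id) TO STDOUT"
COPY_IN_SQL = "COPY my_table (id, payload) FROM STDIN"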

My copy_pks method:

@with_connection
async def copy_pks(self, connection):
    pks = set()
    async with connection.cursor().copy(sql) as cursor:
        async for row in cursor:
            res = await cursor.read_row()
            pks.add(int(res[0]))
    return pks
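
For reference, my reading of the psycopg docs is that rows from a COPY ... TO STDOUT are consumed either by iterating copy.rows() or by calling read_row() until it returns None, along these lines (the table name is a placeholder):

async def copy_pks_sketch(self, connection):
    pks = set()
    async with connection.cursor().copy("COPY my_table (id) TO STDOUT") as copy:
        # read_row() returns None once the stream is exhausted;
        # values arrive as strings in the default text format.
        while (row := await copy.read_row()) is not None:
            pks.add(int(row[0]))
    return pks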

My copy_in method:

@with_connection
async def copy_in(self, connection, data):
    async with connection.cursor().copy(sql) as cursor:
        for obj in data:
            await cursor.write_row(obj)
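
In case it's relevant, with_connection is just a decorator that hands each method a pooled connection. A hypothetical sketch of it (the pool setup and DSN are placeholders, assuming psycopg_pool):

import functools
from psycopg_pool import AsyncConnectionPool

pool = AsyncConnectionPool("dbname=mydb")  # placeholder DSN

def with_connection(func):
    @functools.wraps(func)
    async def wrapper(self, *args, **kwargs):
        # Acquire a connection from the pool and pass it to the method.
        async with pool.connection() as connection:
            return await func(self, connection, *args, **kwargs)
    return wrapper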

The problem is that I keep hitting "duplicate key value violates unique constraint" despite my attempt to filter. I've checked the lengths of object_list and filtered_data, and the latter is about half the size... so clearly something is getting filtered.

I then checked specifically for the offending object:

for obj in object_list:
    if obj.id == 14856714938:
        print("Found")
        if obj.id in pks:
            print("In pks")
        if obj.id in filtered_data:
            print("In filtered")

I receive "Found", but I don't receive "In pks" and I don't receive "In filtered".
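
One thing I notice writing this up: filtered_data holds the objects themselves, not their ids, so an id-based membership check has to go through the ids, e.g.:

# Membership must be tested against ids, not the objects:
filtered_ids = {obj.id for obj in filtered_data}
if 14856714938 in filtered_ids:
    print("In filtered (by id)")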

Unless I'm blind to some syntax or logic error here, my best shot in the dark is that it's a RAM issue, given the size of the data?


