'Merge duplicate table rows than delete duplicates without timeout

I have the following (simplified) table structure:

table_user_book
id | book_id | user_id | page 
---+---------+---------+------
 1 | b1      | c1      | 16
 2 | b1      | c1      | 130
 3 | b2      | c1      | 54
 4 | b4      | c2      | 97
 5 | b5      | c2      | 625
 6 | b5      | p3      | 154

as you can see I have duplicates. I want a single entry for the pair user/book. page is the last page read by the user. I have the following query that allows me to have the right merged data but I now want to update the table and delete the wrong records.

select 
    min("id") as "id", 
    max("page") as "page",      
    "book_id",
    "user_id",
    count(*) as "duplicates"
from myschema."table_user_book"
group by "book_id", "user_id"
order by count(*) DESC

I've run an update query using the above query as source matchng on "id".
I've then tried with the classic delete query WHERE a.id < b.id AND ... to delete the wrong rows but my DB goes to timeout.

I dont have the exact delete query I used but it was quite similar to:

delete FROM
    myschema."table_user_book" a
using myschema."table_user_book" b
where
   a.id > b.id
   and a."book_id" = b."book_id"
   and a."user_id"= b."user_id";

I've also tried to use sets to delete the rows but that is a even worse solution.

delete FROM
    myschema."table_user_book"
where "id" not in ( ... above filter query with only ids selected ... )

I have around 400k rows in that table and 10-15% of them are duplicates!

For now I'm creating a tmp table with the right values, dropping the original one, and recreating it:

create table myschema.tmp as ( ... query above ... )
drop table myschema.table_user_book
create table myschema.table_user_book as ( select * from myschema."tmp" )


Solution 1:[1]

For deleting only duplicated records you have two variants:

  1. Creating a recursive function for detecting and deleting duplicates
  2. Using CTE commands (with as) by materialized technology and delete without recursive, by one-time execute the query.

For sample query please visit this url

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Ramin Faracov