'Delete rows having exactly same values

Please consider following table

Table Name: mytable

model_id event_name time_of_event
9 CREATE 2016-01-01 00:00:00
9 UPDATE 2016-01-01 01:00:00
9 DELETE 2016-01-01 02:00:00
3 CREATE 2016-01-01 03:00:00       DUPLICATE
3 CREATE 2016-01-01 03:00:00       DUPLICATE delete this
3 DELETE 2016-01-01 04:00:00

How to delete 5th entry from above table i.e. delete row with exactly same value from table. In above example no column is unique.

Please keep in mind that database could be huge and I don't want to recreate or republish data with distinct values into the table.

// Use below code to create above example 

CREATE TABLE mytable(
  model_id  integer,
  event_name varchar(7),
  time_of_event timestamp
);

INSERT INTO mytable 
    (model_id, event_name, time_of_event) 
    VALUES
    (9, 'CREATE', '2016-01-01 00:00:00'),
    (9, 'UPDATE', '2016-01-01 01:00:00'),
    (9, 'DELETE', '2016-01-01 02:00:00'), 
    (3, 'CREATE', '2016-01-01 03:00:00'),
    (3, 'CREATE', '2016-01-01 03:00:00'),
    (3, 'DELETE', '2016-01-01 04:00:00');
    
SELECT * FROM mytable;


Solution 1:[1]

Try with a helper table that contains the duplicates, but only one each:

With this scenario: ...

CREATE TABLE mytable(
model_id integer,event_name VARCHAR(8),time_of_event TIMESTAMP)
; 

INSERT INTO mytable
-- your input data ...
          SELECT 9,'CREATE',TIMESTAMP '2016-01-01 00:00:00'
UNION ALL SELECT 9,'UPDATE',TIMESTAMP '2016-01-01 01:00:00'
UNION ALL SELECT 9,'DELETE',TIMESTAMP '2016-01-01 02:00:00'
UNION ALL SELECT 3,'CREATE',TIMESTAMP '2016-01-01 03:00:00'
UNION ALL SELECT 3,'CREATE',TIMESTAMP '2016-01-01 03:00:00'
UNION ALL SELECT 3,'DELETE',TIMESTAMP '2016-01-01 04:00:00'
;

Create your helper table like so:

CREATE TABLE helper AS
SELECT
  model_id
, event_name
, time_of_event
FROM mytable
GROUP BY 
  model_id
, event_name
, time_of_event
HAVING COUNT(*) > 1;

Then, use the helper table to delete ... you will delete all rows, not only one of the duplicates ...

DELETE FROM mytable
WHERE(model_id,event_name,time_of_event) IN (
SELECT model_id,event_name,time_of_event FROM helper
);

And finally, insert all the rows from the helper table back in again:

INSERT INTO mytable
SELECT * FROM helper;                                                                                                                                                                              
COMMIT; -- if your connection is not auto-commit ...

But I'd like to add that, for most database systems, the other approach - to create a new table containing SELECT DISTINCT * FROM old_table is the faster alternative as soon as we are talking about around 20 to 25 % of the total row count.

Solution 2:[2]

Having two or more rows with identical values is a sign of very bad design. I suppose model_id is the table's primary key. I wonder how did you end up in this situation.

I don't want to recreate or republish data with distinct values into the table.

One possible solution is to add (not recreate/republish) a column with unique values to your table, then delete the duplicate rows you want.

ALTER TABLE mytable ADD COLUMN MyTableID INT FIRST;

You need to fill this column with unique values:

SET @i := 0;
UPDATE MyTable SET MyTableID = @i:=(@i+1) WHERE 1=1;

Next, you can write the following query:

SELECT
  MT.MyTableID, MT.model_id, MT.event_name, MT.time_of_event
FROM
  MyTable MT,
  (SELECT model_id, event_name, time_of_event, COUNT(*)
   FROM MyTable
   GROUP BY model_id, event_name, time_of_event
   HAVING COUNT(*) > 1
  ) TmpTable
WHERE 
  MT.model_id = TmpTable.model_id
  AND MT.event_name = TmpTable.event_name
  AND MT.time_of_event = TmpTable.time_of_event
;

Result:

MyTableID model_id event_name time_of_event
4 3 CREATE 2016-01-01 03:00:00
5 3 CREATE 2016-01-01 03:00:00

You can now proceed with the deletion of duplicate rows:

DELETE FROM MyTable WHERE MyTableID IN (5 /*, the ones you wish */);

If you have too many duplicate values and you can't afford to delete them manually, you can do it like this:

DELETE FROM MyTable WHERE MyTableID IN (
  SELECT
    MT.MyTableID
  FROM
    (SELECT * FROM MyTable) AS MT,
    (SELECT model_id, event_name, time_of_event, COUNT(*)
     FROM MyTable
     GROUP BY model_id, event_name, time_of_event
     HAVING COUNT(*) > 1
    ) AS TmpTable
  WHERE 
    MT.model_id = TmpTable.model_id
    AND MT.event_name = TmpTable.event_name
    AND MT.time_of_event = TmpTable.time_of_event
) LIMIT /* The number of duplicate rows - 1 */;

The -1 is to preserve one row of the duplicates. If you want to delete them all, remove the LIMIT clause.

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1
Solution 2