'Delete rows having exactly same values
Please consider following table
Table Name: mytable
| model_id | event_name | time_of_event |
|---|---|---|
| 9 | CREATE | 2016-01-01 00:00:00 |
| 9 | UPDATE | 2016-01-01 01:00:00 |
| 9 | DELETE | 2016-01-01 02:00:00 |
| 3 | CREATE | 2016-01-01 03:00:00 DUPLICATE |
| 3 | CREATE | 2016-01-01 03:00:00 DUPLICATE delete this |
| 3 | DELETE | 2016-01-01 04:00:00 |
How to delete 5th entry from above table i.e. delete row with exactly same value from table. In above example no column is unique.
Please keep in mind that database could be huge and I don't want to recreate or republish data with distinct values into the table.
// Use below code to create above example
CREATE TABLE mytable(
model_id integer,
event_name varchar(7),
time_of_event timestamp
);
INSERT INTO mytable
(model_id, event_name, time_of_event)
VALUES
(9, 'CREATE', '2016-01-01 00:00:00'),
(9, 'UPDATE', '2016-01-01 01:00:00'),
(9, 'DELETE', '2016-01-01 02:00:00'),
(3, 'CREATE', '2016-01-01 03:00:00'),
(3, 'CREATE', '2016-01-01 03:00:00'),
(3, 'DELETE', '2016-01-01 04:00:00');
SELECT * FROM mytable;
Solution 1:[1]
Try with a helper table that contains the duplicates, but only one each:
With this scenario: ...
CREATE TABLE mytable(
model_id integer,event_name VARCHAR(8),time_of_event TIMESTAMP)
;
INSERT INTO mytable
-- your input data ...
SELECT 9,'CREATE',TIMESTAMP '2016-01-01 00:00:00'
UNION ALL SELECT 9,'UPDATE',TIMESTAMP '2016-01-01 01:00:00'
UNION ALL SELECT 9,'DELETE',TIMESTAMP '2016-01-01 02:00:00'
UNION ALL SELECT 3,'CREATE',TIMESTAMP '2016-01-01 03:00:00'
UNION ALL SELECT 3,'CREATE',TIMESTAMP '2016-01-01 03:00:00'
UNION ALL SELECT 3,'DELETE',TIMESTAMP '2016-01-01 04:00:00'
;
Create your helper table like so:
CREATE TABLE helper AS
SELECT
model_id
, event_name
, time_of_event
FROM mytable
GROUP BY
model_id
, event_name
, time_of_event
HAVING COUNT(*) > 1;
Then, use the helper table to delete ... you will delete all rows, not only one of the duplicates ...
DELETE FROM mytable
WHERE(model_id,event_name,time_of_event) IN (
SELECT model_id,event_name,time_of_event FROM helper
);
And finally, insert all the rows from the helper table back in again:
INSERT INTO mytable
SELECT * FROM helper;
COMMIT; -- if your connection is not auto-commit ...
But I'd like to add that, for most database systems, the other approach - to create a new table containing SELECT DISTINCT * FROM old_table is the faster alternative as soon as we are talking about around 20 to 25 % of the total row count.
Solution 2:[2]
Having two or more rows with identical values is a sign of very bad design. I suppose model_id is the table's primary key. I wonder how did you end up in this situation.
I don't want to recreate or republish data with distinct values into the table.
One possible solution is to add (not recreate/republish) a column with unique values to your table, then delete the duplicate rows you want.
ALTER TABLE mytable ADD COLUMN MyTableID INT FIRST;
You need to fill this column with unique values:
SET @i := 0;
UPDATE MyTable SET MyTableID = @i:=(@i+1) WHERE 1=1;
Next, you can write the following query:
SELECT
MT.MyTableID, MT.model_id, MT.event_name, MT.time_of_event
FROM
MyTable MT,
(SELECT model_id, event_name, time_of_event, COUNT(*)
FROM MyTable
GROUP BY model_id, event_name, time_of_event
HAVING COUNT(*) > 1
) TmpTable
WHERE
MT.model_id = TmpTable.model_id
AND MT.event_name = TmpTable.event_name
AND MT.time_of_event = TmpTable.time_of_event
;
Result:
| MyTableID | model_id | event_name | time_of_event |
|---|---|---|---|
| 4 | 3 | CREATE | 2016-01-01 03:00:00 |
| 5 | 3 | CREATE | 2016-01-01 03:00:00 |
You can now proceed with the deletion of duplicate rows:
DELETE FROM MyTable WHERE MyTableID IN (5 /*, the ones you wish */);
If you have too many duplicate values and you can't afford to delete them manually, you can do it like this:
DELETE FROM MyTable WHERE MyTableID IN (
SELECT
MT.MyTableID
FROM
(SELECT * FROM MyTable) AS MT,
(SELECT model_id, event_name, time_of_event, COUNT(*)
FROM MyTable
GROUP BY model_id, event_name, time_of_event
HAVING COUNT(*) > 1
) AS TmpTable
WHERE
MT.model_id = TmpTable.model_id
AND MT.event_name = TmpTable.event_name
AND MT.time_of_event = TmpTable.time_of_event
) LIMIT /* The number of duplicate rows - 1 */;
The -1 is to preserve one row of the duplicates. If you want to delete them all, remove the LIMIT clause.
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | |
| Solution 2 |
