Insert multiple columns to another table based on 1 distinct column [duplicate]
Below is the structure of my diary table, which holds entries about trips. After hashing the notes column into a hash column, I found around 4,000 duplicate entries out of roughly 47,000 total.
I want to delete all the duplicate rows, keeping just one row per distinct hash value.
I can find the duplicate rows using
SELECT MIN(`url`) AS url, COUNT(*) AS c FROM diary GROUP BY `hash` HAVING c > 1 ORDER BY c DESC;
and then use this SQL to find the url id of the duplicate rows
SELECT `url`, places, days,`hash` FROM diary WHERE `hash` = (select `hash` from diary WHERE `url` = ?);
For this table I want to keep any one row from the first two rows (they share the same hash), plus the 3rd and 4th rows. Note also that in the diary table a duplicate hash may occur as many as 400 times.
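The duplicate-detection query above can be tried out in a self-contained script. This is a sketch using Python's built-in sqlite3 module with invented sample data, not the real diary table; a hash tiebreaker is added to the ORDER BY so the output order is deterministic:

```python
import sqlite3

# In-memory database with a toy version of the diary table (hypothetical data).
con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE diary (url INTEGER PRIMARY KEY, places TEXT, days INTEGER, hash TEXT)"
)
con.executemany(
    "INSERT INTO diary VALUES (?, ?, ?, ?)",
    [(1, "Paris", 3, "h1"), (2, "Paris", 3, "h1"),
     (3, "Rome", 2, "h3"), (4, "Oslo", 1, "h4"), (5, "Oslo", 1, "h4")],
)

# Hashes that occur more than once, largest group first (hash breaks ties).
dupes = con.execute(
    "SELECT hash, COUNT(*) AS c FROM diary "
    "GROUP BY hash HAVING c > 1 ORDER BY c DESC, hash"
).fetchall()
print(dupes)  # [('h1', 2), ('h4', 2)]
```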
Ans: Add a unique key on hash, then INSERT IGNORE INTO diary2 (c1, c2, c3) SELECT c1, c2, c3 FROM diary;
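The accepted approach (copy into a second table that has a unique key on hash, letting duplicate inserts fail silently) can be sketched as below. The demo uses SQLite, where MySQL's INSERT IGNORE is spelled INSERT OR IGNORE, and the sample rows are made up:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE diary  (url INTEGER PRIMARY KEY, places TEXT, days INTEGER, hash TEXT);
CREATE TABLE diary2 (url INTEGER PRIMARY KEY, places TEXT, days INTEGER,
                     hash TEXT UNIQUE);  -- the unique key that blocks duplicates
""")
con.executemany(
    "INSERT INTO diary VALUES (?, ?, ?, ?)",
    [(1, "Paris", 3, "h1"), (2, "Paris", 3, "h1"),
     (3, "Rome", 2, "h3"), (4, "Oslo", 1, "h4"), (5, "Oslo", 1, "h4")],
)

# Rows whose hash already exists in diary2 are silently skipped;
# ordering by url makes the kept row the one with the lowest url.
con.execute(
    "INSERT OR IGNORE INTO diary2 "
    "SELECT url, places, days, hash FROM diary ORDER BY url"
)

kept = con.execute("SELECT url, hash FROM diary2 ORDER BY url").fetchall()
print(kept)  # [(1, 'h1'), (3, 'h3'), (4, 'h4')]
```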
Solution 1:[1]
I am not sure about what you mean by "duplicates" exactly, but to answer the last part of your question:
For this table I want to keep any one row from the first two rows (they share the same hash), plus the 3rd and 4th rows. Note also that in the diary table a duplicate hash may occur as many as 400 times.
you can use the row_number function for this:
with cte as (
  select `url`, places, days, `hash`,
         row_number() over (partition by `hash` order by `url`) as rown
  from diary where `hash` = (select `hash` from diary where `url` = ?)
)
select * from cte where rown = 1;
Partitioning by hash means the rown column numbers the rows within each hash group: the first row with hash=1 gets rown=1, the second gets rown=2, and so on. (Without an ORDER BY inside the window, which row counts as "first" is arbitrary; ordering by url makes it deterministic.) Selecting only rows with rown = 1 then returns the first line for hash=1, along with the single lines for hash=3 and hash=4.
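The ROW_NUMBER technique can be demonstrated end to end with SQLite (3.25+ for window functions) and invented sample data; here the filter on a single hash is dropped so one representative row per hash is returned:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE diary (url INTEGER PRIMARY KEY, places TEXT, days INTEGER, hash TEXT)"
)
con.executemany(
    "INSERT INTO diary VALUES (?, ?, ?, ?)",
    [(1, "Paris", 3, "h1"), (2, "Paris", 3, "h1"),
     (3, "Rome", 2, "h3"), (4, "Oslo", 1, "h4"), (5, "Oslo", 1, "h4")],
)

# One representative row per hash: rown = 1 picks the lowest url in each group.
rows = con.execute("""
    WITH cte AS (
        SELECT url, places, days, hash,
               ROW_NUMBER() OVER (PARTITION BY hash ORDER BY url) AS rown
        FROM diary
    )
    SELECT url, hash FROM cte WHERE rown = 1 ORDER BY url
""").fetchall()
print(rows)  # [(1, 'h1'), (3, 'h3'), (4, 'h4')]
```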
Solution 2:[2]
Here is an example of deleting duplicate rows.
Sample Table Data :
url days places itenerary flag remarks hash
------ ------ ------ --------- ------ ------- --------
1 1 1 1 0 (NULL) 1
2 2 2 2 0 (NULL) 1
3 3 3 3 0 (NULL) 3
4 4 4 4 0 (NULL) 4
5 5 5 5 0 (NULL) 4
DELETE QUERY
Delete the duplicates that share a hash with this query:
DELETE t1 FROM diary t1 INNER JOIN diary t2
WHERE t1.`url`>t2.`url` AND `t1`.`hash`=`t2`.`hash`;
Result :
1 queries executed, 1 success, 0 errors, 0 warnings
Query: delete t1 from diary t1 inner join diary t2 where t1.`url`>t2.`url` and `t1`.`hash`=`t2`.`hash`
2 row(s) affected
url days places itenerary flag remarks hash
------ ------ ------ --------- ------ ------- --------
1 1 1 1 0 (NULL) 1
3 3 3 3 0 (NULL) 3
4 4 4 4 0 (NULL) 4
This deletes all duplicates, keeping the row with the lowest url for each hash.
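SQLite has no multi-table DELETE ... JOIN, so a portable equivalent of Solution 2 deletes every row whose url is not the minimum for its hash. A sketch with the same sample values as the table above:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE diary (url INTEGER PRIMARY KEY, places TEXT, days INTEGER, hash TEXT)"
)
con.executemany(
    "INSERT INTO diary VALUES (?, ?, ?, ?)",
    [(1, "a", 1, "1"), (2, "b", 2, "1"),
     (3, "c", 3, "3"), (4, "d", 4, "4"), (5, "e", 5, "4")],
)

# Delete every row whose url is not the smallest for its hash.
cur = con.execute(
    "DELETE FROM diary WHERE url NOT IN (SELECT MIN(url) FROM diary GROUP BY hash)"
)
print(cur.rowcount)  # 2 rows removed, matching the MySQL run above

left = con.execute("SELECT url, hash FROM diary ORDER BY url").fetchall()
print(left)  # [(1, '1'), (3, '3'), (4, '4')]
```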
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | romborimba |
| Solution 2 | |


