Remove duplicate lines from a txt file when the lines contain the same words in a different order (PHP)
I want to remove duplicates from a txt file. Currently, I use this to remove duplicates:
$lines = file('input.txt');
$lines = array_unique($lines);
file_put_contents('output.txt', implode($lines));
The problem is that this code only removes exact duplicates, such as beef bbq recipe and beef bbq recipe. In my case, the txt file contains keywords like:
beef bbq recipe
beef easy recipe
beef steak recipe
bbq recipe beef
beef bbq recipe
recipe bbq beef
It returns this result:
beef bbq recipe
beef easy recipe
beef steak recipe
bbq recipe beef
recipe bbq beef
Instead, I want the result to look like this:
beef bbq recipe
beef easy recipe
beef steak recipe
So I want lines like beef bbq recipe, bbq recipe beef and recipe bbq beef to be treated as duplicates too. Is there a solution for this? Thank you.
Solution 1:[1]
You can use array_map, explode and sort to bring the keywords into the same order for all your lines before removing duplicates:
$lines = file('input.txt');
// sort keywords in each line
$lines = array_map(function($line) {
$keywords = explode(" ", trim($line));
sort($keywords);
return implode(" ", $keywords);
}, $lines);
$lines = array_unique($lines);
file_put_contents('output.txt', implode("\n", $lines));
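This iterates over your array and sorts the keywords of each line alphabetically, so lines made of the same words become identical strings. Afterwards, array_unique removes the duplicated lines. Note that the lines written to output.txt keep the sorted keyword order, so beef bbq recipe is saved as bbq beef recipe.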
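If the output should keep each line's original word order, as in the expected result in the question, one possible variation is to use the sorted keywords only as a lookup key and store the first original line seen for each key. The following is a minimal sketch under that assumption; the function name uniqueByWordSet is hypothetical and not part of the original answer, and splitting on any run of whitespace (rather than a single space) is my own choice.

```php
<?php
// Hypothetical helper (not from the original answer): keeps the first line
// seen for each set of words, leaving that line's word order untouched.
function uniqueByWordSet(array $lines): array
{
    $seen = [];    // normalized key => true
    $result = [];  // original lines that survive deduplication

    foreach ($lines as $line) {
        $line = trim($line);
        if ($line === '') {
            continue; // skip blank lines
        }

        // Split on any run of whitespace, then sort the words so that
        // the key no longer depends on word order.
        $words = preg_split('/\s+/', $line);
        sort($words, SORT_STRING);
        $key = implode(' ', $words);

        if (!isset($seen[$key])) {
            $seen[$key] = true;
            $result[] = $line; // keep the line as it was written
        }
    }

    return $result;
}

// Sample data from the question, as file('input.txt') would return it.
$lines = [
    "beef bbq recipe\n",
    "beef easy recipe\n",
    "beef steak recipe\n",
    "bbq recipe beef\n",
    "beef bbq recipe\n",
    "recipe bbq beef\n",
];

$unique = uniqueByWordSet($lines);
echo implode("\n", $unique), "\n";
```

For the sample data this leaves beef bbq recipe, beef easy recipe and beef steak recipe. To work on files, pass file('input.txt') to the function and write the result with file_put_contents('output.txt', implode("\n", $unique)).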
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 |
