Remove duplicates from a txt file that contains different sentences consisting of the same words in PHP

I want to remove duplicates from a txt file. Currently, I use this to remove duplicates:

$lines = file('input.txt');
$lines = array_unique($lines);
file_put_contents('output.txt', implode($lines));

The problem is that this code only removes exact duplicates, such as beef bbq recipe and beef bbq recipe. In my case, the txt file contains keywords like:

beef bbq recipe
beef easy recipe
beef steak recipe
bbq recipe beef
beef bbq recipe
recipe bbq beef

It returns this result:

beef bbq recipe
beef easy recipe
beef steak recipe
bbq recipe beef
recipe bbq beef

Instead, I want the result to look like this:

beef bbq recipe
beef easy recipe
beef steak recipe

So, I want cases like beef bbq recipe, bbq recipe beef and recipe bbq beef to be considered as duplicates too. Is there a solution for this? Thank you



Solution 1:[1]

You can use array_map, explode and sort to bring the keywords into the same order for all your lines before removing duplicates:

$lines = file('input.txt');

// sort keywords in each line
$lines = array_map(function($line) {
    $keywords = explode(" ", trim($line));
    sort($keywords);
    return implode(" ", $keywords);
}, $lines);

$lines = array_unique($lines);
file_put_contents('output.txt', implode("\n", $lines));

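This iterates over the array and sorts the keywords of each line alphabetically, so lines that contain the same words become identical. Afterwards, array_unique removes the duplicated lines. Note that the output contains the sorted form of each line (for example bbq beef recipe instead of beef bbq recipe), not the original wording.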
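
If the original wording of the first occurrence should be kept, as in the desired output from the question, one option is to use the sorted keywords only as a lookup key and store the untouched line under it. The following is a minimal sketch of that idea, not part of the original answer. The function name uniqueByKeywords is hypothetical, and the sketch assumes that blank lines can be dropped and that keywords are separated by whitespace. It uses an in-memory array so that it runs without an input file.

```php
<?php
// Hypothetical helper: keeps the first line seen for each set of keywords,
// in its original wording.
function uniqueByKeywords(array $lines): array
{
    $seen = [];
    foreach ($lines as $line) {
        $line = trim($line);
        if ($line === '') {
            continue; // assumption: blank lines are not wanted in the output
        }
        // Split on any run of whitespace, then sort the keywords to build
        // a key that does not depend on word order.
        $keywords = preg_split('/\s+/', $line);
        sort($keywords);
        $key = implode(' ', $keywords);
        if (!isset($seen[$key])) {
            $seen[$key] = $line; // store the original line, not the sorted one
        }
    }
    return array_values($seen);
}

// Sample data from the question, with trailing newlines as file() returns them.
$lines = [
    "beef bbq recipe\n",
    "beef easy recipe\n",
    "beef steak recipe\n",
    "bbq recipe beef\n",
    "beef bbq recipe\n",
    "recipe bbq beef\n",
];

$result = uniqueByKeywords($lines);
echo implode("\n", $result), "\n";
// prints:
// beef bbq recipe
// beef easy recipe
// beef steak recipe
```

To apply this to a file, the sample array could be replaced with file('input.txt') and the result written with file_put_contents('output.txt', implode("\n", $result)).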

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow