How do I remove duplicate lines using Sed without sorting?
I've been trying to delete duplicate lines using only Sed, and I'm having trouble working out how to do it.
So far I've tried this, and it hasn't worked:
sed '$!N; /^\(.*\)\n\1$/!P; D'
file:
APPLE
ORANGES
BANANA
BANANA
COOKIES
FRUITS
What I got:
APPLE
ORANGES
BANANA
BANANA
COOKIES
FRUITS
What I want:
APPLE
ORANGES
BANANA
COOKIES
FRUITS
I want to do this so that I don't have to go through each line of a file and delete the duplicates by hand.
My goal is for the command to delete the second instance of BANANA.
Can anyone point me in the right direction?
Thanks
Solution 1:[1]
Using sed
$ sed -n '/^$/d;G;/^\(.*\n\).*\n\1$/d;H;P;a\ ' input_file
APPLE
ORANGES
BANANA
COOKIES
FRUITS
Delete blank lines. Append the hold space to the pattern space (G). If the line is a duplicate, delete it (d); otherwise append the pattern space to the hold space (H), print the line (P), and output a blank line after it (a).
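For comparison, here is a widely circulated hold-space one-liner that removes repeated lines wherever they occur and keeps the first occurrence. It is a different script from the one in this answer, shown only as a sketch: the function name dedupe_all is hypothetical, the leading /^$/d is added on the assumption that blank lines should be dropped, and the [ -~] class means only lines made of printable ASCII are compared.

```shell
#!/bin/sh
# Hypothetical helper: keep the first occurrence of each line, drop later repeats.
# Assumes lines contain only printable ASCII (the [ -~] range); LC_ALL=C keeps
# that range byte-based.
dedupe_all() {
    LC_ALL=C sed -n '
        # Drop empty lines (assumption: they are not wanted in the output).
        /^$/d
        # Append the lines seen so far (hold space) after the current line.
        G
        # Double the first newline so a match must start at a line boundary.
        s/\n/&&/
        # Current line already appears among the seen lines: skip it.
        /^\([ -~]*\n\).*\n\1/d
        # Undo the doubling, remember the new list, print the current line.
        s/\n//
        h
        P
    '
}

printf 'APPLE\nORANGES\nBANANA\nBANANA\nCOOKIES\nAPPLE\nFRUITS\n' | dedupe_all
```

The hold space grows by one line for every unique line seen, so this approach is best suited to small and medium files.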
Solution 2:[2]
Hmm, that is odd; that seems to work for me. Is it because you have an empty line between each line of text?
~$ cat test.txt
APPLES
ORANAGES
BANANA
BANANA
COOKIES
FRUITS
~$ cat test.txt | sed '$!N; /^\(.*\)\n\1$/!P; D'
APPLES
ORANAGES
BANANA
COOKIES
FRUITS
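One way to check the blank-line hypothesis is to run the question's command on both kinds of input. The sketch below is an illustration, not part of the original answer: the function name dedupe_adjacent and the sample data are made up for the demonstration. Without blank lines, the second BANANA is dropped. When an empty line follows each entry, N pairs every entry with an empty line, the two never compare equal, and nothing is removed.

```shell
#!/bin/sh
# Hypothetical helper wrapping the command from the question.
# It only removes a line that is identical to the line directly before it.
dedupe_adjacent() {
    sed '$!N; /^\(.*\)\n\1$/!P; D'
}

# Adjacent duplicates, no blank lines: the second BANANA is removed.
printf 'APPLE\nBANANA\nBANANA\nFRUITS\n' | dedupe_adjacent

echo '---'

# Same data with an empty line after each entry: output equals input,
# because each BANANA is now followed by an empty line, not by BANANA.
printf 'APPLE\n\nBANANA\n\nBANANA\n\nFRUITS\n' | dedupe_adjacent
```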
Solution 3:[3]
This might work for you (GNU sed):
sed -E '1s/^/\n/;:a;N;s/((\n\S+)(\n\S+)*)\n\2$/\1/;$!ba;s/.//' file
On the first line, insert a newline for regexp purposes.
Gather up the lines in the pattern space, removing duplicates when added (plus the empty line beforehand).
At the end of the file, remove the introduced newline and print the result.
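To see what these steps do, the command can be run on a small sample. This is a sketch under assumptions: it needs GNU sed (for -E with \S and for \n in the replacement), the sample assumes entries are separated by empty lines, as the regexp expects, and the wrapper name dedupe_spaced is made up for the illustration.

```shell
#!/bin/sh
# Hypothetical wrapper around the GNU sed command from this answer.
dedupe_spaced() {
    sed -E '1s/^/\n/;:a;N;s/((\n\S+)(\n\S+)*)\n\2$/\1/;$!ba;s/.//'
}

# Entries separated by empty lines; BANANA appears twice in a row.
# The second BANANA and the empty line before it are removed,
# while the remaining separators are kept.
printf 'APPLE\n\nBANANA\n\nBANANA\n\nFRUITS\n' | dedupe_spaced
```

Because the whole file is gathered in the pattern space before printing, memory use grows with the size of the input.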
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | HatLess |
| Solution 2 | clogwog |
| Solution 3 | potong |
