'How do I remove duplicate lines using Sed without sorting?

I've been trying to figure out how to delete duplicate lines using only Sed and I'm having trouble figuring out how to do it.

So far I've tried this and it hasn't worked.

sed '$!N; /^\(.*\)\n\1$/!P; D'

file:

APPLE

ORANGES

BANANA

BANANA

COOKIES

FRUITS

What I got:

APPLE

ORANGES

BANANA

BANANA

COOKIES

FRUITS

What I want:

APPLE

ORANGES

BANANA

COOKIES

FRUITS

I've been trying to figure out how to do it so I won't have to manually go through each line in a file and tell it to manually delete the duplicates.

My goal is for this to eventually delete the second instance of BANANA.

Can anyone point me in the right direction?

Thanks

linux sed

Solution 1:^[1]

Using sed

$ sed -n '/^$/d;G;/^\(.*\n\).*\n\1$/d;H;P;a\ ' input_file
APPLE

ORANGES

BANANA

COOKIES

FRUITS

Remove blank lines. Append hold space. If the line is duplicated, delete it, else copy into hold space, print and insert blank lines.

mmm that is odd, that seems to work for me. Is it because you have an empty line in between each text-line ?

~$ cat test.txt
APPLES
ORANAGES
BANANA
BANANA
COOKIES
FRUITS

~$ cat test.txt |  sed '$!N; /^\(.*\)\n\1$/!P; D'
APPLES
ORANAGES
BANANA
COOKIES
FRUITS

This might work for you (GNU sed):

   sed -E '1s/^/\n/;:a;N;s/((\n\S+)(\n\S+)*)\n\2$/\1/;$!ba;s/.//' file

On the first line, insert a newline for regexp purposes.

Gather up the lines in the pattern space, removing duplicates when added (plus the empty line beforehand).

At end of the file, remove the introduced newline and print the result.

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.