Duplicate entries in file
I have a file with the following content:
123 ABC
12345 ABC-test
In a shell script, I need to match one exact entry, but my search returns two results and I am unable to get only the exact one.
For example:
grep "ABC"
returns both the entries, but I want a specific entry, i.e., if I search for "ABC", I should get only "123 ABC" and not the other entry.
Solution 1:[1]
Since you consider words to be whitespace-separated chunks, it is easier to use awk here since it reads lines (records) and splits them into fields (non-whitespace chunks) by default:
awk '$2=="ABC"' file > newfile
awk '/([[:space:]]|^)ABC([[:space:]]|$)/' file > newfile
Here, the first awk outputs all lines where the second word is exactly ABC. The second awk outputs all lines where ABC is preceded by whitespace or the start of the line and followed by whitespace or the end of the line.
See the online demo:
#!/bin/bash
s='123 ABC
12345 ABC-test'
awk '$2=="ABC"' <<< "$s"
awk '/([[:space:]]|^)ABC([[:space:]]|$)/' <<< "$s"
Output (each command prints the same line):
123 ABC
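When the search term comes from a variable rather than being hard-coded, the same field-based idea can be wrapped in a small function. The following is only a sketch: find_exact is a hypothetical helper name that does not appear in the original answer, and it assumes the term contains no backslashes (awk -v interprets escape sequences in the assigned value).

```shell
# find_exact TERM [FILE...]
# Print every line in which at least one whitespace-separated field equals TERM.
# (find_exact is a hypothetical name used for illustration only.)
find_exact() {
  term=$1
  shift
  # ($i "") forces a string comparison, so "100" does not equal "1e2".
  awk -v term="$term" '{
    for (i = 1; i <= NF; i++)
      if (($i "") == term) { print; next }
  }' "$@"
}

# Usage with the sample data from the question, read from standard input.
result=$(printf '%s\n' '123 ABC' '12345 ABC-test' | find_exact ABC)
echo "$result"
```

Unlike $2=="ABC", this variant does not assume the term sits in the second column; it checks every field on the line.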
Solution 2:[2]
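You need a regular expression that matches ABC only as a whole word, that is, with a word boundary on each side:
grep -e '\bABC\b'
Note that -e only introduces the pattern; it is -E that enables extended regular expressions, and \b (a GNU extension) works in either mode. Be aware that \b treats - as a non-word character, so this pattern also matches ABC-test in the sample data; if entries are whitespace-separated, require whitespace or a line edge around ABC instead, as in Solution 1. For more on regular expressions, see e.g. https://www.regular-expressions.info/tutorial.html.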
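To stay with grep while treating entries as whitespace-separated, one option is to spell out the delimiters explicitly. This is a sketch that assumes a grep supporting -E and POSIX character classes; the variable names s and strict are illustrative only.

```shell
# Sample data from the question.
s='123 ABC
12345 ABC-test'

# Require the start of the line or whitespace before ABC,
# and whitespace or the end of the line after it.
strict=$(printf '%s\n' "$s" | grep -E '(^|[[:space:]])ABC([[:space:]]|$)')
echo "$strict"
```

With the sample data this prints only 123 ABC, because the hyphen after ABC in ABC-test is neither whitespace nor the end of the line.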
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Wiktor Stribiżew |
| Solution 2 | Maciej Wrobel |
