Return matching but not exactly identical strings
Is there any way to find a word that contains a given string but is not an exact match? For example:
# cat t.txt
first line
ind is a shortform of india
I am trying to return the word "india" because it contains the string "ind", but I do not want the exact match "ind" itself. I have tried this:
# grep -o 'ind' t.txt
ind
ind
Solution 1:[1]
Would you please try the following:
grep -Eo '[A-Za-z]+ind|ind[A-Za-z]+' t.txt
Output:
india
The regex [A-Za-z]+ind|ind[A-Za-z]+ matches ind together with at least one letter immediately before it or at least one letter immediately after it, so a standalone ind is not printed.
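As a rough check of how this pattern behaves when ind sits in the middle of a word, the sketch below runs it on a made-up sample and compares it with a two-step variant. The function names one_sided and two_step and the sample text are assumptions for illustration only; the two-step variant is not part of the answer above. It assumes a grep that supports -o (GNU and BSD grep do).

```shell
# Hypothetical sample: a bare "ind", a word starting with it, and a word with it in the middle.
sample='ind india fooindbar'

# Pattern from the answer: each alternative requires letters on only one side of "ind".
one_sided() { grep -Eo '[A-Za-z]+ind|ind[A-Za-z]+'; }

# Two-step variant: take every run of letters that contains "ind",
# then drop the runs that are exactly "ind" (-x matches whole lines, -v inverts).
two_step() { grep -Eo '[A-Za-z]*ind[A-Za-z]*' | grep -vx 'ind'; }

printf '%s\n' "$sample" | one_sided   # prints: india, fooind
printf '%s\n' "$sample" | two_step    # prints: india, fooindbar
```

With the one-sided pattern the first alternative stops at ind, so fooindbar is reported as fooind; the two-step variant keeps the whole word.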
Solution 2:[2]
$ grep -Eo '[[:alpha:]]+ind[[:alpha:]]*|[[:alpha:]]*ind[[:alpha:]]+' file
india
fooindbar
The above was run on this input file (note the added test case of ind appearing in the middle of a word instead of just at the start or end):
$ cat file
first line
ind is a shortform of india
this fooindbar is the mid-word text
You can do the same with GNU awk (needed for the multi-char RS, for RT, and for the \s shorthand for [[:space:]]) if you prefer:
$ awk -v RS='\\s+' '/[[:alpha:]]+ind[[:alpha:]]*|[[:alpha:]]*ind[[:alpha:]]+/' file
india
fooindbar
or:
$ awk -v RS='[[:alpha:]]+ind[[:alpha:]]*|[[:alpha:]]*ind[[:alpha:]]+' 'RT{print RT}' file
india
fooindbar
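Where GNU awk is not available, a similar result can be sketched with plain POSIX awk by looping over the whitespace-separated fields of each line. This is a swapped-in technique, not the method from the answer above, and the function name words_with_ind is hypothetical. Unlike the regex-based commands, it prints the whole whitespace-delimited token, so attached punctuation would be kept.

```shell
# Print every whitespace-separated word that contains "ind" but is not exactly "ind".
# Relies only on default field splitting, so no regex RS or RT is needed.
words_with_ind() {
    awk '{
        for (i = 1; i <= NF; i++)
            if (index($i, "ind") > 0 && $i != "ind")
                print $i
    }' "$@"
}

# Same three lines as the input file shown above; prints india, then fooindbar.
printf '%s\n' 'first line' \
              'ind is a shortform of india' \
              'this fooindbar is the mid-word text' | words_with_ind
```

The function reads standard input when called without arguments, or the named files otherwise, for example words_with_ind file.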
Solution 3:[3]
I would use GNU AWK for this task in the following way. Let the content of file.txt be
first line
ind is a shortform of india
then
awk 'BEGIN{RS="[[:space:]]+"}match($0,/ind/)&&length>RLENGTH{print}' file.txt
output
india
Explanation: I tell GNU AWK that the record separator (RS) is one or more whitespace characters, so every word is treated as a record. Then for every record (that is, every word) I use the match function, which returns the position where the match starts (0 if there is none) and sets the RSTART and RLENGTH values. If a match is found, I check whether the length of the current record (that is, the word) is greater than the length of the match; if so, I print that word. Note that every word is output on its own line, so if, for example, the input file content were
india ind india ind india
then the output would be
india
india
india
(tested in gawk 4.2.1)
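To make the length comparison concrete, the following sketch prints what match() returns and what it leaves in RSTART and RLENGTH for a few sample words, next to the word's own length. The helper name show_match and the sample words are assumptions for illustration; the awk features used here are standard, not GNU-specific.

```shell
# For each input line print: word, match() return value, RSTART, RLENGTH, length of the word.
show_match() {
    awk '{ pos = match($0, /ind/); print $0, pos, RSTART, RLENGTH, length($0) }'
}

printf '%s\n' ind india fooindbar text | show_match
# ind 1 1 3 3
# india 1 1 3 5
# fooindbar 4 4 3 9
# text 0 0 -1 4
```

Only india and fooindbar have a non-zero match position and a length greater than RLENGTH, which is exactly the condition the one-liner tests before printing.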
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | tshiono |
| Solution 2 | |
| Solution 3 | Daweo |
