Extract text from file in Linux: specific line; between 2 different patterns

I have a bunch of text files, all with the same structure, and I need to extract a specific piece in a specific line.

I can easily extract the line with awk:

awk 'NR==23' blast_out.txt

CP046310.1 Lactobacillus jensenii strain FDAARGOS_749 chromosome,...  787     0.0 

But I don't want the whole line, rather just the part between the first space on the left (after CP046310.1) and the double space on the right (before 787). The final output should be:

Lactobacillus jensenii strain FDAARGOS_749 chromosome,...

I tried several combinations of awk and grep but could not find one that extracts this specific pattern.



Solution 1:[1]

With sed, you can use:

sed -En '23s/^[^ ]+ |  .*$//gp' file

Lactobacillus jensenii strain FDAARGOS_749 chromosome,...

Or using awk:

 awk 'NR == 23 {gsub(/^[^ ]+ |  .*$/, ""); print}' file
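Since the question mentions a bunch of files with the same structure, the awk version can be run over all of them in one call by testing FNR (the per-file line number) instead of NR. The following is a minimal sketch, not part of the original answer: the function name extract_desc and the tab-separated "file, description" output are assumptions chosen for illustration.

```shell
# Hypothetical helper: for every file given, print "filename<TAB>description"
# taken from line 23. Assumes each file has the layout shown in the question.
extract_desc() {
    awk 'FNR == 23 {
        # Drop the leading ID plus its space, and everything from the
        # first double space to the end of the line.
        gsub(/^[^ ]+ |  .*$/, "")
        print FILENAME "\t" $0
    }' "$@"
}

# Usage (the glob is a placeholder for your own file names):
# extract_desc blast_out_*.txt
```

FNR restarts at 1 for each input file, whereas NR keeps counting across files, so NR == 23 would only ever match in the first file.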

Solution 2:[2]

If I understand the question correctly, you want the fields from the second (inclusive) up to, but not including, the second-to-last. I would go with:

awk ' FNR==23 {for (i = 2; i < NF - 2; i++) { printf("%s ", $i) }; printf("%s\n", $i); exit }' file_path

An example with the line you posted:

$ echo "CP046310.1 Lactobacillus jensenii strain FDAARGOS_749 chromosome,...  787     0.0" | awk '{for (i = 2; i < NF - 2; i++) { printf("%s ", $i) }; printf("%s\n", $i); exit }'
Lactobacillus jensenii strain FDAARGOS_749 chromosome,...

I assume that chromosome,... does not contain spaces and that the fields you want to extract are separated by single spaces. If the second assumption does not hold, each run of extra spaces between those fields is collapsed to a single space in the output.
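If the original spacing inside the description matters, one alternative is to split on the double-space boundary instead of on every space. This is only a sketch under the assumption that the description never contains two consecutive spaces itself; the function name extract_between is made up for this example.

```shell
# Hypothetical helper: read lines on stdin and print the text between the
# first single space and the first run of two or more spaces.
extract_between() {
    awk -F'  +' '{
        desc = $1                  # everything before the first double space
        sub(/^[^ ]+ /, "", desc)   # remove the leading ID and its space
        print desc
    }'
}

# Usage:
# awk 'FNR == 23' blast_out.txt | extract_between
```

Because the field separator is a regular expression matching two or more spaces, single spaces inside the description are left untouched.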

Solution 3:[3]

With Perl:

echo "CP046310.1 Lactobacillus jensenii strain FDAARGOS_749 chromosome,...  787     0.0"|perl -ne 'print "$1\n" if / (.*?)  /'
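The Perl one-liner above works on a single piped line. To apply the same match to line 23 of a file, as the question asks, something along these lines should work. It is a sketch rather than part of the original answer, and the function name extract_line23_perl is hypothetical.

```shell
# Hypothetical helper: print the text between the first space and the first
# double space on line 23 of the file given as the first argument.
extract_line23_perl() {
    # $. holds the current input line number; stop reading once line 23 is done.
    perl -ne 'if ($. == 23) { print "$1\n" if / (.*?)  /; last }' "$1"
}

# Usage:
# extract_line23_perl blast_out.txt
```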

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 anubhava
Solution 2
Solution 3 Supertech