Bash command to search for a pattern (sequence) and print everything next to it (on the left and right side)

I'm trying to reconstruct a gene sequence from a PoolSeq file of a population (FASTA format) and a known conserved region. I want to search the file for reads that match this conserved sequence and then build up the neighboring region outward from it.

So basically I need a Bash command that searches a FASTA file for a sequence segment and prints the region around the match in every read.

File: FASTA file of diverse individuals of a species

Input: 20-30 bp sequence

Output: all reads containing that sequence, together with the neighboring region in each read



Solution 1:[1]

You can try grep:

grep -o -E '.{,20}ATGCGT.{,20}' test.fasta

Description:

-o: print only the part of the line that matches

-E: use extended regular expressions

REGEX:

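  • .{,20}: up to 20 characters of any kind before the sequence
  • ATGCGT: the sequence you want to match
  • .{,20}: up to 20 characters of any kind after the sequence

Note that {,20} (no lower bound) is a GNU grep extension; with other greps write {0,20}.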
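The one-liner works line by line, so it has some blind spots for this use case: FASTA files often wrap each sequence over several lines (commonly 60 or 80 bases per line), so a match that crosses a line break is missed; the output does not say which read a hit came from; and reads sequenced from the opposite strand contain the reverse complement of the conserved sequence rather than the sequence itself. Below is a minimal bash sketch addressing those points, assuming upper-case A/C/G/T sequences, a plain nucleotide pattern with no regex metacharacters, and a file whose first line is a ">" header. The function name flank_matches and its arguments are hypothetical, not part of the original answer; it uses the portable {0,N} interval form.

```bash
# Sketch only: prints every read containing PATTERN or its reverse
# complement, followed by each match with up to FLANK bases on either side.
# Assumptions: upper-case A/C/G/T sequences, PATTERN has no regex
# metacharacters, and the first line of the file is a ">" header.
# The name flank_matches is hypothetical.
#
# Usage: flank_matches FASTA PATTERN [FLANK]
flank_matches() {
  local fasta=$1 pattern=$2 flank=${3:-20}
  local rc="" i hits

  # Reverse the pattern one character at a time...
  for (( i = ${#pattern} - 1; i >= 0; i-- )); do
    rc+=${pattern:i:1}
  done
  # ...then complement it (A<->T, C<->G) to get the reverse complement.
  rc=$(printf '%s' "$rc" | tr 'ACGT' 'TGCA')

  # Linearise the FASTA into alternating header/sequence lines so that a
  # match spanning a line wrap is still found.
  awk '
    { sub(/\r$/, "") }                               # drop CR from CRLF files
    /^>/ { if (NR > 1) print seq; print; seq = ""; next }
    { seq = seq $0 }
    END { if (NR > 0) print seq }
  ' "$fasta" |
  while IFS= read -r header && IFS= read -r seq; do
    # -o prints each non-overlapping match on its own line.
    hits=$(printf '%s\n' "$seq" |
           grep -o -E ".{0,$flank}($pattern|$rc).{0,$flank}")
    if [ -n "$hits" ]; then
      printf '%s\n%s\n' "$header" "$hits"
    fi
  done
}
```

After pasting the function into a shell (or sourcing it from a script), something like flank_matches test.fasta ATGCGT 20 prints each matching read's header followed by its flanked hits; reads without a hit are skipped.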

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution 1: cordigliere