Bash command to search for a pattern (sequence) and print everything next to it (to the left and right)
I'm trying to reconstruct a gene sequence based on a PoolSeq file of a population (fasta format) and a conserved area. I want to search the file for matches with this sequence and then build up the neighboring area starting from that conserved sequence.
So I basically need a Bash command to search a fasta file for a sequence segment and to print the neighboring region of the match in every read.
File: FASTA file of diverse individuals of a species
Input: 20-30 bp sequence
Output: All reads with that sequence and the neighboring region in that read
Solution 1:[1]
You can try grep:
grep -o -E '.{,20}ATGCGT.{,20}' test.fasta
Description:
-o prints only the part of the line that matches
-E uses extended regular expressions
Regex:
.{,20} up to 20 characters of any kind before the match
ATGCGT the sequence you want to match
.{,20} up to 20 characters of any kind after the match
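One thing to watch for: grep works line by line, so a match that spans a line break in a wrapped FASTA file would be missed. The sketch below is one possible way to handle that, not part of the original answer: it joins wrapped sequence lines with awk before running the same grep. The demo data, the variable names `pattern` and `flank`, and the flank width of 5 are assumptions made for illustration. It also writes the quantifier as `{0,N}`, which should behave like `{,N}` in GNU grep and is the more portable spelling.

```shell
# Hypothetical demo data: one read wrapped across two lines, one read without a match.
fasta=$(mktemp)
printf '>read1\nAAAACCCCGGGGATG\nCGTTTTTAAAACCCC\n>read2\nGGGGCCCC\n' > "$fasta"

pattern=ATGCGT   # conserved sequence to search for
flank=5          # characters to keep on each side (assumed value)

# Join wrapped sequence lines so each read sits on a single line,
# then print each match with up to $flank characters on either side.
matches=$(
  awk '/^>/ { if (s != "") print s; print; s = ""; next }
       { s = s $0 }
       END { if (s != "") print s }' "$fasta" |
  grep -o -E ".{0,${flank}}${pattern}.{0,${flank}}"
)
echo "$matches"
rm -f "$fasta"
```

For the demo data this prints CGGGGATGCGTTTTTA, even though the match is split across two lines in the input. To keep the whole read instead of a fixed window, drop -o so that grep prints the complete matching line.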
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | cordigliere |
