How to find the locations of the substring given by the recognition sequence in the genome

I want the locations of the substring given by the recognition sequence in the genome. I wrote a function that passes through the sequence and stores the indices where it matches the recognition sequence, but it returns an empty container. The sequence looks like this:

ATCGGCGCGCGCGCGTATATATATATATATAGGCGCGCGCGCGCGTATATATATATATAGCGGCGCGCGCGCG

def restriction_sites(seq, recog_seq):
    """Find the indices of all restriction sites in a sequence."""
    # "seq" contains the FASTA sequence and "recog_seq" contains the recognition sequence
    # Initialize list of restriction sites
    sites = []

    # Check every substring for a match; the + 1 includes a site that
    # ends at the last character of seq
    for i in range(len(seq) - len(recog_seq) + 1):
        if seq[i:i+len(recog_seq)] == recog_seq:
            sites.append(i)

    return sites

When I run it like this:

print('HindIII:', restriction_sites(seq, 'AAGCTT'))

it shows me an empty list:

[]

I want output like this:

HindIII: [23129, 25156, 27478, 36894, 37458, 37583, 44140]

Solution 1:^[1]

Your original function works. It returned an empty list because your example seq does not contain 'AAGCTT'. For instance, it returned [15, 17, 19, 21, 23, 25, 27, 45, 47, 49, 51, 53, 55] for me when I searched for "TATA".
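A quick way to confirm this is to test membership with `in` before scanning. The sketch below also shows a loop-free alternative built on `str.find`; the name `find_all` is hypothetical and not part of the question's code. It steps forward one character at a time so overlapping sites are kept, and it reports a site that ends at the last character of the sequence.

```python
from typing import List


def find_all(seq: str, recog_seq: str) -> List[int]:
    """Return the start index of every (possibly overlapping) occurrence."""
    sites = []
    start = seq.find(recog_seq)
    while start != -1:
        sites.append(start)
        # Advance by one, not by len(recog_seq), so overlapping sites are kept.
        start = seq.find(recog_seq, start + 1)
    return sites


seq = "ATCGGCGCGCGCGCGTATATATATATATATAGGCGCGCGCGCGCGTATATATATATATAGCGGCGCGCGCGCG"

print("AAGCTT" in seq)           # False: the example sequence has no HindIII site
print(find_all(seq, "AAGCTT"))   # []
```

If the membership test prints False for the real genome as well, the FASTA file was probably not loaded as expected (for example, the header line or line breaks are still in the string).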

I would recommend using re to match the sites. It reads more explicitly:

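import re
from typing import List

def restriction_sites(seq: str, recog_seq: str) -> List[int]:
    # The zero-width lookahead (?=...) reports overlapping matches too;
    # re.escape treats recog_seq as literal text, not as a regex
    pattern = fr"(?={re.escape(recog_seq)})"
    return [match.start() for match in re.finditer(pattern, seq)]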
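The lookahead matters because `re.finditer` otherwise resumes searching after the end of each match and skips overlapping sites. A small sketch of the difference, using two hypothetical helper names (`non_overlapping_sites` and `overlapping_sites`) and a short made-up sequence:

```python
import re
from typing import List


def non_overlapping_sites(seq: str, recog_seq: str) -> List[int]:
    # Plain pattern: each match consumes its characters, so the search
    # resumes after the end of the previous match.
    return [m.start() for m in re.finditer(re.escape(recog_seq), seq)]


def overlapping_sites(seq: str, recog_seq: str) -> List[int]:
    # Zero-width lookahead: nothing is consumed, so every start position
    # is tested, including ones inside a previous match.
    return [m.start() for m in re.finditer(f"(?={re.escape(recog_seq)})", seq)]


print(non_overlapping_sites("TATATATA", "TATA"))  # [0, 4]
print(overlapping_sites("TATATATA", "TATA"))      # [0, 2, 4]
```

For a recognition sequence such as 'AAGCTT', which cannot overlap with itself, both versions give the same result; the difference only shows up for repetitive patterns like "TATA".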

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow
