How can I make my Python function more specific and dynamic?

I'm fairly new with Python and having some trouble figuring out functions.

I have managed to create a function which separates a Fastq file into a DNA sequence, its quality score, and the sequence identifier, which seems to work.

The way the code is written now will always print a message with the output for DNA, quality, and identity. That's on purpose.

However, how can I make my code more 'dynamic'? Let's say... today I only wish to print DNA output with an accompanying message. How can I modify and call my function to be able to do that? Furthermore, it seems I can only call the first segment of the Fastq file. How would you call other segments or multiple segments?

Thank you for your time.

def FastqFile(path_to_file):

    with open("example.fastq", 'r') as input:
        texts = input.read()
        blocks = texts.split("\n@")
#     print(texts)
#     print(blocks)
    dic = {}
    for sequence in blocks[:2]:
        sequence = sequence.replace("@","")
        sub_blocks = sequence.split("\n+\n")
        identifier = sub_blocks[0].split("\n")[0]
        quality = sub_blocks[-1]
        DNA = sub_blocks[0].split("\n")[-1]

        dic[identifier] =[]
        dic[identifier].append(quality)
        dic[identifier].append(DNA)
        print("Here is sequence: %s \n \nAnd quality: %s \n \nAnd identifier: \n%s" %(DNA, quality, identifier))
        return

FastqFile(input)

What I tried and what I was expecting? I have spent a few days trying to make this code work. I succeeded, but now my brain is fried so I decided to ask for help so I can learn and understand.



Solution 1:[1]

  1. Don't hardcode the file path into your function. You're taking a path_to_file parameter, so use it.
  2. You probably don't want to return from inside a loop where you're trying to generate multiple results. That just terminates the loop, which is why you're only getting one result. In this code it looks like you're adding all your data to a dictionary called dic, so you'd want to just let the entire for loop run its course and then do something with dic at the end.
  3. In general, your code is easier to reuse if you divide tasks like reading data from tasks like displaying data. Have one function build the dictionary from the file and return the whole dictionary; then whatever calls the function can use whatever pieces of it it wants.
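
def FastqFile(path_to_file):
    # Read the file named by the parameter rather than a hardcoded path.
    with open(path_to_file) as f:
        blocks = f.read().split("\n@")
    dic = {}
    # blocks[:2] keeps only the first two records; loop over blocks to process them all.
    for sequence in blocks[:2]:
        sequence = sequence.replace("@","")
        sub_blocks = sequence.split("\n+\n")
        identifier = sub_blocks[0].split("\n")[0]
        quality = sub_blocks[-1]
        DNA = sub_blocks[0].split("\n")[-1]

        dic[identifier] = [quality, DNA]
    # Return once, after the loop, so every processed record is included.
    return dic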


example = FastqFile("example.fastq")
for identifier, [quality, DNA] in example.items():
    print(f"Here is sequence: {DNA}")
    print(f"With quality: {quality}")
    print(f"And identifier: {identifier}")

Note that in the above code you can easily change what you print out, or what file you read from, without having to change anything inside FastqFile. This in a nutshell is the point of writing a function -- to give you a tool that you can use in lots of different ways without having to rewrite it each time.
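
To make the "today I only want the DNA" case concrete, here is a minimal sketch of one way the calling code could choose which fields and which records to show. The helper name describe, its fields and identifiers parameters, and the sample records are all hypothetical; the only thing taken from the answer is the dictionary shape {identifier: [quality, DNA]}.

```python
# Hypothetical sample data in the shape FastqFile returns:
# {identifier: [quality, DNA]}
example = {
    "SEQ_1": ["!''*((((***+", "GATTTGGGGTTC"],
    "SEQ_2": ["IIIIHHHHGGGG", "ACGTACGTACGT"],
}


def describe(records, fields=("DNA", "quality", "identifier"), identifiers=None):
    """Return printable lines for the chosen fields of the chosen records."""
    labels = {
        "DNA": "Here is sequence",
        "quality": "With quality",
        "identifier": "And identifier",
    }
    # Default to every record in the dictionary.
    chosen = records.keys() if identifiers is None else identifiers
    lines = []
    for identifier in chosen:
        quality, dna = records[identifier]
        values = {"DNA": dna, "quality": quality, "identifier": identifier}
        for field in fields:
            lines.append(f"{labels[field]}: {values[field]}")
    return lines


# Today: only the DNA, for every record.
print("\n".join(describe(example, fields=["DNA"])))

# Another day: DNA and quality for one specific record.
print("\n".join(describe(example, fields=["DNA", "quality"], identifiers=["SEQ_2"])))
```

With a real file you would pass FastqFile("example.fastq") instead of the sample dictionary. Nothing inside FastqFile has to change when you change your mind about what to print.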

Taken a level further, when you're working on code with other people, someone else could call your FastqFile function and use the dictionary it returns for whatever they want without even having to understand exactly how the function works. That's the real power of high level programming! (If calling a function without looking at how it works sounds crazy to you, consider all the built-in functions that you're calling in this code -- you probably have very little idea what goes on inside open()!)
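
As a rough illustration of that idea, here is a sketch of a second, unrelated caller that only relies on the shape of the returned dictionary. The function gc_fraction and the sample records are made up for this example and are not part of the original code.

```python
def gc_fraction(records):
    """Map each identifier to the fraction of G and C bases in its sequence."""
    result = {}
    for identifier, (_quality, dna) in records.items():
        gc_count = sum(base in "GC" for base in dna.upper())
        # Guard against an empty sequence to avoid dividing by zero.
        result[identifier] = gc_count / len(dna) if dna else 0.0
    return result


# Hypothetical records in the {identifier: [quality, DNA]} shape.
records = {"SEQ_1": ["IIII", "GGCC"], "SEQ_2": ["IIII", "ATGC"]}
print(gc_fraction(records))
```

Whoever writes gc_fraction never needs to know how the file was split into blocks; with a real file they would call gc_fraction(FastqFile("example.fastq")).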

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow
