Loop through every file with a specific format in a directory using sys.argv
I'd like to loop through every file in a directory given by the user and apply a specific transformation for every file that ends with ".fastq".
Basically this would be the pipeline:
- User passes the directory containing those files on the command line
- Script loops through every file with the ".fastq" extension and applies a specific transformation
- Script saves new output in ".fasta" format
This is what I have (Python and Biopython):
import sys, os
from Bio import SeqIO
from Bio.SeqIO.QualityIO import FastqGeneralIterator
from pathlib import Path
path = Path(sys.argv[1])
print(path)
glob_path = path.glob('*')
for file_path in glob_path:
    if file_path.endswith(".fastq"):
        with open(glob_path, "rU") as input_fq:
            with open("{}.fasta".format(file_path),"w") as output_fa:
                for (title, sequence, quality) in FastqGeneralIterator(input_fq):
                    output_fa.write(">%s\n%s\n" \
                                    % (title, sequence))
if not os.path.exists(path):
    raise Exception("No file at %s." % path)
The script runs, but it does not produce the output (it does not create the FASTA file as desired). How can I make the script loop through the files of a specific directory and pass the full path of each file into the for loop, so that the content of input_fq is read and the transformed result is written to output_fa?
Solution 1:[1]
Your problem is with this line:
with open(glob_path, "rU") as input_fq:
Remember that glob_path is the iterator returned by path.glob('*'), which yields every entry in the user-supplied directory; it is not a single file name, so open() cannot use it. You want to open file_path, which is the entry for the current iteration:
with open(file_path, "rU") as input_fq:
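One more thing to watch for in the loop: the entries that path.glob yields are Path objects, not strings, so they have no endswith method. The short sketch below illustrates this with the standard library only; the temporary directory and the file names reads.fastq and notes.txt are made up for the demonstration.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    directory = Path(tmp)
    # Hypothetical sample files, created only for this demonstration.
    (directory / "reads.fastq").write_text("@r1\nACGT\n+\nIIII\n")
    (directory / "notes.txt").write_text("not a FASTQ file\n")

    entries = directory.glob("*")  # lazy iterator over Path objects
    first = next(entries)
    # Path objects do not provide str methods such as endswith.
    has_endswith = hasattr(first, "endswith")

    # Compare the suffix instead of calling a string method on the Path.
    fastq_names = sorted(
        p.name for p in directory.glob("*") if p.suffix == ".fastq"
    )

print(has_endswith)  # False
print(fastq_names)   # ['reads.fastq']
```

Checking file_path.suffix == ".fastq" (or converting with str(file_path) first) keeps the original if statement working if you prefer to keep it.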
Also, to be more succinct, you can eliminate your first if statement by globbing for the pattern "*.fastq" directly. This also avoids calling endswith on file_path, which is a Path object and has no such method:
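glob_path = path.glob('*.fastq')
for file_path in glob_path:
    # The "U" flag was removed from open() in Python 3.11; plain "r" reads the same way in Python 3.
    with open(file_path, "r") as input_fq:
        ...  # rest of the loop body as in the question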
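Putting the pieces together, here is a minimal sketch of the whole pipeline. It rests on a few assumptions that are not in the original post: the function names convert_directory and simple_fastq_records are hypothetical, the output is written next to the input as name.fasta (rather than name.fastq.fasta), and the default parser is a simplified standard-library stand-in that only handles strict four-line FASTQ records. Passing parse=FastqGeneralIterator (from Bio.SeqIO.QualityIO, as in the question) should work in its place, since it also yields (title, sequence, quality) tuples.

```python
import sys
from pathlib import Path


def simple_fastq_records(handle):
    """Yield (title, sequence, quality) from strict four-line FASTQ records.

    Simplified stand-in for Biopython's FastqGeneralIterator; it assumes
    that sequences and qualities are not wrapped over several lines.
    """
    while True:
        header = handle.readline()
        if not header:
            return  # end of file
        if not header.strip():
            continue  # skip blank lines between or after records
        sequence = handle.readline().rstrip("\n")
        handle.readline()  # the '+' separator line is ignored
        quality = handle.readline().rstrip("\n")
        yield header.rstrip("\n")[1:], sequence, quality  # drop leading '@'


def convert_directory(directory, parse=simple_fastq_records):
    """Write a .fasta file next to every .fastq file in `directory`."""
    directory = Path(directory)
    # Check the directory before looping, not afterwards.
    if not directory.is_dir():
        raise NotADirectoryError(f"No directory at {directory}")

    written = []
    for fastq_path in sorted(directory.glob("*.fastq")):
        fasta_path = fastq_path.with_suffix(".fasta")
        # Open the current file (fastq_path), not the glob iterator.
        with open(fastq_path) as input_fq, open(fasta_path, "w") as output_fa:
            for title, sequence, _quality in parse(input_fq):
                output_fa.write(f">{title}\n{sequence}\n")
        written.append(fasta_path)
    return written


if __name__ == "__main__":
    if len(sys.argv) == 2:
        for fasta_path in convert_directory(sys.argv[1]):
            print(fasta_path)
    else:
        print("usage: python fastq_to_fasta.py DIRECTORY", file=sys.stderr)
```

Saved under a name of your choice (fastq_to_fasta.py is only a placeholder), this would be run as python fastq_to_fasta.py /path/to/reads. Moving the existence check to the top means a wrong directory is reported immediately instead of the loop silently doing nothing.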
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 |
