How to print the first four lines into a file, and the following four lines into a second file, and so on?

I have a FASTQ file from a paired-end sequencing run, with all my sequences stacked one after another. I need to separate them into two files, so that all reverse sequences are in one file and all forward sequences are in the other. To do that, I need to read the first four lines and write them to file "R", then read the next four lines and write them to file "F", and keep alternating in the same manner through the rest of the file. I thought about something like this (below), but it did not work. Any help, please?

R = open("R.fastq","w+")
F = open("F.fastq","w+")

x = raw_input('type the name of the file you wanna split: ')   
with open (x, 'rt') as myfile:   
    for line in myfile:
        R.write (line)
        R.write (line)
        R.write (line)
        R.write (line)
        F.write (line)
        F.write (line)
        F.write (line)
        F.write (line)

R.close()
F.close()


Solution 1:[1]

This should do it:

r = [] # List for the lines to be written into R
f = [] # List for the lines to be written into F

with open('text.txt','r') as myfile: # Open the original file 
    lines = myfile.readlines() # and store each line inside a list called lines

index = 0 # Index of the line

while index <= len(lines)-1:

    for n in range(4):
        if index <= len(lines)-1:
            r.append(lines[index]) # Append line to r
            index+=1

    for n in range(4):
        if index <= len(lines)-1:
            f.append(lines[index]) # Append line to f
            index+=1


with open('file1.txt','w') as R:
    for line in r:
        R.write(line) # Write each line from r into R

with open('file2.txt','w') as F:
    for line in f:
        F.write(line) # Write each line from f into F

Solution 2:[2]

I think this will do what you want; at least it seemed to with a test file I created myself.

It uses a generator function I've named grouper() to split the lines of the input file into groups of 4, and then writes each group to one of the 2 output files. It decides which output file to use by numbering the groups with the built-in enumerate() function and taking that counter modulo 2 (% 2): even-numbered groups go to the first file, odd-numbered groups to the second.

from itertools import zip_longest


def grouper(n, iterable):
    """ s -> (s0,s1,...sn-1), (sn,sn+1,...s2n-1), (s2n,s2n+1,...s3n-1), ... """
    FILLER = object()  # Value that couldn't be in data.
    for result in zip_longest(*[iter(iterable)]*n, fillvalue=FILLER):
        yield tuple(v for v in result if v is not FILLER)


input_filename = 'sequences.txt'
output_filename1 = 'R.fastq'
output_filename2 = 'F.fastq'

with open(input_filename) as inp, \
     open(output_filename1, 'w') as outp1, \
     open(output_filename2, 'w') as outp2:

    output_files = outp1, outp2
    for i, group in enumerate(grouper(4, inp)):
        outp = output_files[i % 2]
        for line in group:
            outp.write(line)

print('done')
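
To see what grouper() produces, here is a small sketch that runs it on an in-memory list instead of a file. The sample strings and the names sample, groups and routed are made up for illustration; grouper() is repeated (with a lowercase sentinel name) only so the snippet runs on its own.

```python
from itertools import zip_longest


def grouper(n, iterable):
    """Yield consecutive tuples of up to n items from iterable."""
    filler = object()  # Sentinel that cannot appear in the data.
    for result in zip_longest(*[iter(iterable)] * n, fillvalue=filler):
        yield tuple(v for v in result if v is not filler)


# Hypothetical stand-in for the lines of an input file: 10 items.
sample = ['line%d' % i for i in range(10)]

groups = list(grouper(4, sample))

# Route each group by its index: even groups to slot 0, odd groups to slot 1.
routed = ([], [])
for i, group in enumerate(groups):
    routed[i % 2].extend(group)

print(groups)
print(routed)
```

Note that the last group holds only two items: the filler values are dropped, so an incomplete final record would still be written out rather than discarded.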

Solution 3:[3]

Your problem was that you were writing the same line to both files four times on every iteration of the loop, so there was no way for the program to determine which line should be written to which file. Give this code a try. I can't test it without the files, but the logic should work.

The code uses p to keep track of which line it is at. Each time p reaches a multiple of four, it increments q. If q is even, the line is written to file R; if q is odd, it is written to file F.

R = open("R.fastq","w+") # open file R with write permissions
F = open("F.fastq","w+") #open file F with write permissions

x = raw_input('type the name of the file you wanna split: ')   #input file name
p = 0 #variable to increment, tracking which line you're at
q = 0 #variable to track when to switch files
with open (x, 'rt') as myfile:   #open input file with read permissions
    for line in myfile: # loop through file
        if q%2 == 0: #if q is even
            R.write (line) #write to file R
        elif q%2 == 1: #if q is odd
            F.write (line) #write to file F
        p+=1 #increment tracker to next line
        if p%4 == 0: # if line is a multiple of 4
            q+=1 #increment q to switch files

R.close() #close file R
F.close() #close file F
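
The two counters can also be folded into a single expression: with zero-based line numbers, (line_number // 4) % 2 is 0 for the first block of four lines, 1 for the next block, and so on. Below is a sketch of that idea, not a tested replacement for the code above. The function split_alternating and its parameters are names invented here, and io.StringIO objects stand in for real files so the snippet can be run without any input data (Python 3).

```python
import io


def split_alternating(lines, out_even, out_odd, block_size=4):
    """Write alternating blocks of block_size lines to two file-like objects."""
    outputs = (out_even, out_odd)
    for line_number, line in enumerate(lines):
        # Integer division gives the block index; its parity picks the output.
        outputs[(line_number // block_size) % 2].write(line)


# Made-up data: two 4-line records, standing in for an interleaved FASTQ file.
demo_input = io.StringIO(
    "@read1\nACGT\n+\nIIII\n"
    "@read2\nTTTT\n+\nIIII\n"
)
r_out, f_out = io.StringIO(), io.StringIO()
split_alternating(demo_input, r_out, f_out)

print(r_out.getvalue())
print(f_out.getvalue())
```

With real files, the same function can be given objects opened in a with statement, which closes them even if an error occurs partway through.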

Solution 4:[4]

This is called "de-interleaving" an interleaved FASTQ. If you Google that, you'll find any number of pre-made solutions, including the reformat command of the BBmap/BBtools package. http://seqanswers.com/forums/showthread.php?t=46174

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Ann Zen
Solution 2
Solution 3
Solution 4 Tom Morris