'Convert FASTA to GenBank
Is there a way to use BioPython to convert FASTA files to a Genbank format? There are many answers on how to convert from Genbank to FASTA, but not the other way around.
Solution 1:[1]
before convert, you must asign alphabet to sequence (DNA or Protein)
from Bio import SeqIO
from Bio.Alphabet import generic_dna, generic_protein
input_handle = open("test.fasta", "rU")
output_handle = open("test.gb", "w")
sequences = list(SeqIO.parse(input_handle, "fasta"))
#asign generic_dna or generic_protein
for seq in sequences:
seq.seq.alphabet = generic_dna
count = SeqIO.write(sequences, output_handle, "genbank")
output_handle.close()
input_handle.close()
print "Coverted %i records" % count
for input:
>I28Q9A102FII8J rank=0668881 x=2144.0 y=1105.0 length=418 ACGTCATGAGAGTTTGATCATGGCTCAGGACGAACGCTGGCGGCGTGCTTAACACATGCAAGTCGAACGATGAA GCTCCAGCTTGCTGGGGTGGATTAGTGGCGAACGGGTGAGTAACACGTGAGTAACCTGCCCTTGACTCTGGGAT AAGCGTTGGAAACGACGTCTAATACCGGATATGACGACCGATGGCATCATCTGGTTGTGGAAAGAATTTTGGTC AAGGATGGACTCGCGGCCTATCAGGTAGTTGGTGAGGTAATGGCTCACCAAGCCTACGACGGGTAGCCGGCCTG AGAGGGTGACCGGCCACACTGGGACTGAGACACGGCCCAGACTCCTACGGGAGGCAGCAGTGGGGAATATTGCA CAATGGGCGAAAGCCTGATGCAGCAACGCCGCGTGAGGGATGACGGCC >I28Q9A102JMH72 rank=0320459 x=3829.0 y=3120.0 length=512 ACGTCATGAGAGTTTGATCCTGGCTCAGGATGAACGCTAGCGGCAGGCTTAACACATGCAAGTCGAGGGTAGAA ATAGCTTGCTATTTTGAGACCGGCGCACGGGTGCGTAACGCGTATGCAATCTGCCTTTTACAGGGGAATAGCCC AGAGAAATTTGGATTAATGCCCCATAGCGCTGCAGGGCGGCATCGCCGAGCAGCTAAAGTCACAACGGTAAAGA TGAGCATGCGTCCCATTAGCTAGTTGGTAAGGTAACGGCTTACCAAGGCGATGATGGGTAGGGTCCTGAGAGGG AGATCCCCCACACTGGTACTGAGACACGGACCAGACTCCTACGGGAGGCAGCAGTGAGGAATATTGGTCAATGG GCGCAAGCCTGAACCAGCCATGCCGCGTGCAGGATGAAGGCCTTCGGGTTGTAAACTGCTTTTGACGGAACGAA AAAGCT
you get:
LOCUS I28Q9A102FII8J 418 bp DNA UNK 01-JAN-1980
DEFINITION I28Q9A102FII8J rank=0668881 x=2144.0 y=1105.0 length=418
ACCESSION I28Q9A102FII8J
VERSION I28Q9A102FII8J
KEYWORDS .
SOURCE .
ORGANISM .
.
FEATURES Location/Qualifiers
ORIGIN
1 acgtcatgag agtttgatca tggctcagga cgaacgctgg cggcgtgctt aacacatgca
61 agtcgaacga tgaagctcca gcttgctggg gtggattagt ggcgaacggg tgagtaacac
121 gtgagtaacc tgcccttgac tctgggataa gcgttggaaa cgacgtctaa taccggatat
181 gacgaccgat ggcatcatct ggttgtggaa agaattttgg tcaaggatgg actcgcggcc
241 tatcaggtag ttggtgaggt aatggctcac caagcctacg acgggtagcc ggcctgagag
301 ggtgaccggc cacactggga ctgagacacg gcccagactc ctacgggagg cagcagtggg
361 gaatattgca caatgggcga aagcctgatg cagcaacgcc gcgtgaggga tgacggcc
//
LOCUS I28Q9A102JMH72 450 bp DNA UNK 01-JAN-1980
DEFINITION I28Q9A102JMH72 rank=0320459 x=3829.0 y=3120.0 length=512
ACCESSION I28Q9A102JMH72
VERSION I28Q9A102JMH72
KEYWORDS .
SOURCE .
ORGANISM .
.
FEATURES Location/Qualifiers
ORIGIN
1 acgtcatgag agtttgatcc tggctcagga tgaacgctag cggcaggctt aacacatgca
61 agtcgagggt agaaatagct tgctattttg agaccggcgc acgggtgcgt aacgcgtatg
121 caatctgcct tttacagggg aatagcccag agaaatttgg attaatgccc catagcgctg
181 cagggcggca tcgccgagca gctaaagtca caacggtaaa gatgagcatg cgtcccatta
241 gctagttggt aaggtaacgg cttaccaagg cgatgatggg tagggtcctg agagggagat
301 cccccacact ggtactgaga cacggaccag actcctacgg gaggcagcag tgaggaatat
361 tggtcaatgg gcgcaagcct gaaccagcca tgccgcgtgc aggatgaagg ccttcgggtt
421 gtaaactgct tttgacggaa cgaaaaagct
//
Solution 2:[2]
It is possible to convert the fasta to gb format for unsubmitted sequences, which dont have accession numbers. Yet to be submitted to NCBI.
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | |
| Solution 2 | sunnykevin |
