Extract Acc (Gene ID or accession number) from a FASTA file

What does ".gb\\|(.*)\\|.*", "\\1" in the function gsub mean?


Solution 1:[1]

If the file contains a single FASTA sequence, you can solve the problem by reading the first line of the file and splitting it on the pipe character |.

If the file contains multiple sequences, check the first character of each line and look for the > character, which marks a header line.

Here is a code example in Python. If you need a different ID, change the index.

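with open('AE004437.faa') as fh:
    # The first line of the file is the FASTA header
    header_line = fh.readline().strip()
    # Split the header into its pipe-delimited fields
    ids = header_line.split('|')
    # Field 3 is assumed to hold the accession; change the index for another ID
    gene_id = ids[3]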
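
For a file with several sequences, the same idea can be applied to every line that starts with >. The following is only a minimal sketch: the helper name extract_ids and the sample headers are made up for illustration, and it assumes each header uses the pipe-delimited layout with the wanted ID in field 3.

```python
def extract_ids(lines, index=3):
    """Collect one pipe-delimited field from every FASTA header line.

    `lines` is any iterable of strings (an open file handle works).
    `index` is assumed to point at the accession field; headers with
    too few fields are skipped rather than raising an error.
    """
    ids = []
    for line in lines:
        # Header lines start with '>'; everything else is sequence data
        if line.startswith('>'):
            fields = line[1:].strip().split('|')
            if len(fields) > index:
                ids.append(fields[index])
    return ids


# Hypothetical sample records, not taken from a real file
sample = [
    ">gi|0000001|gb|AAA00001.1| hypothetical protein one\n",
    "MKTAYIAKQR\n",
    ">gi|0000002|gb|AAA00002.1| hypothetical protein two\n",
    "MSDNELLQAV\n",
]

print(extract_ids(sample))
```

To run it on a real file, pass the open file handle instead of the sample list, for example extract_ids(fh) inside a with open(...) block.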

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution 1: liveware