Split a .loci file into separate FASTA files for desired loci
I have a large .loci file that contains a single column of sequence data, with several rows for each of many loci. Each locus has a variable number of rows, one per individual, and each row contains the name of the individual, a space, and then that individual's nucleotide sequence for the locus. Consecutive loci are separated by a row that starts with "//", continues with a string of spaces, "*"s and "-"s, and ends with the locus number in square brackets; that number labels the block of rows directly above the separator. The characters between "//" and the locus number are not consistent from one separator row to the next and can be any combination of spaces, "*" and "-".
Example:

    Ind_1 ACTGACTGACTGACTGACTG
    Ind_2 ACTGACTGACTGACTGACTG
    // * - * [1]
    Ind_1 ACTGACTGACTGACTGACTG
    Ind_3 ACTGACTGACTGACTGACTG
    Ind_6 ACTGACTGACTGACTGACTG
    // - * - [2]
    Ind_2 ACTGACTGACTGACTGACTG
    Ind_4 ACTGACTGACTGACTGACTG
    // * - - [3]
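A minimal base-R sketch of the grouping step, assuming (as in the example) that every separator row starts with `//`, ends with `[n]`, and that `n` labels the rows directly above it. The function name `split_loci()` and the `example_lines` vector are illustrative only, not part of any package. For each row, `cumsum()` counts the separators above it, which identifies the separator that closes that row's block; `split()` then builds a list of sequence rows named by locus number.

```r
# Hypothetical helper: group the rows of a .loci file by locus number.
# Assumes each separator row starts with "//" and ends with "[n]", and that
# "[n]" labels the block of sequence rows directly above that separator.
split_loci <- function(lines) {
  lines <- lines[nzchar(trimws(lines))]        # drop blank lines
  is_sep <- startsWith(lines, "//")

  # Extract n from the trailing "[n]"; the characters in between are ignored
  ids <- sub("^.*\\[([0-9]+)\\][[:space:]]*$", "\\1", lines[is_sep])

  # Separators strictly above each row, plus 1 = index of the closing separator
  block <- cumsum(is_sep) - is_sep + 1

  seq_rows  <- lines[!is_sep]
  seq_block <- block[!is_sep]

  # Rows after the last separator get an NA id and are dropped by split()
  split(seq_rows, factor(ids[seq_block], levels = unique(ids)))
}

# Illustrative input copied from the example above
example_lines <- c(
  "Ind_1 ACTGACTGACTGACTGACTG",
  "Ind_2 ACTGACTGACTGACTGACTG",
  "// * - * [1]",
  "Ind_1 ACTGACTGACTGACTGACTG",
  "Ind_3 ACTGACTGACTGACTGACTG",
  "Ind_6 ACTGACTGACTGACTGACTG",
  "// - * - [2]",
  "Ind_2 ACTGACTGACTGACTGACTG",
  "Ind_4 ACTGACTGACTGACTGACTG",
  "// * - - [3]"
)

loci <- split_loci(example_lines)
names(loci)   # "1" "2" "3"
loci[["2"]]   # the three rows for Ind_1, Ind_3 and Ind_6
```

For a real file this would be called as `split_loci(readLines("my_data.loci"))` (hypothetical path); note that this approach reads the whole file into memory.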
I would like to write a separate .fasta file for each locus in a vector of desired loci, for example loci [2] and [3].
How can I do this in R?
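Since the file is large, one option is to avoid holding it all in memory. The sketch below is a hypothetical base-R function, `write_loci_fasta()`, not an established API: it reads the file in chunks with `readLines()`, buffers sequence rows until it reaches a `//` row, and writes the buffer as FASTA only if that row's locus number is in `wanted`. The output naming scheme `locus_<n>.fasta` is an assumption, as is that individual names contain no spaces and are separated from the sequence by whitespace.

```r
# Hypothetical function: stream a .loci file and write one FASTA file per
# wanted locus. Assumes separator rows start with "//" and end with "[n]",
# where n labels the rows above; output files are named "locus_<n>.fasta".
write_loci_fasta <- function(loci_file, wanted, out_dir = ".", chunk = 10000L) {
  wanted <- as.character(wanted)            # c(2, 3) -> c("2", "3")
  con <- file(loci_file, open = "r")
  on.exit(close(con))
  buffer  <- character(0)                   # sequence rows of the current locus
  written <- character(0)                   # paths of files written so far

  repeat {
    lines <- readLines(con, n = chunk)      # next chunk of rows
    if (length(lines) == 0L) break
    for (line in lines) {
      if (startsWith(line, "//")) {
        id <- sub("^.*\\[([0-9]+)\\][[:space:]]*$", "\\1", line)
        if (id %in% wanted && length(buffer) > 0L) {
          # Split "name sequence" on the first run of whitespace
          name <- sub("[[:space:]].*$", "", buffer)
          sq   <- trimws(sub("^[^[:space:]]+[[:space:]]+", "", buffer))
          out  <- file.path(out_dir, paste0("locus_", id, ".fasta"))
          writeLines(paste0(">", name, "\n", sq), out)
          written <- c(written, out)
        }
        buffer <- character(0)              # start collecting the next locus
      } else if (nzchar(trimws(line))) {
        buffer <- c(buffer, line)
      }
    }
  }
  invisible(written)
}

# Usage on the example data, written to a temporary file
tmp <- tempfile(fileext = ".loci")
writeLines(c(
  "Ind_1 ACTGACTGACTGACTGACTG", "Ind_2 ACTGACTGACTGACTGACTG", "// * - * [1]",
  "Ind_1 ACTGACTGACTGACTGACTG", "Ind_3 ACTGACTGACTGACTGACTG",
  "Ind_6 ACTGACTGACTGACTGACTG", "// - * - [2]",
  "Ind_2 ACTGACTGACTGACTGACTG", "Ind_4 ACTGACTGACTGACTGACTG", "// * - - [3]"
), tmp)

out <- write_loci_fasta(tmp, wanted = c(2, 3), out_dir = tempdir())
readLines(out[1])   # FASTA records for Ind_1, Ind_3 and Ind_6 (locus 2)
```

Files are written in the order the loci appear in the input. Rows after the last separator are never written, because no locus number is available for them.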
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
