'How do I read multiple Excel Files into R and then convert them into Fasta files (without concatenating)?
I have a folder with about 80 Excel files that contain the fastaname in one column and a gene sequence another column. All files have a different number of rows.
I am trying to automate the process of writing an Excel File into a Fasta by reading in all the files I have in one folder and using the command I have for each file. I do NOT want to have all the data written into a single Fasta file, but want for example the file "gene1.xlsx" as "gene1.fas", "gene2.xlsx" as "gene2.fas" and so on.
The code I am using to convert a single Excel file into a Fasta file is as follows:
library(readxl)
library(tibble)
X<-read_excel("*name of the file*.xlsx", col_names=FALSE)
D<-do.call(rbind, lapply (seq(nrow(X)), function(i) t(X[i,])))
write.table(D,file = "*name of the file*.fas", row.names=FALSE, col.names=FALSE, quote=FALSE)
I guess that I need a for-loop for that, but I am new to programming and everything I have tried just gave me a single empty Fasta files as the output.
The code I used for that is the following:
library(readxl)
library(tibble)
file.list<- list.files(pattern="*.xlsx")
df.list <- lapply(file.list, read_excel)
library(tibble)
for (i in file.list) {
B <-do.call(rbind, lapply (seq(nrow(file.list)), function(i) t(file.list[i,])))
write.table(B,file = "*.fas", row.names=FALSE, col.names=FALSE, quote=FALSE)
}
Is there a way to do this?
I appreciate any help! Clara
Solution 1:[1]
If your data looks like this:

A column of names and a column of sequences.
And the fasta format is:
>name1
sequence1
>name2
sequence2
And you have a folder containing 80 excel files in the same format.
You can convert one excel to fasta like this
excelfname <- "seqfile1.xlsx"
# I use explicit colun types to make sure that the names are not interpreted as numbers or dates.
seqlist <- read_excel(excelfname,
col_names = c("header", "sequence"),
col_types = c("text", "text" ))
fastalist <- paste0(">", seqlist$header, "\n", seqlist$sequence)
fastafname <- str_replace(excelfname, "xlsx", "fas")
writeLines(fastalist, con = file(fastafname))
Then for many excel files, we can wrap this code in lapply()
excelfnames <- list.files(pattern = "*.xlsx")
seqlists <- lapply(excelfnames, function(excelfname)
read_excel(excelfname,
col_names = c("header", "sequence"),
col_types = c("text", "text" )))
fastalists <- lapply(seqlists,
function(record) paste0(">", record$header, "\n",
record$sequence))
fastafnames <- str_replace(excelfnames, "xlsx", "fas")
names(fastalists) <- fastafnames
lapply(names(fastalists), function(fname) writeLines(fastalists[[fname]],
con = file(fname)))
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Aaron |
