'How do I read multiple Excel Files into R and then convert them into Fasta files (without concatenating)?

I have a folder with about 80 Excel files that contain the fastaname in one column and a gene sequence another column. All files have a different number of rows.

I am trying to automate the process of writing an Excel File into a Fasta by reading in all the files I have in one folder and using the command I have for each file. I do NOT want to have all the data written into a single Fasta file, but want for example the file "gene1.xlsx" as "gene1.fas", "gene2.xlsx" as "gene2.fas" and so on.

The code I am using to convert a single Excel file into a Fasta file is as follows:

library(readxl)
library(tibble)
X<-read_excel("*name of the file*.xlsx", col_names=FALSE)
D<-do.call(rbind, lapply (seq(nrow(X)), function(i) t(X[i,])))

write.table(D,file = "*name of the file*.fas", row.names=FALSE, col.names=FALSE, quote=FALSE)

I guess that I need a for-loop for that, but I am new to programming and everything I have tried just gave me a single empty Fasta files as the output.

The code I used for that is the following:

library(readxl)
library(tibble)

file.list<- list.files(pattern="*.xlsx")
df.list <- lapply(file.list, read_excel)

library(tibble)
for (i in file.list) {
  B <-do.call(rbind, lapply (seq(nrow(file.list)), function(i) t(file.list[i,])))
  write.table(B,file = "*.fas", row.names=FALSE, col.names=FALSE, quote=FALSE)
}

Is there a way to do this?

I appreciate any help! Clara



Solution 1:[1]

If your data looks like this:

A column of names and a column of sequences.

And the fasta format is:

>name1
sequence1
>name2
sequence2

And you have a folder containing 80 excel files in the same format.

You can convert one excel to fasta like this

excelfname <- "seqfile1.xlsx"
# I use explicit colun types to make sure that the names are not interpreted as numbers or dates.
seqlist <- read_excel(excelfname,
                      col_names = c("header", "sequence"),
                      col_types = c("text", "text" ))
fastalist <- paste0(">", seqlist$header, "\n", seqlist$sequence)
fastafname <- str_replace(excelfname, "xlsx", "fas")
writeLines(fastalist, con = file(fastafname))

Then for many excel files, we can wrap this code in lapply()

excelfnames <- list.files(pattern = "*.xlsx")
seqlists <- lapply(excelfnames, function(excelfname)
    read_excel(excelfname,
               col_names = c("header", "sequence"),
               col_types = c("text", "text" )))
fastalists <- lapply(seqlists,
                     function(record) paste0(">", record$header, "\n",
                                             record$sequence))
fastafnames <- str_replace(excelfnames, "xlsx", "fas")
names(fastalists) <- fastafnames
lapply(names(fastalists), function(fname) writeLines(fastalists[[fname]],
                                                   con = file(fname)))

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Aaron