How can I append multiple texts to one dataframe (tibble) within a for loop using append function in R?

I have multiple *.txt files, each containing a title and a text, that I want to process in R. The program below reads all of the *.txt files in a for loop, but at the end only the result for the last file is available; the files read earlier are skipped. I want to see the results for all of the texts.

library(here)
library(glue)
library(tm)
library(SnowballC)
library(tidyverse)
library(tidytext)

all_texts <- list.files(setwd('.KCI/'), (startsWith = 'abstract'))
for(i in seq(1:length(all_texts)))
{
    data <- read_tsv(all_texts[i], , show_col_types = FALSE)
    corpus <- Corpus(VectorSource(data[i]))
    corpus[i] <- tm_map(corpus[i], tolower)
    corpus[i] <- tm_map(corpus[i], removePunctuation)
    corpus[i] <- tm_map(corpus[i], removeNumbers)
    corpus[i] <- tm_map(corpus[i], stripWhitespace)
    corpus[i] <- tm_map(corpus[i], removeWords, c(stopwords("english"), mystopwords))
    corpus[i] <- tm_map(corpus[i], stemDocument)
    dtm <- DocumentTermMatrix(corpus[i])
  }

This program keeps only the final document and drops the previous ones. I want the other documents to be displayed as well, not just the last one. Each file has the following layout:

<Title>       <Year>        <Text>
How is it?     1998          I am wondering if it could end like that. Therefore the deal is too good to be true


Solution 1:[1]

This would be a lot easier if you had provided some data.

library(tm)
library(SnowballC)
##
#   two documents based on your example (t1 & t2 are identical here).
#
t1 <- read.delim(text='
                 Title\tYear\tText
                 How is it?\t1998\tI am wondering if it could end like that. Therefore the deal is too good to be true',
                 header=TRUE)
t2 <- read.delim(text='
                 Title\tYear\tText
                 How is it?\t1998\tI am wondering if it could end like that. Therefore the deal is too good to be true',
                 header=TRUE)
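data <- list(t1, t2)   # list of documents
# lapply returns one DocumentTermMatrix per document, so nothing is overwritten
dtm.list <- lapply(data, function(x) {
  corpus <- Corpus(VectorSource(x))
  # content_transformer() wraps base functions such as tolower for tm_map
  corpus <- tm_map(corpus, content_transformer(tolower))
  corpus <- tm_map(corpus, removePunctuation)
  corpus <- tm_map(corpus, removeNumbers)
  corpus <- tm_map(corpus, stripWhitespace)
  corpus <- tm_map(corpus, removeWords, stopwords("english"))
  corpus <- tm_map(corpus, stemDocument)
  DocumentTermMatrix(corpus)
})
lapply(dtm.list, inspect)   # print every matrix, not just the last one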
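
If you would rather keep an explicit for loop, as in the question, the same idea applies: store each result in its own list slot instead of assigning to one variable that is overwritten on every pass. The following is only a minimal sketch in base R; `count_words` is a hypothetical stand-in for the tm pipeline, and the two texts are made up for illustration.

```r
# Hypothetical stand-in for the tm pipeline: word counts for one text.
count_words <- function(text) {
  words <- strsplit(tolower(text), "[^a-z]+")[[1]]
  table(words[nzchar(words)])
}

# Made-up example texts (assumption, not the asker's data).
texts <- c(doc1 = "The deal is too good to be true",
           doc2 = "I am wondering if it could end like that")

# Pre-allocate one slot per document so earlier results are kept.
results <- vector("list", length(texts))
names(results) <- names(texts)

for (i in seq_along(texts)) {
  results[[i]] <- count_words(texts[i])   # indexed assignment, no overwrite
}

length(results)   # one entry per document
```

Writing `results <- count_words(texts[i])` inside the loop instead would leave only the last document in `results`, which is the behaviour described in the question.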

Note that I left out mystopwords because you did not provide any. In your case, you could put the read_tsv(...) call back into the function and apply lapply(...) to the list of file names. Something like:

dtm.list <- lapply(all_texts, function(x) {
  data   <- read_tsv(x)
  corpus <- Corpus(VectorSource(data))
  ...
})

Where ... are the lines of code in my example above.
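
Since the question title asks for a single tibble, here is one possible way to get there; treat it as a sketch under assumptions rather than the only approach. It assumes each file is tab-separated with a column named `Text`, uses `tidy()` from tidytext to turn each DocumentTermMatrix into a tibble, and stacks the pieces with `dplyr::bind_rows()`. The helper `file_to_terms` and the temporary files are hypothetical, created only so the example runs on its own.

```r
library(tm)        # Corpus, tm_map, DocumentTermMatrix
library(readr)     # read_tsv, write_tsv
library(dplyr)     # bind_rows
library(tidytext)  # tidy() method for DocumentTermMatrix

# Hypothetical helper: one file in, one tidy term-count tibble out.
# Assumes a tab-separated file with a column named Text.
file_to_terms <- function(path) {
  data   <- read_tsv(path, show_col_types = FALSE)
  corpus <- Corpus(VectorSource(data$Text))
  corpus <- tm_map(corpus, content_transformer(tolower))
  corpus <- tm_map(corpus, removePunctuation)
  corpus <- tm_map(corpus, removeWords, stopwords("english"))
  corpus <- tm_map(corpus, stripWhitespace)
  tidy(DocumentTermMatrix(corpus))   # columns: document, term, count
}

# Two temporary files stand in for the real abstract*.txt files.
sample_doc <- data.frame(
  Title = "How is it?",
  Year  = 1998,
  Text  = "I am wondering if it could end like that. Therefore the deal is too good to be true"
)
paths <- file.path(tempdir(), c("abstract1.txt", "abstract2.txt"))
for (p in paths) write_tsv(sample_doc, p)
names(paths) <- basename(paths)   # names become the values of the id column

# One tibble for all files; the file column records where each row came from.
all_terms <- bind_rows(lapply(paths, file_to_terms), .id = "file")
all_terms
```

Binding once at the end is usually simpler than growing a tibble row by row inside the loop.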

If your ultimate goal is to analyze word frequency, you might be better off using ?termFreq.
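
As a rough illustration of that route, the sketch below calls `termFreq()` on a single document and lets its `control` list do the cleaning that the `tm_map()` calls did above. The sample text is taken from the question; wrapping it in `PlainTextDocument()` is my assumption about the most portable way to pass it in.

```r
library(tm)
library(SnowballC)   # needed for stemming = TRUE

text <- "I am wondering if it could end like that. Therefore the deal is too good to be true"

# termFreq returns a named vector of term counts for one document.
freq <- termFreq(
  PlainTextDocument(text),
  control = list(
    tolower           = TRUE,
    removePunctuation = TRUE,
    removeNumbers     = TRUE,
    stopwords         = TRUE,   # default stopword list for the document language
    stemming          = TRUE
  )
)

sort(freq, decreasing = TRUE)
```

To cover several files, the same call could be wrapped in `lapply()` over the file names, as in the example above.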

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 jlhoward