Category "text-mining"

Process and progress for natural language analysis of company communication?

Assume there is a large record of all kinds of inter-employee and customer communications (e.g. emails, chat transcripts, OCRed letters) which should b

Finding most similar string (formed by two or more words) in a text in Python

Let's say I have the string st="red-winged cormorant" and the following text: text=""""I have in the past assisted teams at Milford Point and Lighthouse Point.
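
One possible way to approach this kind of lookup is to slide a window of as many words as the query contains over the text and score each window with `difflib.SequenceMatcher` from the standard library. The sketch below is only an illustration under that assumption, not the asker's method: the function name `most_similar_phrase` and the `sample` sentence are hypothetical, and the similarity measure could be swapped for another (e.g. edit distance).

```python
import re
from difflib import SequenceMatcher


def most_similar_phrase(query, text):
    """Return (best_phrase, score) for the word window in text closest to query."""
    n = len(query.split())  # window size = number of words in the query
    # Keep hyphens and apostrophes inside tokens so "red-winged" stays one word.
    words = re.findall(r"[\w'-]+", text)
    best_phrase, best_score = None, 0.0
    for i in range(len(words) - n + 1):
        candidate = " ".join(words[i:i + n])
        # Case-insensitive character-level similarity in [0, 1].
        score = SequenceMatcher(None, query.lower(), candidate.lower()).ratio()
        if score > best_score:
            best_phrase, best_score = candidate, score
    return best_phrase, best_score


# Hypothetical sample text, used only to illustrate the call.
sample = "We counted gulls and one Red-winged Cormorant near the point."
phrase, score = most_similar_phrase("red-winged cormorant", sample)
print(phrase, score)
```

A fixed window size is a simplification: if the match in the text may span more or fewer words than the query, the loop would have to try several window sizes.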

LDA Returning numbers instead of words from Term Document Matrix

I am trying to use the LDA function to evaluate a corpus of text in R. However, when I do so, it seems to use the row names of the observations rather than the

How do I find most frequent words by each observation in R?

I am very new to NLP, so please don't judge me too harshly. I have a very large data frame of customer feedback, and my goal is to analyze that feedback. I tokenized wo

Proper settings for plot.estimateEffect in stm package

The stm package provides an indispensable set of tools for estimating the effect of covariates on topic prevalence. The plot.estimateEffect() function in partic

Problems with TermDocumentMatrix function in R

I'm trying to create a TermDocumentMatrix using tm package, but seem to have encountered difficulties. The input: trainDF<-as.matrix(list("I'm going home",

How can I classify the chapters of a PDF file and analyze the content per chapter?

I want to classify and analyze the chapters and subchapters of a book in PDF format, that is, count the number of words and examine which word occurs how often and in w