How to convert a TDM CSV file into a TermDocumentMatrix with the tm package in R?

I have a term-document matrix stored in a CSV file. For example:

       , doc1, doc2, doc3, doc4, doc5
main   ,    2,    0,    3,    0,    0
virtual,    4,    0,    0,    0,    1
origin ,    0,    0,    1,    2,    0
....

How can I convert this into a term-document matrix in the tm package?

As far as I can tell, the TermDocumentMatrix() function builds a term-document matrix from the text (lists of words) of the documents themselves.

But I already have a term-document matrix, and I would like to import it and use it in the tm package.

Please let me know how to do this.



Solution 1:[1]

Here's one approach (but there's likely a direct way within the tm package):

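# Read the CSV; the unnamed first column holds the terms
x <- read.csv(text="   , doc1, doc2, doc3, doc4, doc5
main , 2, 0, 3, 0, 0
virtual, 4, 0, 0, 0, 1
origin , 0, 0, 1, 2, 0", header=TRUE)

library(qdap)

# Drop the term column and use it as row names instead
dat <- x[, -1]
row.names(dat) <- x[, 1]

# Convert to a qdap word frequency matrix, then to a tm TermDocumentMatrix
your_tdm <- tdm(as.wfm(dat))

tm::inspect(your_tdm)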

## > tm::inspect(your_tdm)
## A term-document matrix (3 terms, 5 documents)
## 
## Non-/sparse entries: 6/9
## Sparsity           : 60%
## Maximal term length: 7 
## Weighting          : term frequency (tf)
## 
##          Docs
## Terms     doc1 doc2 doc3 doc4 doc5
##   main       2    0    3    0    0
##   origin     0    0    1    2    0
##   virtual    4    0    0    0    1
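
For anyone looking for the more direct route hinted at above, here is a minimal sketch that stays within tm, assuming your installed version of tm provides as.TermDocumentMatrix() with a method for plain matrices and the weightTf weighting function. The strip.white and trimws() steps are my own additions to remove the padding around the term names; the object names m and tdm_direct are just for illustration.

```r
library(tm)

# Same sample data as in the question; strip.white trims the padding
# around unquoted fields (an addition, not part of the original answer)
x <- read.csv(text = "   , doc1, doc2, doc3, doc4, doc5
main   ,    2,    0,    3,    0,    0
virtual,    4,    0,    0,    0,    1
origin ,    0,    0,    1,    2,    0", header = TRUE, strip.white = TRUE)

# Build a numeric matrix: terms as row names, documents as column names
m <- as.matrix(x[, -1])
rownames(m) <- trimws(as.character(x[, 1]))

# Coerce the matrix to a TermDocumentMatrix with plain term-frequency weighting
tdm_direct <- as.TermDocumentMatrix(m, weighting = weightTf)

inspect(tdm_direct)
```

This skips the qdap dependency, but I would still check the result with inspect() against the original CSV before relying on it.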

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Tyler Rinker