How to convert a TDM CSV file into a term-document matrix with the tm package in R?
I have a term-document matrix stored in a CSV file. For example:
```
, doc1, doc2, doc3, doc4, doc5
main , 2, 0, 3, 0, 0
virtual, 4, 0, 0, 0, 1
origin , 0, 0, 1, 2, 0
....
```
How can I convert this into a term-document matrix in the tm package?
As far as I understand, the TermDocumentMatrix() function builds a term-document matrix from the text (the words) of the documents in a corpus. But I already have the term-document matrix, and I would like to import it and use it in the tm package.
Please let me know how to do this.
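For reference, the usual route mentioned above, where TermDocumentMatrix() tokenizes raw text itself, looks roughly like this. The two short documents are hypothetical and exist only to illustrate the workflow:

```r
library(tm)

# Two hypothetical documents, used only to illustrate the usual workflow
docs <- c("main main virtual", "origin main")

# TermDocumentMatrix() expects a corpus, so wrap the raw text first
corpus <- VCorpus(VectorSource(docs))
tdm_from_text <- TermDocumentMatrix(corpus)

inspect(tdm_from_text)
```

This is why the function doesn't fit here: it counts the words itself, whereas the CSV already contains the counts.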
Solution 1:[1]
Here's one approach using the qdap package (though there is likely a more direct way within the tm package):
```r
library(qdap)

# Read the CSV; strip.white = TRUE removes the padding in terms such as "main "
x <- read.csv(text=" , doc1, doc2, doc3, doc4, doc5
main , 2, 0, 3, 0, 0
virtual, 4, 0, 0, 0, 1
origin , 0, 0, 1, 2, 0", header=TRUE, strip.white=TRUE)

# Move the term column into the row names
dat <- x[, -1]
row.names(dat) <- x[, 1]

# Convert to a qdap word frequency matrix, then to a tm term-document matrix
your_tdm <- tdm(as.wfm(dat))
tm::inspect(your_tdm)
## > tm::inspect(your_tdm)
## A term-document matrix (3 terms, 5 documents)
##
## Non-/sparse entries: 6/9
## Sparsity           : 60%
## Maximal term length: 7
## Weighting          : term frequency (tf)
##
##          Docs
## Terms     doc1 doc2 doc3 doc4 doc5
##   main       2    0    3    0    0
##   origin     0    0    1    2    0
##   virtual    4    0    0    0    1
```
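If you'd rather not depend on qdap, tm itself can coerce an ordinary numeric matrix with as.TermDocumentMatrix(). The sketch below assumes the CSV layout from the question (terms in the first column, one column per document) and reads it inline; for a real file you would pass the file path to read.csv() instead of text=. It uses term-frequency weighting (weightTf) because the cells are raw counts:

```r
library(tm)

# Terms are in the first column, so use it as the row names.
# strip.white = TRUE removes the padding around "main " and "origin ".
x <- read.csv(text = " , doc1, doc2, doc3, doc4, doc5
main , 2, 0, 3, 0, 0
virtual, 4, 0, 0, 0, 1
origin , 0, 0, 1, 2, 0",
              row.names = 1, strip.white = TRUE)

# Numeric matrix: terms in rows, documents in columns
m <- as.matrix(x)

# Coerce directly to a tm TermDocumentMatrix with term-frequency weighting
your_tdm <- as.TermDocumentMatrix(m, weighting = weightTf)
inspect(your_tdm)
```

If you need a document-term matrix instead, tm's as.DocumentTermMatrix() can be applied to t(m) in the same way.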
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Tyler Rinker |
