Error in asMethod(object): Cholmod error 'problem too large'
I have the following object:
Formal class 'dgCMatrix' [package "Matrix"] with 6 slots
..@ i : int [1:120671481] 0 2 3 6 10 13 21 22 25 36 ...
..@ p : int [1:51366] 0 3024 4536 8694 3302271 3302649 5715381 5756541 5784009 5801691 ...
..@ Dim : int [1:2] 10314738 51365
..@ Dimnames:List of 2
.. ..$ : chr [1:10314738] "line1" "line2" "line3" "line4" ...
.. ..$ : chr [1:51365] "sparito" "davide," "15enne" "di" ...
.. .. ..- attr(*, ".match.hash")=Class 'match.hash' <externalptr>
..@ x : num [1:120671481] 1 1 1 1 1 1 1 1 1 1 ...
..@ factors : list()
This object comes from the dtm_builder() function of the text2map package. Since I would like to remove empty rows from the matrix, I thought about using:
row.sum <- apply(dtm, 1, FUN = sum)  # sum each row of the matrix
dtm2 <- dtm[row.sum != 0, ]
However, I got the following error:
Error in asMethod(object): Cholmod error 'problem too large' at file ..
How could I fix it?
Solution 1:[1]
The short answer to your problem is that you're likely converting a sparse object to a dense object: apply() coerces its input with as.matrix(), which turns the dgCMatrix into an ordinary dense matrix. The sparse matrix classes in the Matrix package are very memory efficient when a matrix has many zeros (as a DTM does) because they simply do not allocate memory for the zeros.
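As a rough back-of-the-envelope check, the dimensions from the str() output above show why a dense copy cannot work. This sketch assumes 8-byte doubles for the dense matrix and, for the dgCMatrix, a 4-byte integer row index (@i) plus an 8-byte value (@x) per nonzero and a 4-byte pointer (@p) per column; it ignores the Dimnames, which add their own overhead. The dense cell count is also far above .Machine$integer.max, which is presumably what CHOLMOD's 'problem too large' check is reacting to.

```r
# Dimensions and number of stored entries, taken from the str() output above
n_rows <- 10314738
n_cols <- 51365
n_nonzero <- 120671481

# A dense double matrix needs 8 bytes for every cell, zeros included
dense_cells <- n_rows * n_cols
dense_bytes <- dense_cells * 8

# A dgCMatrix stores one 4-byte row index (@i) and one 8-byte value (@x)
# per nonzero, plus one 4-byte column pointer (@p) per column + 1
sparse_bytes <- n_nonzero * (4 + 8) + (n_cols + 1) * 4

dense_cells > .Machine$integer.max  # TRUE
cat(sprintf("dense: %.1f TB, sparse: %.2f GB\n",
            dense_bytes / 1e12, sparse_bytes / 1e9))
# dense: 4.2 TB, sparse: 1.45 GB
```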
@akrun's answer should work, but note that there is a rowSums() function in base R and a separate rowSums() method in the Matrix package. You need to load the Matrix package first.
Here is an example dgCMatrix (note that the Matrix package is not loaded yet):
m1 <- Matrix::Matrix(1:9, 3, 3, sparse = TRUE)  # 3 x 3 sparse matrix
m1[1, 1:3] <- 0                                 # make the first row empty
class(m1)                                       # "dgCMatrix"
If we use base R's rowSums(), we get an error:
rowSums(m1)
Error in rowSums(m1): 'x' must be an array of at least two dimensions
If the Matrix package is loaded, rowSums() dispatches to the Matrix package's own method, which works with dgCMatrix objects. The same is true for the bracket operator [. If you update text2map to version 0.1.5, Matrix is loaded by default.
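A minimal sketch of the fix on the small example m1, assuming Matrix is attached with library(Matrix). The same pattern applied to the question's dtm is shown in a comment only, since that object is not reproducible here. drop = FALSE is an addition of mine so the result stays a matrix even if only one row survives.

```r
library(Matrix)  # attach Matrix so rowSums() and [ dispatch to its sparse methods

m1 <- Matrix(1:9, 3, 3, sparse = TRUE)
m1[1, 1:3] <- 0  # first row is now empty

# Matrix's rowSums() method works on the sparse object directly,
# without building a dense copy
keep <- rowSums(m1) != 0

# Keep only the non-empty rows; the result is still a sparse matrix
m2 <- m1[keep, , drop = FALSE]
dim(m2)  # 2 3

# For the DTM in the question the equivalent would be:
# dtm2 <- dtm[rowSums(dtm) != 0, , drop = FALSE]
```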
That is a massive DTM, so depending on your machine you may still run into memory issues. Note that removing sparse rows or columns will not help much, because a sparse matrix already stores only the nonzero entries. So, although words that occur only once or twice will make up about 60% of your columns, you will save more memory by removing the most frequent words (i.e., words with a nonzero count in every row).
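To illustrate why frequent words dominate memory, here is a sketch on a hypothetical 6 × 4 toy DTM (the terms and counts are made up). It relies on the fact that for a dgCMatrix, diff(dtm@p) gives the number of stored entries in each column, which for a DTM without explicit zeros is the number of documents containing that term. Each stored entry costs memory, so dropping one very common term removes more entries than dropping all the rare ones.

```r
library(Matrix)

# Hypothetical toy DTM: 6 documents x 4 terms. "the" appears in every row,
# "of" in five rows, and two rare words appear once each.
dtm <- sparseMatrix(
  i = c(1:6, 1:5, 2, 4),
  j = c(rep(1, 6), rep(2, 5), 3, 4),
  x = 1,
  dims = c(6, 4),
  dimnames = list(paste0("line", 1:6), c("the", "of", "rare1", "rare2"))
)

# Stored nonzeros per column, i.e. each term's share of the memory footprint
nnz_per_col <- setNames(diff(dtm@p), colnames(dtm))

# Entries removed by dropping every term that occurs once or twice
sum(nnz_per_col[nnz_per_col <= 2])  # 2

# Entries removed by dropping only the most frequent term
max(nnz_per_col)                    # 6

# Drop terms that have a nonzero count in every row
dtm_small <- dtm[, nnz_per_col < nrow(dtm), drop = FALSE]
```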
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Dustin Stoltz |
