'Scipy: Sparse matrix multiplication memory error
I want to perform matrix multiplication between a sparse matrix and its transpose, (their are big matrices). Specifically, I have:
C = csc_matrix(...)
Ct = csc_matrix.transpose(C)
L = Ct*C
and shapes:
C.shape
(1791489, 28508141)
Ct.shape
(28508141, 1791489)
And I am getting the following error:
Traceback (most recent call last):
File "C:\...\modularity.py", line 373, in <module>
L = Ct*C
File "C:\...\anaconda3\lib\site-packages\scipy\sparse\base.py", line 480, in __mul__
return self._mul_sparse_matrix(other)
File "C:\...\anaconda3\lib\site-packages\scipy\sparse\compressed.py", line 518, in _mul_sparse_matrix
indices = np.empty(nnz, dtype=idx_dtype)
MemoryError: Unable to allocate 1.11 TiB for an array with shape (152087117507,) and data type int64
I cannot figure out why, why does it try to allocate memory for such a huge array ?
Update: Currently I am trying to do the multiplication in chunks like this
chunksize=1000
numiter = Ct.shape[0]//chunksize
blocks=[]
for i in range(numiter):
A = Ct[i*chunksize:(i+1)*chunksize].dot(C)
blocks.append(A)
But I get:
MemoryError: Unable to allocate 217. MiB for an array with shape (57012620,) and data type int32
Solution 1:[1]
For future viewers who want to multiply huge sparse matrices I solved my problem using PyTables and saved the result of the multiplication in chunks. Still it creates a big file but at least is compressed. The code I used goes like this:
import tables as tb
f = tb.open_file('D:\dot.h5', 'w')
l, m, n = Ct.shape[0], Ct.shape[1], C.shape[1]
filters = tb.Filters(complevel=8, complib='blosc')
out_data = f.create_earray(f.root, 'data', tb.Int32Atom(), shape=(0,), filters=filters)
out_indices = f.create_earray(f.root, 'indices', tb.Int32Atom(),shape=(0,), filters=filters)
out_indptr = f.create_earray(f.root, 'indptr', tb.Int32Atom(), shape=(0,), filters=filters)
out_indptr.append(np.array([0])) #this is needed as a first indptr
max_indptr = 0
#buffersize
bl = 10000
for i in range(0, l, bl):
res = Ct[i:min(i+bl, l),:].dot(C)
out_data.append(res.data)
indices = res.indices
indptr = res.indptr
out_indices.append(indices)
out_indptr.append(max_indptr+indptr[1:])
max_indptr += indices.shape[0]
So if for example you want access to the 2nd row of your final matrix you simply can:
L2 = csr_matrix((a.data[a.indptr[2]:a.indptr[2+1]], a.indices[a.indptr[2]:a.indptr[2+1]], np.array([0,len(a.indices[a.indptr[2]:a.indptr[2+1]])])), shape=(1,n))
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | George |
