np.matmul with large integer matrix without conversion/copy

import numpy as np

v = np.zeros((3,10000), dtype=np.float32)
mat = np.zeros((10000,10000000), dtype=np.int8)

w = np.matmul(v, mat)

yields

Traceback (most recent call last):
  File "int_mul_test.py", line 6, in <module>
    w = np.matmul(v, mat)
numpy.core._exceptions.MemoryError: Unable to allocate 373. GiB
for an array with shape (10000, 10000000) and data type float32

Apparently, NumPy is trying to convert my 10k x 10m int8 matrix to dtype float32. Why does it need to do this? It seems extremely wasteful, and if the matrix multiplication must work on float numbers in memory, it could convert, say, 1m columns at a time (which shouldn't sacrifice much speed) instead of converting all 10m columns at once.
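The promotion is visible on small arrays too: matmul has no mixed int8/float32 kernel, so NumPy first promotes both operands to a common dtype, which is what triggers the huge temporary allocation. A minimal demonstration:

```python
import numpy as np

v = np.zeros((3, 5), dtype=np.float32)
mat = np.zeros((5, 7), dtype=np.int8)

# The common dtype NumPy promotes the operands to before multiplying:
print(np.result_type(v, mat))      # float32

# The product is computed in, and returned as, that promoted dtype:
print(np.matmul(v, mat).dtype)     # float32
```

With the small shapes above the promoted copy is negligible; with a (10000, 10000000) matrix the float32 copy alone is ~373 GiB, matching the traceback.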

My current solution is to use a loop to break the matrix into 10 pieces and reduce temporary memory allocation to 1/10 of the 373 GiB:

w = np.empty((v.shape[0], mat.shape[1]), dtype=np.float32)
start = 0
block = 1000000                      # columns converted per piece
for i in range(mat.shape[1] // block):
    end = start + block
    w[:, start:end] = np.matmul(v, mat[:, start:end])
    start = end
w[:, start:] = np.matmul(v, mat[:, start:])   # remainder columns, if any
# runs in 396 seconds

Is there a numpy-idiomatic way to multiply "piece by piece" without manually coding a loop?
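One way to hide the index bookkeeping (a sketch, not necessarily faster; each piece is still promoted to float32, just one at a time) is to split the matrix by columns with np.array_split, which also handles column counts that don't divide evenly, and stitch the partial products back together:

```python
import numpy as np

def blocked_matmul(v, mat, n_blocks=10):
    # Multiply piece by piece along the columns of mat; each piece is
    # promoted to float32 individually, so the temporary copy is only
    # ~1/n_blocks of the full promoted matrix.
    return np.hstack([np.matmul(v, piece)
                      for piece in np.array_split(mat, n_blocks, axis=1)])

v = np.ones((3, 100), dtype=np.float32)
mat = np.ones((100, 1000), dtype=np.int8)
w = blocked_matmul(v, mat)
print(w.shape, w.dtype)   # (3, 1000) float32
```

Note that hstack still allocates the full float32 result, but that is unavoidable since the output itself is float32; only the promoted copy of the input is reduced.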



Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow