How can I make my binary file reader code faster?
I have a very large binary npy file generated by appending multiple arrays to a single npy file. I can only hold one array at a time in memory, so loading the whole thing is not an option. I wrote the code below, which is reasonably fast at getting the nth array out of the file because it seeks directly to the right offset, but I wonder if there is an easy way to make it faster. I noticed about half the runtime is just `array[i] = value`, which surprised me given NumPy's speed. I was expecting the for loop itself to be eating most of the time, but it's not. Any advice?
```python
from pathlib import Path
from time import time
import struct

import numpy as np


def loadArr(file, frame, N):
    """
    Load the frame-th C-contiguous (row-major) NxN array of doubles
    from a binary npy file generated by appending arrays to the file.
    """
    p = Path(file)
    unpack_double = struct.Struct("<d")  # little-endian double, matching '<f8'
    with p.open("rb") as f:
        f.seek(8)  # skip the '\x93NUMPY' magic string and the two version bytes
        HEADER_LEN = f.read(2)  # little-endian unsigned short
        HEADER_LEN = struct.unpack("<H", HEADER_LEN)[0]
        # each appended record is a header (8 + 2 + HEADER_LEN bytes) plus data
        bytesPerArray = (8 + 2 + HEADER_LEN + N * N * 8) * frame
        f.seek(bytesPerArray + 8 + 2 + HEADER_LEN)
        byte = f.read(8)  # one double-precision value
        array = np.zeros(N * N)
        for i in range(len(array)):
            value = unpack_double.unpack(byte)[0]
            array[i] = value
            byte = f.read(8)
    return np.reshape(array, (N, N))  # therefore must be C-contiguous


start = time()
file = "frog_hamiltonian_0.npy"
array = loadArr(file, 100, 262)
print(array)
print(time() - start)
# 0.02 seconds for 262x262 :-), 2.16 seconds for 2620x2620 :-/
```
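
One way to avoid the per-element `struct.unpack` loop is to let NumPy read the whole block in a single call with `np.fromfile`. Below is a minimal sketch under the same layout assumptions as the code above (every appended record carries its own magic string, version bytes, and header, and the data are little-endian doubles); the name `loadArrFast` is just for illustration:

```python
from pathlib import Path
import struct

import numpy as np


def loadArrFast(file, frame, N):
    """
    Same layout assumptions as loadArr above, but the N*N doubles are
    read in one vectorized call instead of a Python-level loop.
    """
    with Path(file).open("rb") as f:
        f.seek(8)  # skip magic string and version bytes
        header_len = struct.unpack("<H", f.read(2))[0]
        record = 8 + 2 + header_len + N * N * 8  # bytes per appended record
        f.seek(frame * record + 8 + 2 + header_len)  # start of this frame's data
        # one C-level read replaces N*N unpack calls and N*N item assignments
        flat = np.fromfile(f, dtype="<f8", count=N * N)
    return flat.reshape(N, N)
```

The speedup comes from moving the loop out of the interpreter: the original version makes N*N `f.read(8)` calls and N*N item assignments in Python, while `np.fromfile` performs one buffered read straight into the array. `np.frombuffer(f.read(N * N * 8), dtype="<f8")` behaves similarly if you already have the bytes in hand, and `np.lib.format.read_magic` and `read_array_header_1_0` can stand in for the hand-rolled header parsing.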
Sources
Licensed under CC BY-SA 3.0, per Stack Overflow's attribution requirements. Source: Stack Overflow
