How can I make my binary file reader code faster?
I have a very large binary npy file generated by appending multiple arrays to a single npy file. I can only hold one array at a time in memory, so loading the whole thing is not an option. I wrote the code below, which is reasonably fast at getting the nth array out of the file because it seeks directly to the right offset, but I wonder if there is an easy way to make it faster. I noticed about half the runtime is just `array[i] = value`, which surprised me given NumPy's speed. I was expecting the for loop itself to be eating most of the time, but it's not. Any advice?
```python
from pathlib import Path
from time import time
import struct

import numpy as np


def loadArr(file, frame, N):
    """
    Load the frame-th C-contiguous (row-major) NxN array of doubles
    from a binary npy file generated by appending arrays to the file.
    """
    p = Path(file)
    unpack_double = struct.Struct("<d")  # little-endian double, matching '<f8'
    with p.open("rb") as f:
        f.seek(8)  # skip the '\x93NUMPY' magic string and the two version bytes
        HEADER_LEN = f.read(2)  # little-endian unsigned short
        HEADER_LEN = struct.unpack("<H", HEADER_LEN)[0]
        # each appended record is a header (8 + 2 + HEADER_LEN bytes) plus data
        bytesPerArray = (8 + 2 + HEADER_LEN + N * N * 8) * frame
        f.seek(bytesPerArray + 8 + 2 + HEADER_LEN)
        byte = f.read(8)  # one double-precision value
        array = np.zeros(N * N)
        for i in range(len(array)):
            value = unpack_double.unpack(byte)[0]
            array[i] = value
            byte = f.read(8)
    return np.reshape(array, (N, N))  # therefore must be C-contiguous


start = time()
file = "frog_hamiltonian_0.npy"
array = loadArr(file, 100, 262)
print(array)
print(time() - start)
# 0.02 seconds for 262x262 :-), 2.16 seconds for 2620x2620 :-/
```
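
One way to avoid the per-element `struct.unpack` loop is to let NumPy read the whole block in a single call with `np.fromfile`. Below is a minimal sketch under the same layout assumptions as the code above (every appended record carries its own magic string, version bytes, and header, and the data are little-endian doubles); the name `loadArrFast` is just for illustration:

```python
from pathlib import Path
import struct

import numpy as np


def loadArrFast(file, frame, N):
    """
    Same layout assumptions as loadArr above, but the N*N doubles are
    read in one vectorized call instead of a Python-level loop.
    """
    with Path(file).open("rb") as f:
        f.seek(8)  # skip magic string and version bytes
        header_len = struct.unpack("<H", f.read(2))[0]
        record = 8 + 2 + header_len + N * N * 8  # bytes per appended record
        f.seek(frame * record + 8 + 2 + header_len)  # start of this frame's data
        # one C-level read replaces N*N unpack calls and N*N item assignments
        flat = np.fromfile(f, dtype="<f8", count=N * N)
    return flat.reshape(N, N)
```

The speedup comes from moving the loop out of the interpreter: the original version makes N*N `f.read(8)` calls and N*N item assignments in Python, while `np.fromfile` performs one buffered read straight into the array. `np.frombuffer(f.read(N * N * 8), dtype="<f8")` behaves similarly if you already have the bytes in hand, and `np.lib.format.read_magic` and `read_array_header_1_0` can stand in for the hand-rolled header parsing.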
Sources
Licensed under CC BY-SA 3.0, per Stack Overflow's attribution requirements. Source: Stack Overflow
