Merging 2 binary files and reading them back

I have two files that I want to merge into a single file and read back afterwards. One is an image whose values are almost all equal, so it is only 400 bytes in size; the other is a map of probabilities that is between 50 and 100 kB.

My idea is to open them in binary mode and append one to the other. However, I want to be able to read them back later. How can I read this structure? Should I lay out the merged file as: size of small + small + large?

To write them I was thinking of doing:

# Small = input1, large = input2
with open('input1.bin', 'rb') as f1, open('input2.bin', 'rb') as f2:
    input1 = f1.read()
    input2 = f2.read()

# Plain concatenation: the boundary between the two files is not recorded
input1 += input2

with open('Output.bin', 'wb') as fp:
    fp.write(input1)


Solution 1:[1]

The "struct" library might be useful to someone. Here is my solution, which stores the size of the first file in a 4-byte big-endian prefix:

# Saving the files jointly
with open('file1', 'rb') as f1, open('file2', 'rb') as f2:
    input1 = f1.read()
    input2 = f2.read()

# 4-byte big-endian length prefix (the first file must be smaller than 4 GiB)
filesize = len(input1).to_bytes(4, 'big')
output = filesize + input1 + input2

with open('Output.bin', 'wb') as fp:
    fp.write(output)

# Reading them back
with open('Output.bin', 'rb') as fp:
    data = fp.read()
filesize2 = int.from_bytes(data[:4], 'big')

file1 = data[4:4+filesize2]
file2 = data[4+filesize2:]
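
Since "struct" is mentioned above but not actually used, here is a minimal sketch of the same length-prefix layout written with it. The helper names pack_pair and unpack_pair are hypothetical, and the ">I" format is chosen to match to_bytes(4, 'big'); treat it as one possible way to write this rather than the definitive one.

```python
import struct
from typing import Tuple

# ">I" = big-endian unsigned 32-bit integer, equivalent to to_bytes(4, 'big')
HEADER = struct.Struct(">I")


def pack_pair(first: bytes, second: bytes) -> bytes:
    """Return header (length of first) + first + second."""
    return HEADER.pack(len(first)) + first + second


def unpack_pair(blob: bytes) -> Tuple[bytes, bytes]:
    """Split a blob produced by pack_pair back into its two parts."""
    (size,) = HEADER.unpack_from(blob, 0)
    start = HEADER.size  # 4 bytes
    return blob[start:start + size], blob[start + size:]


# Usage with in-memory stand-in data instead of real files
small = b"\x00" * 400
large = bytes(range(256)) * 10
blob = pack_pair(small, large)
first, second = unpack_pair(blob)
```

The second part needs no length of its own because it simply runs to the end of the blob; with more than two parts, each part except the last would need its own prefix.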

UPDATE:

Although this approach works well, if you don't mind wasting a few kB per combined file you can also try pickle. Joining a 400-byte image with a 57 kB npz file, I noticed that the combined file is 57 kB with the previous routine but 78 kB with pickle. However, pickle wins on speed: reading takes about half the time of the binary approach.
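
Pickle can also bundle the raw file bytes without decoding them first, in which case the extra size is only the pickle framing. Here is a minimal sketch with in-memory stand-in data; the payloads are placeholders of the sizes mentioned above, not the real image or npz file.

```python
import pickle

# Stand-in payloads (assumed sizes, not the real files)
image_bytes = b"\x89PNG" + b"\x00" * 396   # 400 bytes
npz_bytes = b"\x01" * 57_000               # about 57 kB

# Store both byte strings as one pickled tuple
blob = pickle.dumps((image_bytes, npz_bytes), protocol=pickle.HIGHEST_PROTOCOL)

# Load them back and measure how many bytes pickle added
restored_image, restored_npz = pickle.loads(blob)
overhead = len(blob) - len(image_bytes) - len(npz_bytes)
```

In this variant the image and the array still have to be decoded after loading, so it would not by itself give the speed-up described above. That gain presumably comes from pickling already-decoded objects, which is an assumption rather than something measured here.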

Finally, in my case the npz file contained a structure that I later had to read:

npz = npz['mystructure']

This step is very slow. By doing it once beforehand and storing the result with pickle, the difference is orders of magnitude.

import numpy as np 
from PIL import Image
import time
from io import BytesIO
import pickle

filename = "MYFILENAME"

# Saving file jointly BYTES
image = open(f'{filename}.png', 'rb').read()
npz = open(f'{filename}.npz', 'rb').read()
filesize = len(image).to_bytes(4, 'big')
output = filesize + image + npz

with open('Output.bin', 'wb') as fp:
    fp.write(output)

# Saving file jointly PICKLE
image = Image.open(f"{filename}.png")
npz = np.load(f'{filename}.npz')
npz = npz['mystructure']  # extract the array once, before pickling

with open('filename.pickle', 'wb') as handle:
    pickle.dump([image, npz], handle, protocol=pickle.HIGHEST_PROTOCOL)

# READING TEST
now = time.time()
for i in range(10000):
    # Open them
    input1 = open('Output.bin', 'rb').read()
    filesize2 = int.from_bytes(input1[:4], "big")
    
    image = Image.open(BytesIO(input1[4:4+filesize2])) 
    npz = np.load(BytesIO(input1[4+filesize2:]))
    #npz = npz['mystructure']

print(f"BINARY lasts {time.time()-now} seconds")

now = time.time()
for i in range(10000):
    # Open them
    with open('filename.pickle', 'rb') as handle:
        b = pickle.load(handle)
        image = b[0]
        npz = b[1]

print(f"PICKLE lasts {time.time()-now} seconds")

now = time.time()
for i in range(10000):
    npz = np.load(f"{filename}.npz", mmap_mode='r')
    image = Image.open(f"{filename}.png") 
    #npz = npz['mystructure']

print(f"2 FILES lasts {time.time()-now} seconds")
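
The loops above use time.time(). For short benchmarks, time.perf_counter() is generally the better clock, since it is monotonic and uses the highest resolution available. Here is a minimal sketch of a reusable timing helper; time_calls and the sample workload are hypothetical names, not part of the original answer.

```python
import time


def time_calls(func, repeat=1000):
    """Call func() `repeat` times and return the total elapsed seconds."""
    start = time.perf_counter()
    for _ in range(repeat):
        func()
    return time.perf_counter() - start


# Usage with a trivial stand-in workload; in the benchmark above, func would
# wrap one of the three reading strategies instead.
elapsed = time_calls(lambda: sum(range(100)), repeat=100)
print(f"Stand-in workload took {elapsed:.6f} seconds")
```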

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1