Python: how to read a binary file in chunks and specify the starting offset

I have the code:

def read_chunks(infile, chunk_size):
    while True:
        chunk = infile.read(chunk_size)
        if chunk:
            yield chunk
        else:
            return

This works when I need to read the file in chunks. Sometimes, however, I need to read the file two bytes at a time but start each read at the next offset rather than at the next chunk. For example, given the bytes 00 01 02 03 04 and a chunk size of 2, I would need to read "00 01", "01 02", "02 03", "03 04". The function currently reads it as "00 01", "02 03", "04". Can this be implemented in the same function, perhaps as an extra argument, or should it be a separate function? What would it look like? Either way, the function still needs to work as it does now.



Solution 1:[1]

Using tell() and seek(n), you can move the file pointer to any position in the file.

tell(): returns the current file position in a file stream
seek(n): sets the current file position to n in a file stream
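As a minimal illustration of these two calls, the sketch below uses an in-memory io.BytesIO object in place of a real file (an assumption made only so the snippet runs without a file on disk) and steps back one byte after a read, so that the next read overlaps the previous one:

```python
import io

# In-memory stand-in for a file opened with "rb"; holds the bytes 00 01 02 03 04.
buf = io.BytesIO(bytes([0, 1, 2, 3, 4]))

first = buf.read(2)            # reads bytes 00 01
pos_after_read = buf.tell()    # position is now 2
buf.seek(pos_after_read - 1)   # step back one byte, to position 1
second = buf.read(2)           # reads bytes 01 02, overlapping the first read

print(first.hex(), pos_after_read, second.hex())
```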

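def read_chunks(infile, chunk_size, offset=0):
    # Each iteration must advance by at least one byte, or the loop never ends.
    if chunk_size + offset < 1:
        return
    while True:
        chunk = infile.read(chunk_size)
        if not chunk:
            return
        yield chunk
        if offset != 0:
            # Probe one byte to detect EOF before repositioning.
            if not infile.read(1):
                return
            # Move by offset from the end of the chunk; -1 undoes the probe read.
            infile.seek(infile.tell() + offset - 1)


with open("x", "rb") as f:  # "rb" = read binary
    for chunk in read_chunks(f, 2, -1):
        print(chunk, end=" ")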
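If the input cannot seek (a pipe, for example), the overlapping reads from the question could instead be sketched with a small buffer. This is an alternative under stated assumptions, not part of the answer above: read_windows and step are hypothetical names, the stream is assumed to be a buffered binary stream whose read(n) returns fewer than n bytes only at EOF, and a trailing partial window is dropped rather than yielded.

```python
import io


def read_windows(stream, size, step=1):
    """Yield windows of `size` bytes, advancing `step` bytes between windows."""
    if size < 1 or step < 1:
        raise ValueError("size and step must be positive")
    buf = stream.read(size)
    while len(buf) == size:
        yield buf
        nxt = stream.read(step)
        if not nxt:
            return
        # Append the new bytes and drop `step` bytes from the front.
        # If the stream ran short, buf becomes shorter than size and the loop ends.
        buf = (buf + nxt)[step:]


data = io.BytesIO(bytes([0, 1, 2, 3, 4]))
print([w.hex() for w in read_windows(data, 2)])
# ['0001', '0102', '0203', '0304']
```

With step equal to size this yields back-to-back chunks, except that a short final chunk is not returned.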

Update:

  • The file is now opened with "rb" (read binary) instead of "r"
  • The EOF check was changed

Addition -> offset parameter:

  • 0 (default value): reads chunks one after another
  • positive n: skips n bytes between reads
  • negative n: the last |n| bytes of the previous chunk are included at the start of the next chunk (chunk_size + offset must be at least 1)
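To make the three cases concrete, the sketch below runs the same logic against seven in-memory bytes (00 through 06). The function body is repeated so that the snippet runs on its own; the io.BytesIO input and the run helper are assumptions added for illustration. Note that the offset is measured in bytes.

```python
import io


def read_chunks(infile, chunk_size, offset=0):
    # Same logic as the answer's function, repeated so this snippet is self-contained.
    if chunk_size + offset < 1:
        return
    while True:
        chunk = infile.read(chunk_size)
        if not chunk:
            return
        yield chunk
        if offset != 0:
            if not infile.read(1):  # probe one byte to detect EOF
                return
            infile.seek(infile.tell() + offset - 1)  # -1 undoes the probe


data = bytes([0, 1, 2, 3, 4, 5, 6])


def run(offset):
    # Hypothetical helper: collect the chunks as hex strings for easy reading.
    return [c.hex() for c in read_chunks(io.BytesIO(data), 2, offset)]


print(run(0))   # ['0001', '0203', '0405', '06']
print(run(1))   # ['0001', '0304', '06']
print(run(-1))  # ['0001', '0102', '0203', '0304', '0405', '0506']
```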

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1