How to read a large log/txt file (several GB) so that it loads N lines into memory at a time, then the next N lines
I have tried this program, which reads my file in fixed-size character chunks, and that is the behaviour I want:
```python
def read_in_chunks(file_object, chunk_size=1024):
    """Lazy function (generator) to read a file piece by piece.
    Default chunk size: 1k."""
    while True:
        data = file_object.read(chunk_size)
        if not data:
            break
        yield data

with open('really_big_file.dat') as f:
    for piece in read_in_chunks(f):
        print(piece)
```
But when I try to apply the same approach using readlines(), it doesn't work for me. Here is the code I am trying:
```python
def read_in_chunks(file_object, chunk_size=5):
    """Lazy function (generator) to read a file piece by piece.
    Default chunk size: 1k."""
    while True:
        data = file_object.readlines()[0:chunk_size]
        if not data:
            break
        yield data

with open('Traefik.log') as f:
    for piece in read_in_chunks(f):
        print(piece)
```
Can somebody help me achieve the same chunking behaviour for N lines at a time?
Solution 1:[1]
By default, .readlines() reads the whole remaining content of the stream into a list. That is why your version fails: the first call consumes the entire file, the slice keeps only the first chunk_size lines, and every later call returns an empty list. But you can give .readlines() a size hint to produce lines in chunks; from the docs:
> Read and return a list of lines from the stream. hint can be specified to control the number of lines read: no more lines will be read if the total size (in bytes/characters) of all lines so far exceeds hint.
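A minimal sketch of what the hint does, assuming the Traefik.log file from the question: each call collects lines until their combined size reaches the hint, and the next call picks up where the previous one stopped.

```python
# Sketch of readlines() with a size hint (file name taken from the question).
with open('Traefik.log') as f:
    first_batch = f.readlines(1024)   # roughly the first 1 KB worth of lines
    second_batch = f.readlines(1024)  # continues where the first call stopped
print(len(first_batch), len(second_batch))
```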
So, you could adjust your function to something like:
```python
def read_in_chunks(file_object, chunk_size_hint=1024):
    """Lazy function (generator) to read a file piece by piece.
    Default chunk size: 1k."""
    while True:
        data = file_object.readlines(chunk_size_hint)
        if not data:
            break
        yield data
```
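Used exactly like the original generator, this yields a list of lines per iteration; the sketch below (again assuming Traefik.log) just prints how many lines end up in each batch.

```python
# Usage sketch, assuming Traefik.log from the question.
with open('Traefik.log') as f:
    for i, lines in enumerate(read_in_chunks(f, chunk_size_hint=4096)):
        # Each batch stops once its combined size reaches ~4096 characters,
        # so the number of lines per batch depends on how long the lines are.
        print(f'batch {i}: {len(lines)} lines')
```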
But that doesn't guarantee a fixed number of lines per chunk. If you look a bit further in the docs you'll find the following advice:
> Note that it’s already possible to iterate on file objects using `for line in file: ...` without calling `file.readlines()`.
That's a hint that something like this
```python
def read_in_chunks(file_object, chunk_size=10):
    """Lazy function (generator) to read a file piece by piece.
    Default chunk size: 10 lines"""
    data = []
    for n, line in enumerate(file_object, start=1):
        data.append(line)
        if not n % chunk_size:
            yield data
            data = []
    if data:
        yield data
```
might be better suited.
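A quick usage sketch (again assuming Traefik.log): every chunk except possibly the last contains exactly chunk_size lines.

```python
# Usage sketch, assuming Traefik.log from the question.
with open('Traefik.log') as f:
    for piece in read_in_chunks(f, chunk_size=10):
        # piece is a list of at most 10 lines; only the final chunk can be
        # shorter, when the file's line count isn't a multiple of 10.
        print(len(piece), repr(piece[0]))
```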
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | |
