How to read a large log/txt file (several GB) so that it loads N lines into memory at a time, then the next N lines
I have tried this program, which reads my file in fixed-size character chunks, and that is the behaviour I want:
```python
def read_in_chunks(file_object, chunk_size=1024):
    """Lazy function (generator) to read a file piece by piece.
    Default chunk size: 1k."""
    while True:
        data = file_object.read(chunk_size)
        if not data:
            break
        yield data

with open('really_big_file.dat') as f:
    for piece in read_in_chunks(f):
        print(piece)
```
But when I try to apply the same approach using readlines(), it doesn't work for me. Here is the code I am trying:
```python
def read_in_chunks(file_object, chunk_size=5):
    """Lazy function (generator) to read a file piece by piece.
    Default chunk size: 1k."""
    while True:
        data = file_object.readlines()[0:chunk_size]
        if not data:
            break
        yield data

with open('Traefik.log') as f:
    for piece in read_in_chunks(f):
        print(piece)
```
Can somebody help me achieve the same chunking behaviour for N lines at a time?
Solution 1:[1]
By default, .readlines() reads the whole remaining content of the stream into a list. That is why your version fails: the first call consumes the entire file, the slice keeps only the first chunk_size lines, and every later call returns an empty list. But you can give .readlines() a size hint to produce lines in chunks; from the docs:
> Read and return a list of lines from the stream. hint can be specified to control the number of lines read: no more lines will be read if the total size (in bytes/characters) of all lines so far exceeds hint.
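A minimal sketch of what the hint does, assuming the Traefik.log file from the question: each call collects lines until their combined size reaches the hint, and the next call picks up where the previous one stopped.

```python
# Sketch of readlines() with a size hint (file name taken from the question).
with open('Traefik.log') as f:
    first_batch = f.readlines(1024)   # roughly the first 1 KB worth of lines
    second_batch = f.readlines(1024)  # continues where the first call stopped
print(len(first_batch), len(second_batch))
```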
So, you could adjust your function to something like:
```python
def read_in_chunks(file_object, chunk_size_hint=1024):
    """Lazy function (generator) to read a file piece by piece.
    Default chunk size: 1k."""
    while True:
        data = file_object.readlines(chunk_size_hint)
        if not data:
            break
        yield data
```
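Used exactly like the original generator, this yields a list of lines per iteration; the sketch below (again assuming Traefik.log) just prints how many lines end up in each batch.

```python
# Usage sketch, assuming Traefik.log from the question.
with open('Traefik.log') as f:
    for i, lines in enumerate(read_in_chunks(f, chunk_size_hint=4096)):
        # Each batch stops once its combined size reaches ~4096 characters,
        # so the number of lines per batch depends on how long the lines are.
        print(f'batch {i}: {len(lines)} lines')
```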
But that doesn't guarantee a fixed number of lines per chunk. If you look a bit further in the docs you'll find the following advice:
> Note that it’s already possible to iterate on file objects using `for line in file: ...` without calling `file.readlines()`.
That's a hint that something like this
```python
def read_in_chunks(file_object, chunk_size=10):
    """Lazy function (generator) to read a file piece by piece.
    Default chunk size: 10 lines"""
    data = []
    for n, line in enumerate(file_object, start=1):
        data.append(line)
        if not n % chunk_size:
            yield data
            data = []
    if data:
        yield data
```
might be better suited.
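A quick usage sketch (again assuming Traefik.log): every chunk except possibly the last contains exactly chunk_size lines.

```python
# Usage sketch, assuming Traefik.log from the question.
with open('Traefik.log') as f:
    for piece in read_in_chunks(f, chunk_size=10):
        # piece is a list of at most 10 lines; only the final chunk can be
        # shorter, when the file's line count isn't a multiple of 10.
        print(len(piece), repr(piece[0]))
```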
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | |
