Python: how to create an iterator/iterable backed by threaded operation results?
I'm trying to fix a performance issue by preloading the data consumed by a batching iterator, and I'm stuck on how to fit this into idiomatic Python.
Currently I'm using a sequence of iterators composed one on top of the other:
    iterator = list(dirin.iterdir())
    iterator = TranformIterator(iterator, lambda path: load_img(path))  # This takes most of the time
    iterator = PreloadingIterator(iterator)  # I want to use this iterator to preload some of the data
    iterator = BatchingIterator(iterator, batchsize=100)
    iterator = BatchingIteratorWithMeta(iterator)
    for batch in iterator:
        ...  # process the batch
All of these are implemented with the exception of PreloadingIterator:
    import typing

    import psutil


    class PreloadingIterator:
        inner: typing.Iterator

        def __init__(self, inner: typing.Union[typing.Sized, typing.Iterable]):
            self.inner = inner
            self.total = len(inner)
            self.index = 0
            self.__cache__ = []

        def __len__(self):
            return len(self.inner)

        def __iter__(self):
            mem = psutil.virtual_memory()  # memory snapshot taken when iteration starts
            for item in self.inner:
                memconsumed = psutil.virtual_memory()  # current memory snapshot
                if self.should_preload(mem, memconsumed):
                    pass
                    # threading.Thread(target=
                    # peeker = more_itertools.peekable(self.inner)
                    # preload_item = peeker.peek()
                yield item
                self.index += 1

        def should_preload(self, oldmem, newmem):
            return True  # TODO
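The should_preload placeholder above could, for example, be filled in with a simple memory-headroom check. The following is only a sketch under assumptions that are not part of the original code: it is written as a standalone function, it reads just the total and available fields that psutil.virtual_memory() reports, and both thresholds (min_free_fraction and max_growth_bytes) are hypothetical values that would need tuning.

```python
from collections import namedtuple

# Hypothetical stand-in exposing the two fields used from psutil.virtual_memory().
MemSnapshot = namedtuple("MemSnapshot", ["total", "available"])


def should_preload(oldmem, newmem, min_free_fraction=0.2, max_growth_bytes=2 * 1024**3):
    """Return True while there is enough memory headroom to preload more items.

    oldmem: snapshot taken when iteration started.
    newmem: snapshot taken just now.
    Both thresholds are illustrative assumptions, not recommended values.
    """
    free_fraction = newmem.available / newmem.total
    consumed_since_start = oldmem.available - newmem.available
    return free_fraction >= min_free_fraction and consumed_since_start <= max_growth_bytes


# Usage with synthetic numbers; with psutil installed, pass psutil.virtual_memory() instead.
GIB = 1024**3
start = MemSnapshot(total=16 * GIB, available=8 * GIB)
plenty_left = should_preload(start, MemSnapshot(total=16 * GIB, available=7 * GIB))
nearly_full = should_preload(start, MemSnapshot(total=16 * GIB, available=2 * GIB))
print(plenty_left, nearly_full)  # True False
```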
What I'm trying to do is peek ahead at the next item in the iterator (preload_item = peeker.peek()) and use it to start a thread that begins loading the next result. However, I can't work out how to change the loop in __iter__ (for item in self.inner:) so that item comes from the preloaded result rather than directly from the underlying iterator.
How can I iterate over an iterator in a way which allows me to source the item from a precached result if it is available?
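One common way to approach this is to stop peeking altogether and let a background thread drive the inner iterator, handing items to the consumer through a bounded queue. The sketch below is an assumption about how PreloadingIterator could look, not the original implementation: max_preload is a hypothetical parameter that caps how many items are held ahead of the consumer, and slow_square stands in for load_img.

```python
import queue
import threading
import time


class PreloadingIterator:
    """Wraps an iterable and pulls items from it on a background thread.

    Up to `max_preload` items (a hypothetical parameter) are fetched ahead
    of the consumer and held in a bounded queue.
    """

    def __init__(self, inner, max_preload=2):
        self.inner = inner
        self.max_preload = max_preload

    def __len__(self):
        return len(self.inner)  # only valid when the inner iterable is sized

    def __iter__(self):
        buffer = queue.Queue(maxsize=self.max_preload)

        def producer():
            try:
                for item in self.inner:  # the slow work (e.g. load_img) runs here
                    buffer.put(("item", item))  # blocks while the buffer is full
                buffer.put(("done", None))
            except Exception as exc:
                buffer.put(("error", exc))  # hand the failure to the consumer

        threading.Thread(target=producer, daemon=True).start()

        while True:
            kind, payload = buffer.get()  # returns at once if an item was preloaded
            if kind == "done":
                return
            if kind == "error":
                raise payload
            yield payload


def slow_square(x):
    time.sleep(0.01)  # stand-in for an expensive load such as load_img
    return x * x


results = list(PreloadingIterator(map(slow_square, range(5))))
print(results)  # [0, 1, 4, 9, 16]
```

With this shape the for loop in the consumer never needs to know whether an item was precached: it simply takes whatever the queue holds next, and the producer keeps loading while the consumer is busy with the current batch. Two caveats apply to this sketch. Threads mainly help when the loading is I/O-bound or happens in code that releases the GIL, and if the consumer stops iterating early the daemon thread stays blocked on the full queue until the process exits.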
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
