Why can generator expressions be iterated over only once?
This program:
```python
def pp(seq):
    print(''.join(str(x) for x in seq))
    print(''.join(str(x) for x in seq))
    print('---')

pp([0, 1, 2, 3])
pp(range(4))  # range in Python 3, xrange in Python 2
pp(x for x in [0, 1, 2, 3])
```
prints this:
```
0123
0123
---
0123
0123
---
0123
---
```
This is a consequence of the difference between `container.__iter__` and `iterator.__iter__`, both documented here: https://docs.python.org/3/library/stdtypes.html#typeiter. If `__iter__` returns a new iterator each time, we see two identical lines. If it returns the same iterator each time, we see just one line, because the iterator is exhausted after the first pass.
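The distinction is easy to check directly. A minimal sketch: a list hands out a fresh iterator on every `iter()` call, while a generator is its own iterator, so `iter()` returns the same (eventually exhausted) object.

```python
lst = [0, 1, 2, 3]
# A container: each iter() call produces a distinct iterator object.
assert iter(lst) is not iter(lst)

gen = (x for x in [0, 1, 2, 3])
# A generator: iter() returns the generator itself.
assert iter(gen) is gen

# So the first full pass exhausts it, leaving nothing for a second pass.
assert list(gen) == [0, 1, 2, 3]
assert list(gen) == []
```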
My question is: why was it decided to implement generator expressions differently from other objects that seem equivalent, or at least very similar?
Here is another example showing that generator expressions behave differently from other, similar types, which can lead to unexpected results.
```python
def pp(seq):
    # accept only non-empty sequences
    if seq:
        print("data ok")
    else:
        print("required data missing")

pp([])
pp(range(0))
pp(x for x in [])
```
output:
```
required data missing
required data missing
data ok
```
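A generator object is always truthy, regardless of whether it will yield anything. One way to make the check behave consistently is to materialize the sequence first; this is a sketch, and note that the `list()` call costs memory and would never return on an infinite generator:

```python
def pp(seq):
    # list() consumes the iterable, so lists, ranges, and
    # generator expressions are all tested the same way.
    items = list(seq)
    if items:
        print("data ok")
    else:
        print("required data missing")

pp([])             # required data missing
pp(range(0))       # required data missing
pp(x for x in [])  # required data missing
```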
Solution 1:[1]
A generator runs arbitrary code, and returns a lazy sequence with the items yielded by that code.
- That code could be providing an infinite sequence.
- That code could be reading contents off a network connection.
- That code could be modifying external variables every time it iterates.
Thus, you can't cache the results safely in the general case: If it's an infinite sequence, you'd run out of memory.
You also can't simply rerun the code when it's read again: if it's reading from a network connection, that connection may no longer exist in the future (or may be in a different state). Similarly, if the code modifies variables outside the genexp's scope, that state would change in hard-to-predict ways depending on the reader's behavior -- an undesirable property for a language that values predictability and readability.
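A small sketch of why replaying would be unsafe: a generator expression can observe state that changes between its creation and its consumption, so rerunning the code would not necessarily reproduce the same items.

```python
data = [1, 2, 3]
doubled = (x * 2 for x in data)  # items are produced lazily, on demand

data.append(4)                   # external state changes before iteration
print(list(doubled))             # [2, 4, 6, 8] -- the mutation is visible
print(list(doubled))             # []  -- exhausted; no safe way to replay
```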
Some other languages (notably Clojure) do implement generators (there, generalized as "lazy sequences") that cache results if and only if a reference to the beginning of the sequence is retained. This gives the programmer control, but at a cost of complexity: You need to know exactly what is and is not referenced for the garbage collector.
The decision not to do this for Python is entirely reasonable and in keeping with the language's design goals.
Solution 2:[2]
> My question is: why was it decided to implement generator expressions differently from other objects that seem equivalent, or at least very similar?
Because that's exactly what a generator is. If you made it behave like other iterables, you would have to keep all of the items in memory, and then what you have is no longer a generator.
The main benefit of using generators is that they don't consume as much memory. They simply generate the items on demand. This makes them one-shot iterables, because if you wanted to be able to iterate over them multiple times then you would have to preserve all of the items in memory.
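When multiple passes really are needed, the usual options are to materialize the generator into a list (paying the memory cost) or to split it with `itertools.tee`, which buffers items internally. A sketch:

```python
from itertools import tee

# Option 1: materialize -- re-readable, but everything is in memory.
squares = list(x * x for x in range(4))
assert squares == [0, 1, 4, 9]
assert squares == [0, 1, 4, 9]  # a list can be iterated repeatedly

# Option 2: tee -- independent iterators over one generator.
# tee's internal buffer grows with the lag between the two consumers.
a, b = tee(x * x for x in range(4))
assert list(a) == [0, 1, 4, 9]
assert list(b) == [0, 1, 4, 9]
```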
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Charles Duffy |
| Solution 2 | jarmod |
