Generator, iterable and iterator: clearing up the confusion

With a list comprehension we can create a list directly. However, memory is limited, so the capacity of a list is limited too. Moreover, creating a list of 1 million elements not only takes up a lot of storage, but if we only need the first few elements, the space occupied by all the others is wasted.

So if the elements of a list can be computed by some algorithm, could we compute each subsequent element on the fly while looping? That would save a lot of space, because the complete list never has to be built. In Python, this mechanism of computing values while looping is called a generator.
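The memory difference can be observed with sys.getsizeof. The sketch below is illustrative: squares_list and squares_gen are made-up names, exact byte counts depend on the Python version and platform, and getsizeof measures only the container object itself (for the list, its array of references, not the int objects).

```python
import sys

# A list comprehension materializes every element up front.
squares_list = [x * x for x in range(1_000_000)]
# A generator expression stores only its current position and code.
squares_gen = (x * x for x in range(1_000_000))

print(sys.getsizeof(squares_list))  # several megabytes on a 64-bit build
print(sys.getsizeof(squares_gen))   # a few hundred bytes at most

# Both produce the same values when consumed.
print(sum(squares_list) == sum(squares_gen))  # True
```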

There are two ways to create a generator:

  • Expression form: (x for x in range(10))
  • Function form: def gen(): yield 1

1, Generator expressions

>>> L = [x for x in range(10)]
>>> L2 = (x for x in range(12))
>>> print(L)
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
>>> print(L2)
<generator object <genexpr> at 0x01C8A4C0>

Syntactically, the only difference between a generator expression and a list comprehension is that the generator expression uses parentheses instead of square brackets.

Traversing a generator

According to the definitions in Design Patterns: Elements of Reusable Object-Oriented Software, an iterator retrieves elements from a collection, while a generator produces elements "out of thin air". The Fibonacci sequence illustrates the difference well: it has infinitely many terms, so it cannot be stored in a collection, yet a generator can produce its terms one at a time.
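A minimal sketch of that idea: fibonacci is a hypothetical generator with no upper bound, and itertools.islice takes a finite, lazily evaluated slice of it, so only the requested terms are ever computed.

```python
from itertools import islice

def fibonacci():
    """Yield Fibonacci numbers forever; the sequence is never stored."""
    a, b = 0, 1
    while True:
        yield a
        a, b = b, a + b

# islice stops pulling values after 10 items, so the infinite loop is harmless.
first_ten = list(islice(fibonacci(), 10))
print(first_ten)  # [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]
```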

A value is computed only when it is actually needed; this is called lazy evaluation. To advance a generator manually, call next(gen) or gen.__next__().

>>> print(L2.__next__())
0
>>> print(next(L2))
1

You can also iterate with a for loop, because a generator is also an iterable. Iterables are discussed in more detail later.

All generators are iterators, because generators fully implement the iterator interface. In the Python community, the terms iterator and generator are often used interchangeably.

>>> for i in L2:
...     print(i)

In practice you rarely call next() directly: once the last element has been consumed, next() raises StopIteration, which a for loop handles for you automatically. You should also avoid calling special methods such as __next__() directly; use the built-in next() instead.
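The sketch below shows the end of a generator as seen through next(): the StopIteration exception, the optional default argument of next() that suppresses it, and a for loop that needs neither.

```python
gen = (x for x in range(2))

first = next(gen)    # 0
second = next(gen)   # 1

try:
    next(gen)        # no elements left
    exhausted = False
except StopIteration:
    exhausted = True  # next() signalled the end with StopIteration

# next() also accepts a default, returned instead of raising StopIteration.
fallback = next(gen, "done")

# A for loop handles StopIteration internally, so no try/except is needed.
looped = []
for x in (x for x in range(2)):
    looped.append(x)

print(first, second, exhausted, fallback, looped)  # 0 1 True done [0, 1]
```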

2, Functions with a yield statement (generator functions)

Python has no macros, so abstracting the iterator pattern required changing the language itself. To this end, Python 2.2 (2001) added the yield keyword, which is used to build generators; a generator works as an iterator.

def fib2(maxValue):
    n, a, b = 0, 0, 1
    while n < maxValue:
        yield b
        a, b = b, a + b
        n = n + 1

>>> f = fib2(4)
>>> f
<generator object fib2 at 0x7f15d41148d0>
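Calling fib2 only creates the generator object; the values appear when it is consumed. A short sketch, repeating the fib2 definition from above so the block runs on its own:

```python
def fib2(maxValue):
    # Same generator function as above: yields the first maxValue Fibonacci numbers.
    n, a, b = 0, 0, 1
    while n < maxValue:
        yield b
        a, b = b, a + b
        n = n + 1

# Each call creates a fresh generator object.
print(list(fib2(4)))  # [1, 1, 2, 3]

for value in fib2(6):
    print(value, end=" ")  # 1 1 2 3 5 8
print()
```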

If a function definition contains the yield keyword, it is no longer an ordinary function but a generator function: calling it does not run its body, it returns a generator object.

Note: a function containing yield does not execute the way a normal function does. Calling a normal function that has no return statement gives None, whose type is <class 'NoneType'>; calling a function that contains yield gives a generator object, whose type is <class 'generator'>.

  • A normal function runs from start to finish in a single call.
  • The body of a function with yield runs only when next() is called. It runs until it reaches a yield, then pauses and hands back the yielded value. On the following call to next(), execution resumes at the statement right after that yield.
  • yield hands a value back to the caller, much like return, but without ending the function; next(generator) returns the yielded value.

>>> def f():
...     yield 1
...     yield 3
...
>>> g = f()  # the function body does not run yet
>>> next(g)
1
>>> next(g)
3
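To make the pause-and-resume behavior visible, the sketch below has a generator append to a log list (a hypothetical helper, used only for illustration) each time part of its body runs, and also prints the types described in the note above.

```python
log = []  # records which parts of gen's body have run

def normal():
    pass  # no return statement, so calling it evaluates to None

def gen():
    log.append("started")
    yield 1
    log.append("resumed after first yield")
    yield 3
    log.append("resumed after second yield")

print(type(normal()))  # <class 'NoneType'>

g = gen()              # the body has not started running yet
print(type(g))         # <class 'generator'>
print(log)             # []

print(next(g))         # 1  (runs up to the first yield, then pauses)
print(log)             # ['started']
print(next(g))         # 3  (resumes right after the first yield)
print(log)             # ['started', 'resumed after first yield']
```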

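Sending values into a generator: send()

send(value) resumes the generator and passes value in as the result of the yield expression where it is paused. The value can be any object, for example a string or a number.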

def foo():
    print("starting")
    while True:
        r = yield 2
        print(r)

f = foo()
print(f.send(None))  # same effect as print(next(f))
print(f.send(1))

Output:

starting
2
1
2

f.send(None) has the same effect as next(f): the function prints "starting", then executes yield 2, which returns 2. So print(f.send(None)) produces:

starting
2

print(f.send(1)): 1 is sent in as the value of the paused yield expression and assigned to r, and r is printed. The loop then returns to yield 2, which hands back 2, and that is printed too.

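r = yield 2 works in two steps:

Step 1: yield 2 hands 2 back to the caller and pauses the generator.

Step 2: when the generator is resumed, the yield expression evaluates to the value passed to send() (or None if it was resumed with next()), and that value is assigned to r. Writing it as r = (yield), with parentheses, highlights that yield is an expression, that is, something that evaluates to a value.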
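A common use of the two-step behavior is a running average. The sketch below assumes a hypothetical averager generator (not part of the original text): each send() delivers a number in step 2, and the loop hands back the updated mean in step 1. It also shows that a just-created generator must first be advanced with next() (or send(None)), because it is not yet paused at a yield.

```python
def averager():
    """Hypothetical running-average generator driven by send()."""
    total = 0.0
    count = 0
    average = None
    while True:
        # Step 1: hand `average` back to the caller and pause.
        # Step 2: on resume, the yield expression evaluates to the sent value.
        value = yield average
        total += value
        count += 1
        average = total / count

avg = averager()
primed = next(avg)        # runs to the first yield; same as avg.send(None)
first = avg.send(10)      # value = 10 -> average 10.0
second = avg.send(30)     # value = 30 -> average 20.0
print(primed, first, second)  # None 10.0 20.0

fresh = averager()
send_failed = False
try:
    fresh.send(5)         # not paused at a yield yet, so 5 has nowhere to go
except TypeError as exc:
    send_failed = True
    print("TypeError:", exc)
```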

3, Iterator

Before formally introducing iterables and iterators, let me explain the concept of "iteration". It is not a deep concept: once you are familiar with a programming language, you routinely use loops to get tasks done, and most high-level languages have a for loop. A slightly more technical word than "loop" is "traversal" or "iteration"; both mean using a loop to process an object one element at a time.

3.1 Word sequence, version 1

To explore iterables, we implement a Sentence class. Its constructor takes a string of text, and we then iterate over its words with a for loop. This example shows why such an object is iterable.

import re
import reprlib

RE_WORD = re.compile(r'\w+')

class Sentence:
    def __init__(self, text):
        self.text = text
        self.words = RE_WORD.findall(text)  # list of all non-overlapping matches

    def __getitem__(self, index):
        return self.words[index]  # the word at the given index

    def __len__(self):
        return len(self.words)

    def __repr__(self):
        return 'Sentence(%s)' % reprlib.repr(self.text)  # abbreviated repr for long text

  • re.findall returns a list of strings containing all non-overlapping matches of the regular expression.
  • self.words stores the list returned by findall, so __getitem__ simply returns the word at the given index.
  • To complete the sequence protocol we implement __len__; however, __len__ is not required to make the object iterable.
  • reprlib.repr generates an abbreviated string representation of large data structures. By default, it limits strings to 30 characters.

Let's test whether a Sentence instance can be iterated with a for loop.

>>> s = Sentence('"The time has come," the Walrus said,')
>>> for word in s:
...     print(word)
...
>>> s[0]  # Sentence also supports access by index
'The'

3.1.1 Why sequences are iterable: the iter function

When Python needs to iterate over an object, it calls the built-in function iter, which works as follows:

  • It checks whether the object's class implements __iter__; if so, it calls it to obtain an iterator.
  • If __iter__ is not implemented but __getitem__ is, Python creates an iterator that tries to fetch the elements in order, starting from index 0.
  • If that fails, Python raises TypeError, usually with the message "'C' object is not iterable", where C is the class of the target object.
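The fallback in the second rule can be seen with a class that defines only __getitem__. Squares is a hypothetical example: the iterator that iter() builds calls __getitem__ with 0, 1, 2, ... and stops when an IndexError is raised.

```python
class Squares:
    """Hypothetical sequence-like class: only __getitem__, no __iter__."""
    def __init__(self, n):
        self.n = n

    def __getitem__(self, index):
        if index >= self.n:
            raise IndexError(index)  # tells the fallback iterator to stop
        return index * index

sq = Squares(4)
it = iter(sq)        # works even though Squares has no __iter__
print(list(it))      # [0, 1, 4, 9]

for value in sq:     # the for loop uses the same fallback
    print(value)

try:
    iter(42)         # int has neither __iter__ nor __getitem__
except TypeError as exc:
    print(exc)       # 'int' object is not iterable
```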

Every Python sequence is iterable because sequences implement __getitem__. In fact, the standard sequences also implement __iter__, and your own sequences should too. __getitem__ is handled specially only for backward compatibility, and that special handling may be removed in the future.

This is duck typing taken to the extreme: an object is considered iterable not only when it implements the special method __iter__, but also when it implements __getitem__, as long as __getitem__ accepts integer (int) keys starting from 0.

In goose typing, the definition of an iterable is simpler but less flexible: an object is considered iterable if it implements the __iter__ method. No subclassing or registration is needed, because the abc.Iterable class implements the __subclasshook__ method.

>>> from collections import abc
>>> class Foo:
...     def __iter__(self):
...         pass
...
>>> f = Foo()
>>> print(issubclass(Foo, abc.Iterable))
True
>>> print(issubclass(type(f), abc.Iterable))
True

Note: although the Sentence class (version 1) can be iterated, it fails the issubclass(Sentence, abc.Iterable) test, because it implements __getitem__ but not __iter__.

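3.1.2 Iterables

An iterable is an object from which the built-in iter() can obtain an iterator, that is, an object that:

  • implements an __iter__ method that returns an iterator, or
  • is a sequence (sequences can always be iterated), or
  • implements __getitem__ accepting zero-based integer indexes.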

As of Python 3.4, the most accurate way to check whether an object x is iterable is to call iter(x) and handle the TypeError that is raised if it is not. This is more accurate than isinstance(x, abc.Iterable), because iter(x) also takes the legacy __getitem__ method into account, while the abc.Iterable class does not.
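A minimal sketch of that check: is_iterable and OnlyGetitem are hypothetical names. The helper relies on iter() and its TypeError, and the class shows a case where the ABC test and iter() disagree.

```python
from collections import abc

def is_iterable(obj):
    """Return True if iter() can obtain an iterator from obj."""
    try:
        iter(obj)
    except TypeError:
        return False
    return True

class OnlyGetitem:
    """Hypothetical class iterable only through the legacy __getitem__ protocol."""
    def __getitem__(self, index):
        if index >= 3:
            raise IndexError(index)
        return index

obj = OnlyGetitem()
print(is_iterable(obj))               # True
print(isinstance(obj, abc.Iterable))  # False: the ABC only looks for __iter__
print(is_iterable(42))                # False
```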

>>> iter(s)  # s: a Sentence instance from version 1
<iterator object at 0x021B1F40>
>>> class Foo:
...     pass
...
>>> iter(Foo())
TypeError: 'Foo' object is not iterable

3.1.3 Iterators

Iterators are obtained from iterables. A string, for example, is an iterable, which we can verify with issubclass and iter; and from a string object we can obtain the string's iterator.

>>> from collections import abc
>>> s = "abc"
>>> issubclass(str, abc.Iterable)
True
>>> iter(s)
<str_iterator object at 0x016CF6E8>

Next, we obtain the string's iterator with iter() and step through it manually with next():

>>> it = iter(s)
>>> while True:
...     try:
...         print(next(it))
...     except StopIteration:  # raised when there are no characters left
...         del it  # release the reference, i.e. discard the iterator object
...         break
...
a
b
c

[Figure: UML class diagram of Iterable and Iterator in the Python source code]

Iterable and Iterator are abstract classes; in the diagram, abstract methods are shown in italics.

The standard iterator interface has two methods:

  • __next__: returns the next available element; if there are no more elements, it raises StopIteration.
  • __iter__: returns self, that is, the iterator itself, so that an iterator can be used wherever an iterable is expected, for example in a for loop.
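A minimal sketch of a class implementing exactly these two methods. CountDown is a hypothetical example; note that, because __iter__ returns self, the iterator keeps its position across loops and stays exhausted once it is used up.

```python
class CountDown:
    """Hypothetical iterator counting down from start to 1."""
    def __init__(self, start):
        self.current = start

    def __next__(self):
        if self.current <= 0:
            raise StopIteration  # no more elements
        value = self.current
        self.current -= 1
        return value

    def __iter__(self):
        return self  # an iterator returns itself

cd = CountDown(3)
print(next(cd))        # 3
print(list(cd))        # [2, 1]: continues from the current state
print(list(cd))        # []: an exhausted iterator stays exhausted
print(iter(cd) is cd)  # True
```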

In Python 3.6, the source code of the Iterator class in the _collections_abc module is as follows:

class Iterator(Iterable):

    __slots__ = ()

    @abstractmethod
    def __next__(self):
        'Return the next item from the iterator. When exhausted, raise StopIteration'
        raise StopIteration

    def __iter__(self):
        return self

    @classmethod
    def __subclasshook__(cls, C):
        if cls is Iterator:
            return _check_methods(C, '__iter__', '__next__')
        return NotImplemented

In Python 3.6, the source code of the Lib/ module contains the following comment:

# Iterators in Python aren't a matter of type but of protocol.  A large
# and changing number of builtin types implement *some* flavor of
# iterator.  Don't check the type!  Use hasattr to check for both
# "__iter__" and "__next__" attributes instead.

The advice in that comment is exactly what the __subclasshook__ method of the abc.Iterator abstract class implements. Following it, the best way to check whether an object x is an iterator is to call isinstance(x, abc.Iterator). Thanks to Iterator.__subclasshook__, this test works even if the class of x is not a real or virtual subclass of Iterator.
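The sketch below checks that claim with two hypothetical classes that neither subclass nor register with abc.Iterator: Ticker implements both methods, OnlyNext implements only __next__.

```python
from collections import abc

class Ticker:
    """Hypothetical iterator that does NOT subclass or register with abc.Iterator."""
    def __init__(self, limit):
        self.limit = limit
        self.count = 0

    def __next__(self):
        if self.count >= self.limit:
            raise StopIteration
        self.count += 1
        return self.count

    def __iter__(self):
        return self

class OnlyNext:
    """Hypothetical class with __next__ but no __iter__."""
    def __next__(self):
        raise StopIteration

print(isinstance(Ticker(2), abc.Iterator))     # True, via Iterator.__subclasshook__
print(isinstance(OnlyNext(), abc.Iterator))    # False: __iter__ is missing
print(isinstance([1, 2], abc.Iterator))        # False: a list is iterable, not an iterator
print(isinstance(iter([1, 2]), abc.Iterator))  # True
```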

Based on the source code, an iterator can be defined as an object that:

  • implements a parameterless __next__ method that returns the next element in the series, and raises StopIteration when there are no more elements;
  • implements __iter__, so that iterators are themselves iterable.

However, version 1 of Sentence implements neither of these methods. Why, then, can Python's built-in iter() return an iterator for a Sentence object? As described above, iter() falls back to __getitem__: because Sentence implements __getitem__, iter() can still build an iterator for it.

3.2 Word sequence, version 2

Version 2 of Sentence implements the classic Iterator design pattern, following the model given in Design Patterns: Elements of Reusable Object-Oriented Software.

import re
import reprlib

RE_WORD = re.compile(r'\w+')

class SentenceIterator:
    def __init__(self, words):
        self.words = words
        self.index = 0  # position of the next word to return

    def __next__(self):
        try:
            word = self.words[self.index]
        except IndexError:
            raise StopIteration()
        self.index += 1
        return word

    def __iter__(self):
        return self

class Sentence:
    def __init__(self, text):
        self.text = text
        self.words = RE_WORD.findall(text)

    def __repr__(self):
        return 'Sentence(%s)' % reprlib.repr(self.text)

    def __iter__(self):
        return SentenceIterator(self.words)  # a new, independent iterator on every call

Compared with version 1, version 2 of Sentence drops __getitem__ and adds __iter__, to make clear that the class is an iterable rather than a sequence. SentenceIterator is the iterator object returned by Sentence; it holds the internal state of the iteration. We can use a for loop to verify that Sentence is iterable.

>>> s = Sentence('hello world')
>>> for i in s:  # the for statement calls s.__iter__(), which returns a SentenceIterator
...     print(i)
...
hello
world

Readers may feel that SentenceIterator is superfluous here: why not implement __next__ directly in Sentence, making Sentence both an iterable and its own iterator? In practice this is a bad idea and a common anti-pattern.

In its discussion of the Iterator pattern, Design Patterns: Elements of Reusable Object-Oriented Software says in the "Applicability" section (p. 172):

The Iterator pattern can be used to:

  • Access the contents of an aggregate object without exposing its internal representation
  • Supports multiple traversals of aggregate objects
  • Provide a unified interface for traversing different aggregation structures (polymorphic iteration)

To "support multiple traversals", it must be possible to obtain multiple independent iterators from the same iterable instance, and each iterator must maintain its own internal state. That is why __iter__ must create a new SentenceIterator instance every time it is called.

That is the book's definition. In plainer language: in a program we may traverse the same Sentence instance more than once, and in most cases we want each traversal to start from the beginning, instead of continuing where the previous traversal stopped. Here is an example of the correct behavior:

>>> s = Sentence('hello world i am lgr')
>>> for i in s:
...     print(i)
...     break
hello
>>> for i in s:
...     print(i)
hello
world
i
am
lgr

The second traversal starts over from the beginning instead of continuing from where the first one stopped. If Sentence were its own iterator, the second traversal would continue from where the first one ended.
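For contrast, here is a sketch of the anti-pattern. SelfIteratingSentence is a hypothetical class that implements __next__ itself and returns self from __iter__, so every loop shares one iteration state.

```python
import re

RE_WORD = re.compile(r'\w+')

class SelfIteratingSentence:
    """Anti-pattern sketch: the iterable is also its own iterator."""
    def __init__(self, text):
        self.words = RE_WORD.findall(text)
        self.index = 0

    def __iter__(self):
        return self  # every for loop gets the SAME object with shared state

    def __next__(self):
        try:
            word = self.words[self.index]
        except IndexError:
            raise StopIteration()
        self.index += 1
        return word

s = SelfIteratingSentence('hello world i am lgr')
for word in s:
    first_word = word  # 'hello'
    break

rest = list(s)   # ['world', 'i', 'am', 'lgr']: resumes where the loop stopped
again = list(s)  # []: the single shared iterator is now exhausted
print(first_word, rest, again)
```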


An iterable should never act as its own iterator. In other words, an iterable must implement __iter__, but not __next__. Conversely, iterators should always be iterable: an iterator's __iter__ method should return the iterator itself.



Added by killfall on Fri, 26 Nov 2021 15:58:55 +0200