Deep understanding of Python iterators

iterator

Buy at www.fenfaw.com cn

What is iteration

Iteration refers to a repeated process. Each repetition must continue based on the last result. Simple repetition is not an iteration. For example, the for loop in Python is a very good example of iteration.

for item in range(10):
    print(item)

The iteration must move forward, not backward, as shown below:

# [0 , 1, 2, 3, 4, 5, 6, 7, 8, 9]
# ------------------------------>

The following method is not an iteration:

# [0 , 1, 2, 3, 4, 5, 6, 7, 8, 9]
# -------->
#     <----
#          --------------------->

iterator protocol

In the whole knowledge of learning iterators, iterator protocol occupies a very important position.

Iterator protocol contains two basic concepts: iteratable object and iterator object.

  • Iteratable object: internally implemented__ iter__ The object of () method is called iteratable object
  • Iterator object: an internal implementation__ next__ The object of () method is called iterator object

Relationship between the two:

  • In Python, iterator objects must belong to the category of iteratable objects, that is, iterator objects must have__ iter__ () method and__ next__ () method
  • In Python, iteratable objects do not necessarily belong to the category of iterator objects, that is, iteratable objects only need to be implemented__ iter__ () method is enough

Introduce 2 functions:

  • iter(Object) function, which will execute object__ iter__ () method
  • next(Object) function, which will execute object__ next__ () method

Built in type

Through collections Determine the iteratable class and Iterator class under ABC to quickly determine whether all built-in types are an iteratable object or Iterator object:

>>> from collections.abc import Iterable
>>> from collections.abc import Iterator

>>> isinstance(list(), Iterable)
True
>>> isinstance(list(), Iterator)
False

After testing, all container types (list, tuple, str, dict, set, frozenset) belong to iteratable objects, but not iterator objects

Atomic types (bool, int, float, None) are not iteratable objects, let alone iterator objects.

You can also verify in another way. Check whether a method is defined in the class through the hasattr() function:

>>> hasattr(list,"__iter__")
True
>>> hasattr(list,"__next__")
False

for loop principle

When the iteratable object is called by the for loop, the underlying execution process is as follows:

  1. The iter() method will be executed automatically, and the iteratable object will be found inside the method__ iter__ () method. If there is such a method, an exclusive iterator object of the iteratable object will be returned. If there is no such method, an exception of typeerror object is not iteratable will be thrown.

    Ps: each for loop returns a brand new iterator object

  2. Constantly calling iterator objects__ next__ () method and returns the next data item in the iterator object. After traversing the entire iterator, a Stopiteration exception is thrown to terminate the iteration

    Ps: the iterator itself does not store any data item, but only a pointer. The pointer points to the data item actually stored in the iteratable object. It points to the index position of the data item currently traversed, and the next traversal will push back to this position

  3. The for loop automatically catches the Stopiteration exception and stops the iteration

    PS: the bottom layer of the for loop is implemented by the while loop, but three additional steps are added:

    Step 1: execute the of the iteratable object__ iter()__ Method and save the returned exclusive iterator

    Step 2: execute the iterator continuously__ next()__ method

    Step 3: catch stopieration exception

We manually implement a for loop:

li1 = list(range(10))

iteratorObject = iter(li1)  # ❶
while 1:
    try:
        print(next(iteratorObject))  # ❷
    except StopIteration as e:  # ❸
        break

❶: execute the iteration of the iteratable object__ iter__ () method and save the returned exclusive iterator

❷: continuously execute the iterator__ next__ () method

❸: catch Stopiteration exception

Implementation of linear iteratable object and iterator

If it is an iteratable object of a linear container, it must have an index value. We can make it__ iter__ The () method returns an exclusive iterator object.

Then, the index value of this iteration is recorded in the exclusive iterator object, and the data items in the iteratable object are returned according to this index value. When the index value reaches the total number of data items in the iteratable object - 1, an exception is thrown, and this iteration ends:

class linearTypeContainer:
    def __init__(self, array):
        if isinstance(array, list) or isinstance(array, tuple):
            self.array = array
        else:
            raise TypeError("argument array must is linear container")

    def __iter__(self):
        return linearContainer_iterator(self.array)  # ❶


class linearContainer_iterator:
    def __init__(self, array):
        self.index = 0
        self.array = array
        self.len = len(self.array)

    def __next__(self):
        if self.index < self.len:
            retDataItem = self.array[self.index]
            self.index += 1
            return retDataItem
        else:
            raise StopIteration

    def __iter__(self):  # ❷
        return self


container = linearTypeContainer([i for i in range(10)])
for i in container:
    print(i)
print(list(container))

❶: all parameters passed in Python are passed by reference

So self. In linearTypeContainer Array and linearContainer_iterator's self Array is an object and does not open up additional memory space

This is why exclusive iterators created by iteratable objects do not consume too much memory space.

❷: the iterator object must belong to the category of iteratable objects, so here we call the iterator object linearcontainer_ The iterator class has also been added__ iter__ () method

The advantage of this is that if the iterator object is picked up separately, it will also support the traversal of the for loop:

    def __iter__(self):  # ❷
        return self


containerIterator = linearTypeContainer([i for i in range(10)]).__iter__()


for item in containerIterator:
    print(item)
    
# 0
# 1
# 2
# 3
# 4
# 5
# 6
# 7
# 8
# 9

If you cancel the linearcontainer_ This of the iterator class__ iter__ () method, traversal of the for loop is not supported:

...
    # def __iter__(self):  # ❷
    #     return self


containerIterator = linearTypeContainer([i for i in range(10)]).__iter__()


for item in containerIterator:
    print(item)

# TypeError: 'linearContainer_iterator' object is not iterable

Nonlinear iterative object and iterator implementation

If it is an iteratable object of a nonlinear container, you can judge its type first. If the incoming container is a dictionary, the iterated data item set is converted into tuples, and all stored in it are the key s of the dictionary.

If the incoming container is a collection, the iterated data item collection is converted into tuples, and then refer to the implementation of linear iteratable objects and iterators.

Specific implementation:

class mappingTypeContainer:
    def __init__(self, mapping):
        self.mapping = mapping
        self.mappingType = None

        if isinstance(mapping, dict):
            self.mappingType = "dict"

        elif isinstance(mapping, set) or isinstance(mapping, frozenset):
            self.mappingType = "set"

        else:
            raise TypeError("argument mapping must is mapping container")

    def keys(self):
        if self.mappingType == "set":
            raise TypeError("instance mapping type is set, no have method keys")
        else:
            return self.mapping

    def values(self):
        if self.mappingType == "set":
            raise TypeError("instance mapping type is set, no have method values")
        else:
            return self.mapping.values()

    def items(self):
        if self.mappingType == "set":
            raise TypeError("instance mapping type is set, no have method items")
        else:
            return self.mapping.items()

    def __str__(self):
        return str(self.mapping)

    def __iter__(self):
        return mappingContainer_iterator(tuple(self.mapping))


class mappingContainer_iterator:
    def __init__(self, array):
        self.index = 0
        self.array = array
        self.len = len(self.array)

    def __next__(self):
        if self.index < self.len:
            retDataItem = self.array[self.index]
            self.index += 1
            return retDataItem
        else:
            raise StopIteration

    def __iter__(self):
        return self


container = mappingTypeContainer({str("k") + str(i): str("v") + str(i) for i in range(3)})

for item in container.items():
    print(item)

print(container)

# ('k0', 'v0')
# ('k1', 'v1')
# ('k2', 'v2')
# {'k0': 'v0', 'k1': 'v1', 'k2': 'v2'}


container = mappingTypeContainer({i for i in range(3)})

for item in container:
    print(item)

print(container)

# 0
# 1
# 2
# {0, 1, 2}

Properties of iterator objects

The exclusive iterator of the iteratable object created by each for loop is one-time and will be useless after use:

# ❶
containerIterator = linearTypeContainer([i for i in range(3)]).__iter__()


for item in containerIterator:
    print(item)

# 0
# 1
# 2

for item in containerIterator:
    print(item)  # ❷
    print("?")

❶: take out an iterator object directly

❷: in the second cycle, the index value stored in the iterator object has reached the maximum. Every time iter() is called, an exception will be thrown, returned and processed by for. Therefore, the print() function will not run at all

The iterator object does not store the real iterative data in the iteratable object, but only the length and index, so it does not occupy much memory:

class linearContainer_iterator:
    def __init__(self, array):
        self.index = 0  # ❶
        self.array = array  # ❷
        self.len = len(self.array) # ❸
        
    ...

❶: occupy additional memory space

❷: reference object, does not open up memory

❸: occupy additional memory space

Lazy evaluation and early evaluation

The iterator object performs real-time calculation for the returned data items. This characteristic evaluation method of real-time calculation is called lazy evaluation, that is, I will calculate it and give it to you when you need it:

    def __next__(self):
        if self.index < self.len:
            retDataItem = self.array[self.index]
            self.index += 1
            return retDataItem
        else:
            raise StopIteration

In addition to lazy evaluation, there is an early evaluation scheme. Even if you want one, I'll give you all.

For example, range(), map(), filter(), dict.items(), dict.keys(), dict.values() in Python 2 all return a pure list, which is unreasonable.

Because the returned list will occupy a lot of memory space, and python 3 is uniformly optimized as an inert evaluation scheme, that is, to return an iterative object.

A deadly problem

① : why don't all built-in container types in Python set themselves as iterators?

Instead, an exclusive iterator is instantiated in the for loop?

It is directly implemented at the bottom of these built-in types__ next__ () is the method bad?

Isn't this less memory consumption, less classes defined and instantiated?

A: This is really a fatal question. I have thought about this question for a long time. Finally, I recorded it after asking a question in stack overflow and getting a good answer.

You can only add the following code as shown below:

class linearTypeContainer:
    def __init__(self, array):
        if isinstance(array, list) or isinstance(array, tuple):
            self.array = array
        else:
            raise TypeError("argument array must is linear container")

        self.index = 0
        self.len = len(self.array)

    def __iter__(self):
        return self

    def __next__(self):
        if self.index < self.len:
            retDataItem = self.array[self.index]
            self.index += 1
            return retDataItem
        else:
            self.index = 0  # ❶
            raise StopIteration

container = linearTypeContainer(list(range(5)))

for item in container:
    print(item)

for item in container:
    print(item)

for item in container:
    print(item)

However, there are problems in doing so under certain special circumstances:

container = linearTypeContainer(list(range(5)))

for item in container:
    print(item)
    if container.index == 3:
        break

print("*"*20)

for item in container: 
    print(item)

# 0
# 1
# 2
# ********************
# 3
# 4

You will find that if the first for loop exits at 1.5, the second for loop will continue according to the first for loop.

Can you solve it? Just add a flag bit:

class linearTypeContainer:
    def __init__(self, array):
        if isinstance(array, list) or isinstance(array, tuple):
            self.array = array
        else:
            raise TypeError("argument array must is linear container")

        self.index = 0
        self.len = len(self.array)
        self.iter = False # ❶

    def __iter__(self):
        if self.iter: # ❷
            self.index = 0
        self.iter = True
        return self

    def __next__(self):
        if self.index < self.len:
            retDataItem = self.array[self.index]
            self.index += 1
            return retDataItem
        else:
            self.index = 0
            raise StopIteration


container = linearTypeContainer(list(range(5)))

for item in container:
    print(item)
    if container.index == 3:
        break

print("*" * 20)

for item in container:
    print(item)

# 0
# 1
# 2
# ********************
# 0
# 1
# 2
# 3
# 4

❶: judge whether it is a new call

❷: if it is a new call, reset the index to 0

So why isn't Python designed this way? We should think more about whether multiple for loops use the same iterator in the case of multithreading. In the above example, the shared iterator is not thread safe. In addition, it does not support nested loops, as shown below, which will lead to unlimited loops:

container = linearTypeContainer(list(range(5)))

for item in container:
    print(item)
    for j in container:
        print(j)

To sum up, Python sets the built-in data types to return the exclusive iterator during the for loop, which is a very good design. However, for some built-in objects, it makes itself an iterator, such as file objects.

② : isn't there any benefit to the early evaluation object returned in Python 2? Is it really a waste of memory?

A: No, you can find that all the early evaluation objects returned in Python 3 do not support index operation, but the list returned in Python 2 can support index operation. In some extreme cases, this is indeed an advantage, but does the lazy evaluation object of Python 3 need this advantage? Don't you convert it to list manually? This provides you with more operability and optimizes the memory occupation. Why not?

③ : can you implement an object that returns lazy evaluation?

A: Yes! You see, I implement a Range. In fact, the position of parameter passing is different from that of the built-in, but it is thread safe and supports nested loops:

class Range:
    def __init__(self, stop, start=0, step=1):
        self.start = start
        self.stop = stop
        self.step = step
        self.current = None

    def __iter__(self):
        return Range_iterator(self.stop, self.start, self.step)


class Range_iterator:
    def __init__(self, stop, start, step):
        self.start = start
        self.stop = stop
        self.step = step
        self.current = self.start

    def __next__(self):
        if self.current < self.stop:
            retDataItem = self.current
            self.current += self.step
            return retDataItem
        raise StopIteration


for i in Range(10):
    print(i)
    for j in Range(10):
        print(j)

Added by msinternet on Wed, 09 Feb 2022 20:11:47 +0200