How does the onlooker write a Python loop and use the memory to the extreme?

0 Preface

When it comes to processing loops, we are used to using for, while, etc., for example, printing the characters in each list in turn:

 

lis = ['I', 'love', 'python']
for i in lis:
    print(i)
I
love
python

When the number of bytes of the printed content is small, it can be printed after all the bytes are loaded into memory. There is no problem. However, if there are tens of millions of vehicle driving tracks, ask you to analyze the travel rules and traffic jams of each of them. If you are dealing with it on a single machine.

You may have to face it first, or you may neglect it. Finally, after the code is written, you may expose a problem: out of memory, which is often encountered in actual projects.

 

 

This problem reminds us that when dealing with data, it is very important to write programs that use memory efficiently. Today, we are going to explore how to use memory efficiently, save memory and do things well.

In fact, Python has prepared a module to deal with this issue. It is the itertools module. The functions of these functions are well understood.

I'm not going to give a general introduction to the functions they can realize, but I want to analyze the implementation code behind these functions. How can they save memory efficiently? How can the contributors of Python kernel write beautiful code? It's interesting, isn't it?

OK,let's go. Hope you enjoy the journey!

1 splicing element

The chain function in itertools implements element splicing. The prototype is as follows. The parameter * represents a variable number of parameters

chain(iterables)

The application is as follows:

 

In [33]: list(chain(['I','love'],['python'],['very', 'much']))
Out[33]: ['I', 'love', 'python', 'very', 'much']

Wow, it's not easy to use any more. It's a bit of join, but it's better than join. Its key point is that the parameters are all iterative instances.

So, how can chain achieve efficient memory saving? The approximate implementation code of chain is as follows:

 

def chain(*iterables):
    for it in iterables:
        for element in it:
            yield element

The above code is not difficult to understand. chain essentially returns a generator, so it actually reads one element into memory at a time, so it saves the most memory.

2 accumulate one by one

Returns the cumulative total value of the list, prototype:

accumulate(iterable[, func, *, initial=None])

The application is as follows:

 

In [36]: list(accumulate([1,2,3,4,5,6],lambda x,y: x*y))
Out[36]: [1, 2, 6, 24, 120, 720]

The approximate implementation code of accumulate is as follows:

 

def accumulate(iterable, func=operator.add, *, initial=None):
    it = iter(iterable)
    total = initial
    if initial is None:
        try:
            total = next(it)
        except StopIteration:
            return
    yield total
    for element in it:
        total = func(total, element)
        yield total

Above code, are you ok? Unlike the simple yield of chain, which is a little more complicated here, yield is a bit like return, so the line yield total directly returns an element, that is, the first element of iterable, because the first element returned by this function is the first one at any time. And because yield returns a generator object, such as the name gen, when next(gen), the code will execute to for element in it: line. At this time, iterator it has pointed to the second element of iterable. OK, I believe you understand!

3 funnel screening

It is a compress function, similar to funnel function, so I call it funnel filtering, prototype:

compress(data, selectors)

 

In [38]: list(compress('abcdefg',[1,1,0,1]))
Out[38]: ['a', 'b', 'd']

It is easy to see that the number of elements returned by compress is equal to the shorter list length of the two parameters.

Its general implementation code:

 

def compress(data, selectors):
    return (d for d, s in zip(data, selectors) if s)

This function is very easy to use

4-level screening

Scan the list. If the conditions are not met, start to keep it back. The prototype is as follows:

dropwhile(predicate, iterable)

Application example:

 

In [39]: list(dropwhile(lambda x: x<3,[1,0,2,4,1,1,3,5,-5]))
Out[39]: [4, 1, 1, 3, 5, -5]

The code to realize it is as follows:

 

def dropwhile(predicate, iterable):
    iterable = iter(iterable)
    for x in iterable:
        if not predicate(x):
            yield x
            break
    for x in iterable:
        yield x

5-level screening 2

Scan the list and return the elements from the iteratable objects as long as the conditions are met, until the conditions are not met. The prototype is as follows:

takewhile(predicate, iterable)

Application example:

 

In [43]: list(takewhile(lambda x: x<5, [1,4,6,4,1]))
Out[43]: [1, 4]

The code to realize it is as follows:

 

def takewhile(predicate, iterable):
    for x in iterable:
        if predicate(x):
            yield x
        else:
            break #Return immediately

6. Defective product screening

Scan the list and keep it as long as the conditions are not met. The prototype is as follows:

dropwhile(predicate, iterable)

Application example:

 

In [40]: list(filterfalse(lambda x: x%2==0, [1,2,3,4,5,6]))
Out[40]: [1, 3, 5]

The approximate code to realize it is as follows:

 

def dropwhile(predicate, iterable):
    iterable = iter(iterable)
    for x in iterable:
        if not predicate(x):
            yield x
            break
    for x in iterable:
        yield x

7 slice screening

Common slice operations in Python, such as:

 

lis = [1,3,2,1]
lis[:1]

Their defect is that lis must be loaded into memory, so they can save more memory. The prototype is as follows:

islice(iterable, start, stop[, step])

Application example:

 

In [41]: list(islice('abcdefg',1,4,2))
Out[41]: ['b', 'd']

The code to realize it is as follows:

 

def islice(iterable, *args):
    s = slice(*args)
    start, stop, step = s.start or 0, s.stop or sys.maxsize, s.step or 1
    it = iter(range(start, stop, step))
    try:
        nexti = next(it)
    except StopIteration:
        for i, element in zip(range(start), iterable):
            pass
        return
    try:
        for i, element in enumerate(iterable):
            if i == nexti:
                yield element
                nexti = next(it)
    except StopIteration:
        for i, element in zip(range(i + 1, stop), iterable):
            pass

Cleverly use the generator to throw an exception StopIteration at the end of iteration to do some boundary processing.

8 cell division

The tee function is similar to the well-known cell division. It can replicate n original iterators, and its prototype is as follows:

tee(iterable, n=2)

Using the following, we can see that the two iterators are independent

 

a = tee([1,4,6,4,1],2)
In [51]: next(a[0])
Out[51]: 1

In [52]: next(a[1])
Out[52]: 1

The code to implement it is as follows:

 

def tee(iterable, n=2):
    it = iter(iterable)
    deques = [collections.deque() for i in range(n)]
    def gen(mydeque):
        while True:
            if not mydeque:            
                try:
                    newval = next(it)   
                except StopIteration:
                    return
                for d in deques:     
                    d.append(newval)
            yield mydeque.popleft()
    return tuple(gen(d) for d in deques)

The tee implementation internally uses a queue type deques, which initially generates an empty queue, and adds the element newval to each copied queue. At the same time, yield the leftmost element in the currently called mydeque.

9 map variants

starmap can be seen as a variant of map, which can save more memory. At the same time, the elements of iterable must also be iterative objects. The prototype is as follows:

starmap(function, iterable)

Apply it:

 

In [63]: list(starmap(lambda x,y: str(x)+'-'+str(y), [('a',1),('b',2),('c',3)]))
Out[63]: ['a-1', 'b-2', 'c-3']

The implementation details of starmap are as follows:

 

def starmap(function, iterable):
    for args in iterable:
        yield function(*args)

10 copy elements

repeat implements copying elements n times. The prototype is as follows:

repeat(object[, times])

The application is as follows:

 

In [66]: list(repeat(6,3))
Out[66]: [6, 6, 6]

In [67]: list(repeat([1,2,3],2))
Out[67]: [[1, 2, 3], [1, 2, 3]]

Its implementation details are as follows:

 

def repeat(object, times=None):
    if times is None:#If times is not set, it will continue to repeat
        while True: 
            yield object
    else:
        for i in range(times):
            yield object

11 Cartesian product

The effect of Cartesian product is the same as the following:

 

 ((x,y) for x in A for y in B)

Therefore, the realization effect of Cartesian product is as follows:

 

In [68]: list(product('ABCD', 'xy'))
Out[68]:
[('A', 'x'),
 ('A', 'y'),
 ('B', 'x'),
 ('B', 'y'),
 ('C', 'x'),
 ('C', 'y'),
 ('D', 'x'),
 ('D', 'y')]

Its implementation details:

 

def product(*args, repeat=1):
    pools = [tuple(pool) for pool in args] * repeat
    result = [[]]
    for pool in pools:
        result = [x+[y] for x in result for y in pool]
    for prod in result:
        yield tuple(prod)

12 enhanced zip

Combinatorial value. If the length of the iteratable object is not aligned, the missing value will be filled according to the fillvalue. Note: the iteration continues to the iteratable object with the longest consumption, and the effect is as follows:

 

In [69]: list(zip_longest('ABCD', 'xy', fillvalue='-'))
Out[69]: [('A', 'x'), ('B', 'y'), ('C', '-'), ('D', '-')]

Its implementation details:

 

def zip_longest(*args, fillvalue=None):
    iterators = [iter(it) for it in args]
    num_active = len(iterators)
    if not num_active:
        return
    while True:
        values = []
        for i, it in enumerate(iterators):
            try:
                value = next(it)
            except StopIteration:
                num_active -= 1
                if not num_active:
                    return
                iterators[i] = repeat(fillvalue)
                value = fillvalue
            values.append(value)
        yield tuple(values)

It uses repeat, which is to fill in the missing value according to the fillvalue when the length of the iteratable object is not aligned. The key to understanding the above code is the particularity of iterator object and next method:

 

In [74]: for i, it in enumerate([iter([1,2,3]),iter(['x','y'])]):
    ...:     print(next(it))
    #Output:
    1
    x

Combined with this prompt to understand the above code, it will not be difficult.

summary

The itertools module of Python provides an efficient memory saving iterator, in which the implementation is basically based on generators. On the one hand, understanding the basic functions of these 12 functions can deepen the understanding of generators, and lay a solid foundation for us to write more efficient, concise and beautiful code.

0 Preface

When it comes to processing loops, we are used to using for, while, etc., for example, printing the characters in each list in turn:

lis = ['I''love''python']
for i in lis:
    print(i)
I
love
python

When the number of bytes of the printed content is small, it can be printed after all the bytes are loaded into memory. There is no problem. However, if there are tens of millions of vehicle driving tracks, ask you to analyze the travel rules and traffic jams of each of them. If you are dealing with it on a single machine.

You may have to face it first, or you may neglect it. Finally, after the code is written, you may expose a problem: out of memory, which is often encountered in actual projects.

This problem reminds us that when dealing with data, it is very important to write programs that use memory efficiently. Today, we are going to explore how to use memory efficiently, save memory and do things well.

In fact, Python has prepared a module to deal with this issue. It is the itertools module. The functions of these functions are well understood.

I'm not going to give a general introduction to the functions they can realize, but I want to analyze the implementation code behind these functions. How can they save memory efficiently? How can the contributors of Python kernel write beautiful code? It's interesting, isn't it?

OK,let's go. Hope you enjoy the journey!

1 splicing element

The chain function in itertools implements element splicing. The prototype is as follows. The parameter * represents a variable number of parameters

chain(iterables)

The application is as follows:

In [33]: list(chain(['I','love'],['python'],['very''much']))
Out[33]: ['I''love''python''very''much']

Wow, it's not easy to use any more. It's a bit of join, but it's better than join. Its key point is that the parameters are all iterative instances.

So, how can chain achieve efficient memory saving? The approximate implementation code of chain is as follows:

def chain(*iterables):
    for it in iterables:
        for element in it:
            yield element

The above code is not difficult to understand. chain essentially returns a generator, so it actually reads one element into memory at a time, so it saves the most memory.

2 accumulate one by one

Returns the cumulative total value of the list, prototype:

accumulate(iterable[, func, *, initial=None])

The application is as follows:

In [36]list(accumulate([1,2,3,4,5,6],lambda x,yx*y))
Out[36][1, 2, 6, 24, 120, 720]

The approximate implementation code of accumulate is as follows:

def accumulate(iterable, func=operator.add, *, initial=None):
    it = iter(iterable)
    total = initial
    if initial is None:
        try:
            total = next(it)
        except StopIteration:
            return
    yield total
    for element in it:
        total = func(total, element)
        yield total

Above code, are you ok? Unlike the simple yield of chain, which is a little more complicated here, yield is a bit like return, so the line yield total directly returns an element, that is, the first element of iterable, because the first element returned by this function is the first one at any time. And because yield returns a generator object, such as the name gen, when next(gen), the code will execute to for element in it: line. At this time, iterator it has pointed to the second element of iterable. OK, I believe you understand!

3 funnel screening

It is a compress function, similar to funnel function, so I call it funnel filtering, prototype:

compress(data, selectors)

In [38]: list(compress('abcdefg',[1,1,0,1]))
Out[38]: ['a''b''d']

It is easy to see that the number of elements returned by compress is equal to the shorter list length of the two parameters.

Its general implementation code:

def compress(data, selectors):
    return (d for d, s in zip(data, selectors) if s)

This function is very easy to use

4-level screening

Scan the list. If the conditions are not met, start to keep it back. The prototype is as follows:

dropwhile(predicate, iterable)

Application example:

In [39]list(dropwhile(lambda xx<3,[1,0,2,4,1,1,3,5,-5]))
Out[39][4, 1, 1, 3, 5, -5]

The code to realize it is as follows:

def dropwhile(predicate, iterable):
    iterable = iter(iterable)
    for x in iterable:
        if not predicate(x):
            yield x
            break
    for x in iterable:
        yield x

5-level screening 2

Scan the list and return the elements from the iteratable objects as long as the conditions are met, until the conditions are not met. The prototype is as follows:

takewhile(predicate, iterable)

Application example:

In [43]list(takewhile(lambda xx<5, [1,4,6,4,1]))
Out[43][1, 4]

The code to realize it is as follows:

def takewhile(predicate, iterable):
    for x in iterable:
        if predicate(x):
            yield x
        else:
            break #Return immediately

6. Defective product screening

Scan the list and keep it as long as the conditions are not met. The prototype is as follows:

dropwhile(predicate, iterable)

Application example:

In [40]: list(filterfalse(lambda x: x%2==0, [1,2,3,4,5,6]))
Out[40]: [135]

The code to realize it is as follows:

def dropwhile(predicate, iterable):
    iterable = iter(iterable)
    for x in iterable:
        if not predicate(x):
            yield x
            break
    for x in iterable:
        yield x

7 slice screening

Common slice operations in Python, such as:

lis = [1,3,2,1]
lis[:1]

Their defect is that lis must be loaded into memory, so they can save more memory. The prototype is as follows:

islice(iterable, start, stop[, step])

Application example:

In [41]: list(islice('abcdefg',1,4,2))
Out[41]: ['b''d']

The code to realize it is as follows:

def islice(iterable, *args):
    s = slice(*args)
    start, stop, step = s.start or 0, s.stop or sys.maxsize, s.step or 1
    it = iter(range(start, stop, step))
    try:
        nexti = next(it)
    except StopIteration:
        for i, element in zip(range(start), iterable):
            pass
        return
    try:
        for i, element in enumerate(iterable):
            if i == nexti:
                yield element
                nexti = next(it)
    except StopIteration:
        for i, element in zip(range(i + 1, stop), iterable):
            pass

Cleverly use the generator to throw an exception StopIteration at the end of iteration to do some boundary processing.

8 cell division

The tee function is similar to the well-known cell division. It can replicate n original iterators, and its prototype is as follows:

tee(iterable, n=2)

Using the following, we can see that the two iterators are independent

a = tee([1,4,6,4,1],2)
In [51]: next(a[0])
Out[51]: 1

In [52]: next(a[1])
Out[52]: 1

The code to implement it is as follows:

def tee(iterable, n=2):
    it = iter(iterable)
    deques = [collections.deque() for i in range(n)]
    def gen(mydeque):
        while True:
            if not mydeque:            
                try:
                    newval = next(it)   
                except StopIteration:
                    return
                for d in deques:     
                    d.append(newval)
            yield mydeque.popleft()
    return tuple(gen(d) for d in deques)

The tee implementation internally uses a queue type deques, which initially generates an empty queue, and adds the element newval to each copied queue. At the same time, yield the leftmost element in the currently called mydeque.

9 map variants

starmap can be seen as a variant of map, which can save more memory. At the same time, the elements of iterable must also be iterative objects. The prototype is as follows:

starmap(function, iterable)

Apply it:

In [63]: list(starmap(lambda x,y: str(x)+'-'+str(y), [('a',1),('b',2),('c',3)]))
Out[63]: ['a-1''b-2''c-3']

The implementation details of starmap are as follows:

def starmap(functioniterable):
    for args in iterable:
        yield function(*args)

10 copy elements

repeat implements copying elements n times. The prototype is as follows:

repeat(object[, times])

The application is as follows:

In [66]list(repeat(6,3))
Out[66][6, 6, 6]

In [67]list(repeat([1,2,3],2))
Out[67][[1, 2, 3][1, 2, 3]]

Its implementation details are as follows:

def repeat(object, times=None):
    if times is None:# If times Do not set, will always repeat down
        while True
            yield object
    else:
        for i in range(times):
            yield object

11 Cartesian product

The effect of Cartesian product is the same as the following:

 ((x,y) for x in A for y in B)

Therefore, the realization effect of Cartesian product is as follows:

In [68]: list(product('ABCD''xy'))
Out[68]:
[('A''x'),
 ('A''y'),
 ('B''x'),
 ('B''y'),
 ('C''x'),
 ('C''y'),
 ('D''x'),
 ('D''y')]

Its implementation details:

def product(*args, repeat=1):
    pools = [tuple(pool) for pool in args] * repeat
    result = [[]]
    for pool in pools:
        result = [x+[y] for x in result for y in pool]
    for prod in result:
        yield tuple(prod)

12 enhanced zip

Combinatorial value. If the length of the iteratable object is not aligned, the missing value will be filled according to the fillvalue. Note: the iteration continues to the iteratable object with the longest consumption, and the effect is as follows:

In [69]: list(zip_longest('ABCD''xy', fillvalue='-'))
Out[69]: [('A''x'), ('B''y'), ('C''-'), ('D''-')]

Its implementation details:

def zip_longest(*args, fillvalue=None):
    iterators = [iter(it) for it in args]
    num_active = len(iterators)
    if not num_active:
        return
    while True:
        values = []
        for i, it in enumerate(iterators):
            try:
                value = next(it)
            except StopIteration:
                num_active -= 1
                if not num_active:
                    return
                iterators[i] = repeat(fillvalue)
                value = fillvalue
            values.append(value)
        yield tuple(values)

It uses repeat, which is to fill in the missing value according to the fillvalue when the length of the iteratable object is not aligned. The key to understanding the above code is the particularity of iterator object and next method:

In [74]: for i, it in enumerate([iter([1,2,3]),iter(['x','y'])]):
    ...:     print(next(it))
    #Output:
    1
    x

Combined with this prompt to understand the above code, it will not be difficult.

summary

The itertools module of Python provides an efficient memory saving iterator, in which the implementation is basically based on generators. On the one hand, understanding the basic functions of these 12 functions can deepen the understanding of generators, and lay a solid foundation for us to write more efficient, concise and beautiful code.

Keywords: Python Lambda

Added by gurechan on Wed, 11 Dec 2019 03:57:18 +0200