0 Preface
When it comes to loops, we are used to reaching for for, while and so on, for example printing each element of a list in turn:
lis = ['I', 'love', 'python']
for i in lis:
    print(i)
I
love
python
When the data being printed is small, loading everything into memory first and then printing it is perfectly fine. But suppose there are tens of millions of vehicle driving tracks and you are asked to analyse the travel patterns and congestion of each of them, on a single machine.
You may not think about memory at first, or you may simply overlook it. Then, once the code is written and running, it exposes a problem that comes up all the time in real projects: out of memory.
This reminds us that when processing data, writing programs that use memory efficiently really matters. So today we are going to explore how to use memory efficiently: save memory and still get the job done.
In fact, Python ships with a module for exactly this problem: the itertools module. What its functions do is easy enough to understand.
I'm not going to give a general introduction to what they can do; instead I want to look at the implementation code behind these functions. How do they save memory so effectively? How do the CPython core developers write such elegant code? Interesting, isn't it?
OK, let's go. Hope you enjoy the journey!
1 Splicing elements
The chain function in itertools splices iterables together. Its prototype is as follows; the * means it accepts a variable number of arguments:
chain(*iterables)
The application is as follows:
In [33]: list(chain(['I','love'],['python'],['very', 'much']))
Out[33]: ['I', 'love', 'python', 'very', 'much']
It could hardly be easier to use. It looks a bit like join, but it is more general: the only requirement is that every argument is an iterable.
So how does chain manage to save memory? Its implementation is roughly as follows:
def chain(*iterables):
    for it in iterables:
        for element in it:
            yield element
The code above is not hard to understand: chain returns a generator, so it only reads one element into memory at a time, and that is exactly what keeps memory usage low.
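To make the laziness concrete, here is a small illustrative snippet of my own (not from the original text): it chains a short range onto an infinite counter, which would be impossible if chain had to build a combined list first.
from itertools import chain, count, islice

lazy = chain(range(3), count(100))   # count(100) is an infinite iterator
print(list(islice(lazy, 6)))         # [0, 1, 2, 100, 101, 102]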
2 Accumulating one by one
accumulate returns the running totals of an iterable. Prototype:
accumulate(iterable[, func, *, initial=None])
The application is as follows:
In [36]: list(accumulate([1,2,3,4,5,6],lambda x,y: x*y))
Out[36]: [1, 2, 6, 24, 120, 720]
The approximate implementation code of accumulate is as follows:
import operator

def accumulate(iterable, func=operator.add, *, initial=None):
    it = iter(iterable)
    total = initial
    if initial is None:
        try:
            total = next(it)
        except StopIteration:
            return
    yield total
    for element in it:
        total = func(total, element)
        yield total
Is the code above clear? It is a little more involved than the simple yield in chain. yield behaves somewhat like return, so the line yield total hands back a single element: the first element of iterable (or initial, if one was given), which is always the first value this function produces. Because a function containing yield returns a generator object, say gen, each call to next(gen) resumes execution where it left off; by the time the for element in it: line runs, the iterator it already points at the second element of iterable. OK, I believe you get it!
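If that is hard to picture, the following quick check (my own sketch) steps through the generator by hand and matches the description above:
from itertools import accumulate

gen = accumulate([1, 2, 3])   # default func is addition
print(next(gen))   # 1 -> the first element is yielded unchanged
print(next(gen))   # 3 -> 1 + 2
print(next(gen))   # 6 -> 3 + 3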
3 Funnel screening
This is the compress function. It works like a funnel, letting only some elements through, which is why I call it funnel screening. Prototype:
compress(data, selectors)
In [38]: list(compress('abcdefg',[1,1,0,1]))
Out[38]: ['a', 'b', 'd']
It is easy to see that compress only goes as far as the shorter of its two arguments, and it keeps the elements whose corresponding selector is truthy.
Its general implementation code:
def compress(data, selectors):
    return (d for d, s in zip(data, selectors) if s)
This function is very easy to use, and since it returns a generator expression it is just as lazy as the others.
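A typical use, shown here as an illustrative example of my own, is filtering one sequence with a boolean mask computed from another:
from itertools import compress

names = ['ann', 'bob', 'cid', 'dee']        # hypothetical data
scores = [82, 47, 91, 58]
passed = [s >= 60 for s in scores]          # selector mask: [True, False, True, False]
print(list(compress(names, passed)))        # ['ann', 'cid']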
4 Screening, part 1
dropwhile scans the iterable and drops elements as long as the predicate holds; starting from the first element for which the predicate fails, it keeps everything that follows. The prototype is as follows:
dropwhile(predicate, iterable)
Application example:
In [39]: list(dropwhile(lambda x: x<3,[1,0,2,4,1,1,3,5,-5]))
Out[39]: [4, 1, 1, 3, 5, -5]
The code to realize it is as follows:
def dropwhile(predicate, iterable):
    iterable = iter(iterable)
    for x in iterable:
        if not predicate(x):
            yield x
            break
    for x in iterable:
        yield x
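Note that once the first failing element is seen, everything after it is kept even if it would satisfy the predicate again. A small sketch of my own, skipping some hypothetical leading header lines, shows this:
from itertools import dropwhile

lines = ['# header', '# more header', 'data1', '# kept, not dropped', 'data2']
body = dropwhile(lambda line: line.startswith('#'), lines)
print(list(body))   # ['data1', '# kept, not dropped', 'data2']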
5 Screening, part 2
takewhile scans the iterable and yields elements as long as the predicate holds, stopping at the first element for which it fails. The prototype is as follows:
takewhile(predicate, iterable)
Application example:
In [43]: list(takewhile(lambda x: x<5, [1,4,6,4,1]))
Out[43]: [1, 4]
The code to realize it is as follows:
def takewhile(predicate, iterable):
    for x in iterable:
        if predicate(x):
            yield x
        else:
            break  # stop at the first element that fails the predicate
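Because takewhile stops as soon as the predicate fails, it is handy on unbounded generators. A small example of my own:
from itertools import takewhile, count

squares = (n * n for n in count(1))                # infinite stream of squares
print(list(takewhile(lambda x: x < 50, squares)))  # [1, 4, 9, 16, 25, 36, 49]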
6 Defective-item screening
filterfalse scans the iterable and keeps only the elements for which the predicate is false. The prototype is as follows:
filterfalse(predicate, iterable)
Application example:
In [40]: list(filterfalse(lambda x: x%2==0, [1,2,3,4,5,6]))
Out[40]: [1, 3, 5]
The code to realize it is as follows:
def filterfalse(predicate, iterable):
    # keep the elements for which predicate(x) is falsy
    if predicate is None:
        predicate = bool
    for x in iterable:
        if not predicate(x):
            yield x
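filterfalse is simply the mirror image of the built-in filter, as this small comparison of my own shows:
from itertools import filterfalse

nums = [1, 2, 3, 4, 5, 6]
print(list(filter(lambda x: x % 2 == 0, nums)))        # [2, 4, 6]  - predicate true
print(list(filterfalse(lambda x: x % 2 == 0, nums)))   # [1, 3, 5]  - predicate false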
7 Slice screening
Common slice operations in Python, such as:
lis = [1,3,2,1]
lis[:1]
Their drawback is that lis must already be fully loaded into memory. islice, by contrast, slices any iterable lazily and so saves memory. The prototype is as follows:
islice(iterable, start, stop[, step])
Application example:
In [41]: list(islice('abcdefg',1,4,2))
Out[41]: ['b', 'd']
The code to realize it is as follows:
import sys

def islice(iterable, *args):
    s = slice(*args)
    start, stop, step = s.start or 0, s.stop or sys.maxsize, s.step or 1
    it = iter(range(start, stop, step))
    try:
        nexti = next(it)
    except StopIteration:
        # consume the iterable up to the start position, then stop
        for i, element in zip(range(start), iterable):
            pass
        return
    try:
        for i, element in enumerate(iterable):
            if i == nexti:
                yield element
                nexti = next(it)
    except StopIteration:
        # consume the iterable up to the stop position
        for i, element in zip(range(i + 1, stop), iterable):
            pass
It cleverly relies on the StopIteration raised when the internal range iterator is exhausted to handle the boundary cases.
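As an illustrative sketch of my own, islice can take a "page" out of a stream that ordinary slicing could never hold in memory:
from itertools import islice, count

evens = count(0, 2)              # 0, 2, 4, 6, ...  (unbounded)
page = islice(evens, 5, 10)      # elements at positions 5..9
print(list(page))                # [10, 12, 14, 16, 18]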
8 Cell division
The tee function works a bit like the well-known process of cell division: it produces n independent copies of the original iterator. Its prototype is as follows:
tee(iterable, n=2)
Used as follows, we can see that the two iterators are independent:
a = tee([1,4,6,4,1],2)
In [51]: next(a[0])
Out[51]: 1
In [52]: next(a[1])
Out[52]: 1
The code to implement it is as follows:
import collections

def tee(iterable, n=2):
    it = iter(iterable)
    deques = [collections.deque() for i in range(n)]
    def gen(mydeque):
        while True:
            if not mydeque:             # this copy has no buffered values left
                try:
                    newval = next(it)   # pull one value from the shared iterator
                except StopIteration:
                    return
                for d in deques:        # buffer it for every copy
                    d.append(newval)
            yield mydeque.popleft()
    return tuple(gen(d) for d in deques)
Internally, tee keeps one deque (double-ended queue) per copy, all initially empty. Whenever a copy has no buffered values left, one new element newval is pulled from the shared iterator and appended to every deque; each generator then yields the leftmost element of its own mydeque.
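This buffering has a consequence worth knowing: if one copy runs far ahead, the deques keep all the values the other copies have not consumed yet, so memory can grow. A small sketch of my own:
from itertools import tee

a, b = tee(iter(range(5)), 2)
print(list(a))   # [0, 1, 2, 3, 4]  - everything a consumes is buffered for b
print(list(b))   # [0, 1, 2, 3, 4]  - b replays the buffered values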
9 A map variant
starmap can be seen as a variant of map. Each element of iterable must itself be an iterable: its items are unpacked and passed as the arguments of function. The prototype is as follows:
starmap(function, iterable)
Apply it:
In [63]: list(starmap(lambda x,y: str(x)+'-'+str(y), [('a',1),('b',2),('c',3)]))
Out[63]: ['a-1', 'b-2', 'c-3']
The implementation details of starmap are as follows:
def starmap(function, iterable):
    for args in iterable:
        yield function(*args)
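The difference from map is just the argument unpacking, which this small comparison of my own illustrates:
from itertools import starmap

pairs = [(2, 5), (3, 2), (10, 3)]
# map would pass each tuple as one argument; starmap unpacks it into two
print(list(starmap(pow, pairs)))   # [32, 9, 1000]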
10 Repeating elements
repeat yields the same object over and over, times times in total, or forever if times is omitted. The prototype is as follows:
repeat(object[, times])
The application is as follows:
In [66]: list(repeat(6,3))
Out[66]: [6, 6, 6]
In [67]: list(repeat([1,2,3],2))
Out[67]: [[1, 2, 3], [1, 2, 3]]
Its implementation details are as follows:
def repeat(object, times=None):
    if times is None:     # if times is not given, repeat forever
        while True:
            yield object
    else:
        for i in range(times):
            yield object
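A common trick, shown here as an example of my own, is to use repeat to feed a constant argument to map or zip:
from itertools import repeat

print(list(map(pow, range(5), repeat(2))))   # [0, 1, 4, 9, 16]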
11 Cartesian product
The product function computes a Cartesian product; its effect is the same as the nested generator expression below:
((x,y) for x in A for y in B)
Applied to two short strings, it produces:
In [68]: list(product('ABCD', 'xy'))
Out[68]:
[('A', 'x'),
('A', 'y'),
('B', 'x'),
('B', 'y'),
('C', 'x'),
('C', 'y'),
('D', 'x'),
('D', 'y')]
Its implementation details:
def product(*args, repeat=1):
    pools = [tuple(pool) for pool in args] * repeat
    result = [[]]
    for pool in pools:
        result = [x + [y] for x in result for y in pool]
    for prod in result:
        yield tuple(prod)
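Note that this pure-Python sketch (like the "roughly equivalent" code in the official docs) builds intermediate result lists, whereas the real C implementation yields each output tuple lazily. The repeat keyword multiplies the pools, as in this small example of my own:
from itertools import product

# product('ab', repeat=2) is the same as product('ab', 'ab')
print(list(product('ab', repeat=2)))   # [('a', 'a'), ('a', 'b'), ('b', 'a'), ('b', 'b')]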
12 Enhanced zip
zip_longest combines values just like zip, but when the iterables have different lengths the missing values are filled in with fillvalue. Note that iteration continues until the longest iterable is exhausted. The effect is as follows:
In [69]: list(zip_longest('ABCD', 'xy', fillvalue='-'))
Out[69]: [('A', 'x'), ('B', 'y'), ('C', '-'), ('D', '-')]
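For contrast, here is the same call next to the built-in zip, as a small example of my own; zip stops at the shortest input instead of padding:
from itertools import zip_longest

print(list(zip('ABCD', 'xy')))                         # [('A', 'x'), ('B', 'y')]
print(list(zip_longest('ABCD', 'xy', fillvalue='-')))  # [('A', 'x'), ('B', 'y'), ('C', '-'), ('D', '-')]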
Its implementation details:
def zip_longest(*args, fillvalue=None):
    # repeat here is itertools.repeat, shown in section 10
    iterators = [iter(it) for it in args]
    num_active = len(iterators)
    if not num_active:
        return
    while True:
        values = []
        for i, it in enumerate(iterators):
            try:
                value = next(it)
            except StopIteration:
                num_active -= 1
                if not num_active:
                    return
                iterators[i] = repeat(fillvalue)
                value = fillvalue
            values.append(value)
        yield tuple(values)
It uses repeat (from section 10) so that an exhausted iterator is replaced by one that endlessly supplies fillvalue. The key to understanding the code is how iterator objects behave under next:
In [74]: for i, it in enumerate([iter([1,2,3]),iter(['x','y'])]):
...: print(next(it))
#Output:
1
x
With that hint in mind, the code above should not be hard to follow.
Summary
Python's itertools module provides memory-efficient iterators whose implementations are essentially all built on generators. Understanding what these 12 functions do, and how they are written, deepens our understanding of generators and lays a solid foundation for writing more efficient, concise and elegant code.