Reading notes on Fluent Python, Second Edition: an array of sequences

Introduction

Python handles sequence data in a uniform way. Whatever the data structure (strings, lists, arrays, XML elements, or database query results), they all share a rich set of operations: iteration, slicing, sorting, and concatenation.

Overview of built-in sequences

The standard library offers a rich selection of sequence types implemented in C:

Container sequences
list, tuple, and collections.deque can hold items of different types, including nested containers.
Flat sequences
str, bytes, bytearray, memoryview, and array.array hold items of one simple type.

A container sequence stores references to the objects it contains, which may be of any type; a flat sequence stores the values of its items in its own memory space, not as separate objects. See Figure 2-1:


In Figure 2-1, the container sequence is on the left and the flat sequence is on the right.
The gray cells represent the in-memory header of each Python object.
On the left, a tuple holds an array of references to its items. Each item is a separate Python object, which may in turn hold references to other Python objects.
On the right, a Python array is a single object that holds a C array of three doubles.

As a result, flat sequences are more compact, but they are limited to primitive machine values such as bytes, integers, and floats.
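
The difference between the two groups can be sketched as follows. This is a minimal illustration rather than code from the book: the variable names are made up for the example, and the 24-byte figure assumes a platform where a C double takes 8 bytes, which is the usual case.

```python
from array import array

# A container sequence holds references, so it can mix types and nest containers.
container = [1, 'two', 3.0, [4, 5]]

# A flat sequence holds raw values of a single type: here, C doubles.
flat = array('d', [1.0, 2.0, 3.0])
payload_bytes = flat.itemsize * len(flat)  # 24 when a double is 8 bytes

# Appending anything that is not a number is rejected.
error = None
try:
    flat.append('four')
except TypeError as exc:
    error = exc

print(payload_bytes)
print(type(error).__name__)  # TypeError
```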

Another way to group sequence types is by mutability:

Mutable sequences
list, bytearray, array.array, collections.deque, memoryview
Immutable sequences
tuple, str, bytes
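
As a quick illustration of this grouping (a sketch with made-up variable names, not an example from the book), a list accepts in-place changes while a tuple rejects them:

```python
mutable = [1, 2, 3]
mutable[0] = 10          # in-place item assignment works on a list
mutable.append(4)        # and so do methods that grow the sequence

immutable = (1, 2, 3)
error = None
try:
    immutable[0] = 10    # a tuple does not support item assignment
except TypeError as exc:
    error = exc

print(mutable)                 # [10, 2, 3, 4]
print(type(error).__name__)    # TypeError
```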

Figure 2-2 shows that mutable sequences inherit all the methods of immutable sequences and implement several additional ones.

Although the built-in sequence types do not inherit directly from the Sequence and MutableSequence abstract base classes (ABCs), they are virtual subclasses registered with those ABCs. As virtual subclasses, tuple and list pass the following tests:

from collections import abc
print(issubclass(tuple, abc.Sequence))  # True
print(issubclass(list, abc.MutableSequence))  # True
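
The same kind of check can be extended to other built-in types. The sketch below is an addition to these notes, not taken from the book; it also shows what "virtual" means in practice, since the ABC is absent from the real base classes of tuple.

```python
from collections import abc, deque

# Registered as virtual subclasses of the read-only Sequence ABC
print(issubclass(str, abc.Sequence))               # True
print(issubclass(bytes, abc.Sequence))             # True

# Mutable built-ins are registered with MutableSequence
print(issubclass(bytearray, abc.MutableSequence))  # True
print(issubclass(deque, abc.MutableSequence))      # True

# Immutable types are not mutable sequences
print(issubclass(tuple, abc.MutableSequence))      # False

# "Virtual" means the ABC does not appear among the real base classes
print(abc.Sequence in tuple.__mro__)               # False
```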

The most basic and commonly used sequence type is list, and you probably already know how to use it, so the next section goes straight to list comprehensions.

List comprehensions and generator expressions

A quick way to build a list is a list comprehension; a generator expression does the same for any other type of sequence.

A list comprehension is more readable, and often faster, than the equivalent for loop.

List comprehensions and readability

Example 1: build a list of Unicode code points from a string

symbols = '$¢£¥€¤'
codes = []
for symbol in symbols:
    codes.append(ord(symbol))

codes # [36, 162, 163, 165, 8364, 164]

Example 2: build the same list of Unicode code points with a list comprehension

symbols = '$¢£¥€¤'
codes = [ord(symbol) for symbol in symbols]
codes # [36, 162, 163, 165, 8364, 164]

The list comprehension does the same job in one line instead of three. Once you are familiar with list comprehensions, you will find this version more readable.

Keep list comprehensions short, though. If one runs past two lines, it is probably better to rewrite it as a for loop.
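
The speed claim made earlier can be checked with the standard timeit module. The following is only a sketch: the function names with_loop and with_listcomp are made up for this example, and the timings depend on the machine and the Python version, so no particular result is promised.

```python
import timeit

symbols = '$¢£¥€¤'

def with_loop():
    codes = []
    for symbol in symbols:
        codes.append(ord(symbol))
    return codes

def with_listcomp():
    return [ord(symbol) for symbol in symbols]

# Both approaches build the same list.
same_result = with_loop() == with_listcomp()

# Time each one; the absolute numbers vary by machine and Python version.
loop_time = timeit.timeit(with_loop, number=10_000)
listcomp_time = timeit.timeit(with_listcomp, number=10_000)

print(same_result)
print(f'for loop: {loop_time:.4f}s  listcomp: {listcomp_time:.4f}s')
```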

List comprehensions no longer leak their variables

List comprehensions and generator expressions have their own local scope, just like functions. Variables assigned inside the expression are local to it, while variables in the surrounding scope can still be referenced normally and are not affected by the local ones.

x = 'CBA'
codes = [ord(x) for x in x]
print(x) # 'CBA'
print(codes) # [67, 66, 65]
  • The print on line 3 shows that x is still bound to 'CBA'
  • The list comprehension still produces the expected list

However, reusing the same name like this is not recommended: it is easy to confuse the two variables, and nothing is gained by it.
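
For contrast, here is a small sketch (an addition to these notes, with made-up variable names) showing that the loop variable of a plain for statement does overwrite a variable of the same name, while the one inside a list comprehension does not:

```python
x = 'ABC'
for x in 'xyz':      # the loop variable is an ordinary variable in the enclosing scope
    pass
after_loop = x       # 'z': the for loop rebound x

y = 'ABC'
codes = [ord(y) for y in 'xyz']   # y inside the comprehension is local to it
after_listcomp = y   # 'ABC': the outer y is untouched

print(after_loop)       # z
print(after_listcomp)   # ABC
print(codes)            # [120, 121, 122]
```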

A list comprehension builds a list from a sequence or any other iterable type, filtering and transforming the items as it goes. The filter and map built-ins can be combined to do the same, but readability suffers, as shown below.

List comprehensions versus filter and map

List comprehensions can do everything filter and map do, without the restrictions of Python's lambda expressions.

symbols = '$¢£¥€¤'
beyond_ascii = [ord(s) for s in symbols if ord(s) > 127]
print(beyond_ascii) # [162, 163, 165, 8364, 164]
beyond_ascii = list(filter(lambda c: c > 127, map(ord, symbols)))
print(beyond_ascii) # [162, 163, 165, 8364, 164]

Now let's see how to use a list comprehension to compute a Cartesian product: a list of tuples built by pairing every item of one list with every item of the others.

Cartesian product

A list comprehension can build the Cartesian product of two or more iterables. The result is a list of tuples, each made of one item from every input iterable. Its length is therefore the product of the lengths of the input iterables, as shown in Figure 2-3:


For example, suppose you need a list of T-shirts available in two colors and three sizes. The following code shows how to build that list with a list comprehension. The result has 2 × 3 = 6 items:

colors = ['black', 'white']
sizes = ['S', 'M', 'L']
# Build a list of tuples arranged by color, then by size
tshirts = [(color, size) for color in colors for size in sizes]
tshirts  # [('black', 'S'), ('black', 'M'), ('black', 'L'), ('white', 'S'), ('white', 'M'), ('white', 'L')]


The equivalent nested for loops print the tuples in the same order:

for color in colors:
    for size in sizes:
        print((color, size))

To arrange the items by size first and then by color, swap the two for clauses:

tshirts = [(color, size) for size in sizes for color in colors]
tshirts  # [('black', 'S'), ('white', 'S'), ('black', 'M'), ('white', 'M'), ('black', 'L'), ('white', 'L')]
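
As a cross-check, the same pairs can be produced with itertools.product from the standard library. This alternative is not used in the notes above; it is shown here only as a sketch to confirm the ordering and the length rule.

```python
from itertools import product

colors = ['black', 'white']
sizes = ['S', 'M', 'L']

by_color = [(color, size) for color in colors for size in sizes]

# product() varies its last argument fastest, like the innermost for clause
same_order = list(product(colors, sizes)) == by_color

# The number of pairs is the product of the input lengths
print(same_order)                                  # True
print(len(by_color) == len(colors) * len(sizes))   # True
```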


List comprehensions do one thing only: they build lists. To produce other sequence types, generator expressions come in handy.

Generator Expressions

Although a list comprehension can also be used to initialize a tuple, an array, or another sequence type, a generator expression is the better choice. It yields items one by one through the iterator protocol instead of building a whole list just to feed another constructor, which saves memory.

Generator expressions use the same syntax as list comprehensions, but are enclosed in parentheses rather than square brackets.
The following code builds a tuple and an array from generator expressions.

import array

symbols = '$¢£¥€¤'
print(tuple(ord(symbol) for symbol in symbols))  # (36, 162, 163, 165, 8364, 164)
print(array.array('I', (ord(symbol) for symbol in symbols)))  # array('I', [36, 162, 163, 165, 8364, 164])
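
The memory argument can be illustrated with sys.getsizeof. This is a rough sketch, assuming CPython: getsizeof reports only the size of the list object itself (not of the integers it refers to), and the names squares_list and squares_gen are made up for the example.

```python
import sys

squares_list = [n * n for n in range(10_000)]   # all 10,000 items exist at once
squares_gen = (n * n for n in range(10_000))    # nothing computed yet

list_size = sys.getsizeof(squares_list)   # size of the list object itself
gen_size = sys.getsizeof(squares_gen)     # small and independent of the range length

first = next(squares_gen)       # items are produced on demand: 0
rest_total = sum(squares_gen)   # consumes the remaining items

print(gen_size < list_size)                     # True
print(first + rest_total == sum(squares_list))  # True
```

Note that a generator can be consumed only once: after the sum above, squares_gen is exhausted.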


The following code uses a generator expression to compute a Cartesian product and print every combination of the two colors and three sizes of T-shirts from above. Unlike the earlier code, it never leaves a list of six combinations in memory, because the generator expression produces one combination each time the for loop iterates.

colors = ['black', 'white']
sizes = ['S', 'M', 'L']
# The generator expression yields items one by one; it never builds a list of all six T-shirt variations
for tshirt in ('%s %s' % (c, s) for c in colors for s in sizes):
    print(tshirt)

Output:

black S
black M
black L
white S
white M
white L

Next, let's look at another important sequence type: the tuple.

Tuples are not just immutable lists

Keywords: Python

Added by james2010 on Sat, 05 Mar 2022 06:09:27 +0200