NumPy Quick Start Guide (Advanced)

1. Broadcasting rules

  • Broadcasting lets universal functions deal meaningfully with inputs that do not have exactly the same shape.
    • The first rule of broadcasting: if the input arrays do not all have the same number of dimensions, a "1" is repeatedly prepended to the shapes of the smaller arrays until all arrays have the same number of dimensions.
    • The second rule of broadcasting: an array with size 1 along a particular dimension acts as if it had the size of the array with the largest shape along that dimension. The value of the array element is assumed to be the same along that dimension for the broadcast array.
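The two rules can be seen in a small sketch. The array names col and row are illustrative only and do not come from the original text.

```python
import numpy as np

col = np.arange(3).reshape(3, 1)   # shape (3, 1)
row = np.arange(4)                 # shape (4,), treated as (1, 4) by rule 1

# Rule 2 stretches each length-1 axis to match the other operand,
# so the sum has shape (3, 4).
total = col + row
print(total.shape)

# Shapes that differ on an axis where neither length is 1 cannot broadcast.
try:
    np.arange(3) + np.arange(4)
    mismatch_raised = False
except ValueError:
    mismatch_raised = True
print(mismatch_raised)
```

Here total[r, c] equals col[r, 0] + row[c], without either input being copied to the full (3, 4) shape first.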

2. Fancy indexing and index tricks

  • Numpy provides more indexing capabilities than ordinary Python sequences.
  • In addition to indexing integers and slices, as we saw earlier, arrays can be indexed by integer arrays and Boolean arrays.

a. Indexing with arrays of indices

from numpy import * 
a = arange(12)**2                          
print(a)

Output results

[  0   1   4   9  16  25  36  49  64  81 100 121]
i = array( [ 1,1,3,8,5 ] )                 
a[i]   

Output results

array([ 1,  1,  9, 64, 25], dtype=int32)
j = array( [ [ 3, 4], [ 9, 7 ] ] )         
a[j]                                       

Output results

array([[ 9, 16],
       [81, 49]], dtype=int32)

When the indexed array a is multidimensional, a single array of indices refers to the first dimension of a.

The following example shows this behavior by converting an image of labels into a color image using a palette.

palette = array( [ [0,0,0],                # black
                   [255,0,0],              # red
                   [0,255,0],              # green
                   [0,0,255],              # blue
                   [255,255,255] ] )       # white
image = array( [ [ 0, 1, 2, 0 ],           # each value corresponds to a color in the palette
                 [ 0, 3, 4, 0 ]  ] )
palette[image]                            # the (2,4,3) color image

Output results

array([[[  0,   0,   0],
        [255,   0,   0],
        [  0, 255,   0],
        [  0,   0,   0]],

       [[  0,   0,   0],
        [  0,   0, 255],
        [255, 255, 255],
        [  0,   0,   0]]])

We can also give indexes for more than one dimension. The index arrays for each dimension must have the same shape.

a = arange(12).reshape(3,4)
a

Output results

array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11]])
i = array( [ [0,1],                        # indices for the first dim of a
             [1,2] ] )
j = array( [ [2,1],                        # indices for the second dim
             [3,3] ] )
a[i,j] 

Output results

array([[ 2,  5],
       [ 7, 11]])
a[i,2]

Output results

array([[ 2,  6],
       [ 6, 10]])
a[:,j]  

Output results

array([[[ 2,  1],
        [ 3,  3]],

       [[ 6,  5],
        [ 7,  7]],

       [[10,  9],
        [11, 11]]])

Naturally, we can put i and j in a sequence and then index with that sequence. In recent NumPy versions the sequence must be a tuple, because a list is treated as a single index array.

l = (i,j)
a[l]                                       # Equal to a[i,j]

Output results

array([[ 2,  5],
       [ 7, 11]])

However, we cannot do this by putting i and j into an array, because that array will be interpreted as indexing the first dimension of a.
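A minimal sketch of that failure and one way around it, using the same a, i and j as above. The names s, raised and picked are illustrative.

```python
import numpy as np

a = np.arange(12).reshape(3, 4)
i = np.array([[0, 1], [1, 2]])
j = np.array([[2, 1], [3, 3]])

s = np.array([i, j])        # shape (2, 2, 2)

# a[s] treats every entry of s as an index into the first axis of a.
# The value 3 is out of range for that axis (length 3), so this fails.
try:
    a[s]
    raised = False
except IndexError:
    raised = True

# Converting s to a tuple restores the a[i, j] meaning.
picked = a[tuple(s)]
print(raised)
print(picked)
```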

Another common use of indexing with arrays is to search for the maximum value of time-dependent series.

time = linspace(20, 145, 5)                 # time scale
data = sin(arange(20)).reshape(5,4)         # 4 time-dependent series
print(time)
print("-------")
print(data)

Output results

[ 20.    51.25  82.5  113.75 145.  ]
-------
[[ 0.          0.84147098  0.90929743  0.14112001]
 [-0.7568025  -0.95892427 -0.2794155   0.6569866 ]
 [ 0.98935825  0.41211849 -0.54402111 -0.99999021]
 [-0.53657292  0.42016704  0.99060736  0.65028784]
 [-0.28790332 -0.96139749 -0.75098725  0.14987721]]
#Index of the maximum value of each row
ind = data.argmax(axis=1)                   
print(ind)

#Index of the maximum value of each column
ind = data.argmax(axis=0)                   
print(ind)

Output results

[2 3 0 2 3]
[2 0 3 1]
time_max = time[ind]                       # times corresponding to the maxima
data_max = data[ind, range(data.shape[1])] # => data[ind[0],0], data[ind[1],1]...
print(time_max)
print("------")
print(range(data.shape[1]))
print(data_max)

Output results

[ 82.5   20.   113.75  51.25]
------
range(0, 4)
[0.98935825 0.84147098 0.99060736 0.6569866 ]
# data.max(axis=0), the maximum of each column, gives the same values as data_max

print(data_max)
print("-------")
print(data.max(axis=0))
print("-------")
all(data_max == data.max(axis=0))   # True

Output results

[0.98935825 0.84147098 0.99060736 0.6569866 ]
-------
[0.98935825 0.84147098 0.99060736 0.6569866 ]
-------
True

You can also use indexing with arrays as a target to assign to:

a = arange(5)
a 

Output results

array([0, 1, 2, 3, 4])
a[[1,3,4]] = 0
a 

Output results

array([0, 0, 2, 0, 0])

However, when the list of indices contains repetitions, the assignment is done several times, leaving behind the last value:

a = arange(5)
a[[0,0,2]]=[1,2,3]
a

Output results

array([2, 1, 3, 3, 4])

This is reasonable enough, but be careful if you want to use Python's += construct, as it may not do what you expect:

a = arange(5)
a[[0,0,2]]+=1
a

Output results

array([1, 1, 3, 3, 4])

Even though 0 occurs twice in the list of indices, the element at index 0 is incremented only once. This is because Python requires a+=1 to be equivalent to a=a+1.
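If the intent is to increment once per occurrence of an index, one option is the ufunc method np.add.at, which performs an unbuffered in-place operation. The original text does not mention it; it is shown here only as a possible alternative.

```python
import numpy as np

a = np.arange(5)

# Unlike a[[0, 0, 2]] += 1, every listed index is applied,
# so the element at index 0 is incremented twice.
np.add.at(a, [0, 0, 2], 1)
print(a)   # [2 1 3 3 4]
```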

b. Indexing with Boolean arrays

  • When we index arrays with arrays of integer indices, we provide the list of indices to pick.
  • With Boolean indices the approach is different: we explicitly choose which items in the array we want and which ones we don't.
  • The most natural way to use Boolean indexing is to use a Boolean array that has the same shape as the original array.
a = arange(12).reshape(3,4)
b = a > 4
print(a)
print("-------")
print(b)     # b is a boolean with a's shape

Output results

[[ 0  1  2  3]
 [ 4  5  6  7]
 [ 8  9 10 11]]
-------
[[False False False False]
 [False  True  True  True]
 [ True  True  True  True]]
a[b]    # 1d array with the selected elements

Output results

array([ 5,  6,  7,  8,  9, 10, 11])

This property is very useful in assignment:

a[b] = 0    # All elements of 'a' higher than 4 become 0
a

Output results

array([[0, 1, 2, 3],
       [4, 0, 0, 0],
       [0, 0, 0, 0]])

You can refer to the Mandelbrot set example to see how to use Boolean indexing to generate an image of the Mandelbrot set.
The second way of indexing with Booleans is more similar to integer indexing: for each dimension of the array, we give a one-dimensional Boolean array selecting the slices we want.

#Select the rows of the array

a = arange(12).reshape(3,4)
b1 = array([False,True,True])             # first dim selection

print(a)
print("-------")
print(a[b1,:])                                 # selecting rows
print("-------")
print(a[b1] )

Output results

[[ 0  1  2  3]
 [ 4  5  6  7]
 [ 8  9 10 11]]
-------
[[ 4  5  6  7]
 [ 8  9 10 11]]
-------
[[ 4  5  6  7]
 [ 8  9 10 11]]
#Select the columns of the array

a = arange(12).reshape(3,4)

b2 = array([True,False,True,False])       # second dim selection

print(a)
print("-------")
a[:,b2]                                  # selecting columns

Output results

[[ 0  1  2  3]
 [ 4  5  6  7]
 [ 8  9 10 11]]
-------
array([[ 0,  2],
       [ 4,  6],
       [ 8, 10]])
#Select both rows and columns

a[b1,b2] 

Output results

array([ 4, 10])

Note that the length of the one-dimensional Boolean array must coincide with the length of the dimension (or axis) you want to slice. In the previous example, b1 is a one-dimensional array of length 3 (the number of rows in a), and b2 (of length 4) is suitable for indexing the second axis (columns) of a.
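The sketch below illustrates two points from this subsection, assuming a recent NumPy version: a[b1,b2] pairs up the True positions much like integer index arrays do, and a mask of the wrong length is rejected. The names rows, cols, same and raised are illustrative.

```python
import numpy as np

a = np.arange(12).reshape(3, 4)
b1 = np.array([False, True, True])
b2 = np.array([True, False, True, False])

# Positions of the True entries in each mask.
rows = np.nonzero(b1)[0]     # array([1, 2])
cols = np.nonzero(b2)[0]     # array([0, 2])

# a[b1, b2] selects a[1, 0] and a[2, 2], the same as a[rows, cols].
same = np.array_equal(a[b1, b2], a[rows, cols])
print(same)

# A mask whose length does not match the axis length raises IndexError.
try:
    a[np.array([True, False]), :]
    raised = False
except IndexError:
    raised = True
print(raised)
```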

c. The ix_() function

The ix_ function can be used to combine different vectors so as to obtain the result for each n-tuple.
For example, if you want to compute a+b*c for all the triplets taken from each of the vectors a, b and c:

a = array([2,3,4,5])
b = array([8,5,4])
c = array([5,4,6,8,3])
ax,bx,cx = ix_(a,b,c)

print(ax,ax.shape)
print("-------------")
print(bx,bx.shape)
print("-------------")
print(cx,cx.shape)
print("-------------")

Output results

[[[2]]

 [[3]]

 [[4]]

 [[5]]] (4, 1, 1)
-------------
[[[8]
  [5]
  [4]]] (1, 3, 1)
-------------
[[[5 4 6 8 3]]] (1, 1, 5)
-------------
result = ax+bx*cx
result

Output results

array([[[42, 34, 50, 66, 26],
        [27, 22, 32, 42, 17],
        [22, 18, 26, 34, 14]],

       [[43, 35, 51, 67, 27],
        [28, 23, 33, 43, 18],
        [23, 19, 27, 35, 15]],

       [[44, 36, 52, 68, 28],
        [29, 24, 34, 44, 19],
        [24, 20, 28, 36, 16]],

       [[45, 37, 53, 69, 29],
        [30, 25, 35, 45, 20],
        [25, 21, 29, 37, 17]]])
result[3,2,4] 

Output results

17
a[3]+b[2]*c[4] 

Output results

17

You could also implement the reduce as follows:

def ufunc_reduce(ufct, *vectors):
    vs = ix_(*vectors)
    r = ufct.identity
    for v in vs:
        r = ufct(r,v)
    return r

Then use it this way:

ufunc_reduce(add,a,b,c)

Output results

array([[[15, 14, 16, 18, 13],
        [12, 11, 13, 15, 10],
        [11, 10, 12, 14,  9]],

       [[16, 15, 17, 19, 14],
        [13, 12, 14, 16, 11],
        [12, 11, 13, 15, 10]],

       [[17, 16, 18, 20, 15],
        [14, 13, 15, 17, 12],
        [13, 12, 14, 16, 11]],

       [[18, 17, 19, 21, 16],
        [15, 14, 16, 18, 13],
        [14, 13, 15, 17, 12]]])

The advantage of this version of reduce over the normal ufunc.reduce (for example, add.reduce) is that it uses the broadcasting rules to avoid creating an argument array whose size is the size of the output times the number of vectors.
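As a further sketch, the same helper works with other ufuncs that define an identity, such as np.multiply. The function is repeated here so that the block is self-contained; the name prod is illustrative.

```python
import numpy as np

def ufunc_reduce(ufct, *vectors):
    # ix_ gives each vector its own axis, e.g. (4,1,1), (1,3,1), (1,1,5).
    vs = np.ix_(*vectors)
    r = ufct.identity          # 0 for add, 1 for multiply
    for v in vs:
        r = ufct(r, v)         # broadcasting grows r one axis at a time
    return r

a = np.array([2, 3, 4, 5])
b = np.array([8, 5, 4])
c = np.array([5, 4, 6, 8, 3])

prod = ufunc_reduce(np.multiply, a, b, c)
print(prod.shape)        # (4, 3, 5)
print(prod[3, 2, 4])     # 60, i.e. a[3] * b[2] * c[4]
```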

d. Indexing with strings

  • Reference documents: https://docs.scipy.org/doc/numpy-1.13.0/user/basics.rec.html
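The linked page covers structured arrays, whose fields are accessed by name. A minimal sketch follows; the records and the field names name, age and weight are made up for illustration.

```python
import numpy as np

# Hypothetical records with three named fields.
people = np.array(
    [("Ann", 31, 58.5), ("Bob", 45, 80.0)],
    dtype=[("name", "U10"), ("age", "i4"), ("weight", "f8")],
)

ages = people["age"]                          # index by field name
print(ages)                                   # [31 45]

# Field access combines with Boolean indexing.
older = people[people["age"] > 40]["name"]
print(older)
```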

3. Linear algebra

Simple array operations

from numpy import *
from numpy.linalg import *
a = array([[1.0, 2.0], [3.0, 4.0]])
print (a)

Output results

[[1. 2.]
 [3. 4.]]
a.transpose()

Output results

array([[1., 3.],
       [2., 4.]])
inv(a)

Output results

array([[-2. ,  1. ],
       [ 1.5, -0.5]])
# Create the 2x2 identity matrix (ones on the diagonal)
u = eye(2) 
u

Output results

array([[1., 0.],
       [0., 1.]])
#matrix product 
j = array([[0.0, -1.0], [1.0, 0.0]])
dot (j, j) 

Output results

array([[-1.,  0.],
       [ 0., -1.]])
#trace
trace(u) 

Output results

2.0
eig(j)
# Parameters:
#     square matrix
# Returns
#     The eigenvalues, each repeated according to its multiplicity.
#     The normalized (unit "length") eigenvectors, such that the
#     column ``v[:,i]`` is the eigenvector corresponding to the
#     eigenvalue ``w[i]``

Output results

(array([0.+1.j, 0.-1.j]),
 array([[0.70710678+0.j        , 0.70710678-0.j        ],
        [0.        -0.70710678j, 0.        +0.70710678j]]))
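The returned pair can be checked against the definition of an eigenvector. This is a small verification sketch, with w and v chosen to match the names used in the comment above.

```python
import numpy as np

j = np.array([[0.0, -1.0], [1.0, 0.0]])
w, v = np.linalg.eig(j)

# Each column v[:, k] should satisfy j @ v[:, k] == w[k] * v[:, k].
# Multiplying v by w scales column k by w[k] through broadcasting.
ok = np.allclose(j @ v, v * w)

# The eigenvectors are normalized to unit length.
unit = np.allclose(np.linalg.norm(v, axis=0), 1.0)
print(ok, unit)
```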

Matrix class

This is a brief introduction to the matrix class.

A = matrix('1.0 2.0; 3.0 4.0')
A

Output results

matrix([[1., 2.],
        [3., 4.]])
#Matrix: matrix

type(A) 

Output results

numpy.matrix
#Transpose: transpose

A.T  

Output results

matrix([[1., 3.],
        [2., 4.]])
X = matrix('5.0 7.0')
Y = X.T
Y

Output results

matrix([[5.],
        [7.]])
# matrix multiplication

print(A*Y)  

Output results

[[19.]
 [43.]]
# Inverse of matrix: inverse

print(A)
print("-----------")
print(A.I)

Output results

[[1. 2.]
 [3. 4.]]
-----------
[[-2.   1. ]
 [ 1.5 -0.5]]
#solving linear equation

solve(A, Y)  

Output results

matrix([[-3.],
        [ 4.]])
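The NumPy documentation no longer recommends the matrix class, even for linear algebra. As a sketch of the same computation with plain arrays, the @ operator replaces the matrix * product. The lowercase names y and x are illustrative.

```python
import numpy as np

A = np.array([[1.0, 2.0], [3.0, 4.0]])
y = np.array([[5.0], [7.0]])          # column vector, shape (2, 1)

x = np.linalg.solve(A, y)             # solves A @ x = y
print(x)

# Substituting the solution back should reproduce y.
residual_ok = np.allclose(A @ x, y)
print(residual_ok)
```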

Indexing: comparing matrices and two-dimensional arrays

  • Note that there are some important differences between arrays and matrices in NumPy.
  • NumPy provides two fundamental objects: an N-dimensional array object and a universal function object. Other objects are built on top of these.
  • In particular, a matrix is a two-dimensional array object that inherits from the NumPy array object. For both arrays and matrices, an index must consist of a proper combination of one or more of the following: integer scalars, ellipses, a list of integers or Boolean values, a tuple of integers or Boolean values, and a one-dimensional array of integers or Boolean values.
  • A matrix may be used as an index for matrices, but commonly an array, list, or other form is needed to accomplish a given task.
  • As usual in Python, indexing is zero-based. Traditionally, we represent a two-dimensional array or matrix as a rectangular array of rows and columns, where movement along axis 0 is movement across rows and movement along axis 1 is movement across columns.

Let's create arrays and matrices to slice:

A = arange(12)
A

Output results

array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11])
A.shape = (3,4)
M = mat(A.copy())

print (A,type(A))
print("------")
print (M,type(M))

Output results

[[ 0  1  2  3]
 [ 4  5  6  7]
 [ 8  9 10 11]] <class 'numpy.ndarray'>
------
[[ 0  1  2  3]
 [ 4  5  6  7]
 [ 8  9 10 11]] <class 'numpy.matrix'>

Now, let's take some simple slices.
Basic slicing uses slice objects or integers.
For example, evaluating A[:] and M[:] will look familiar from Python indexing. However, it is important to note that slicing a NumPy array does not create a copy of the data; a slice provides a new view of the same data.

print (A[:])
print (A[:].shape)

Output results

[[ 0  1  2  3]
 [ 4  5  6  7]
 [ 8  9 10 11]]
(3, 4)
print (M[:])
print (M[:].shape)

Output results

[[ 0  1  2  3]
 [ 4  5  6  7]
 [ 8  9 10 11]]
(3, 4)
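The view behavior of slices can be checked directly. The sketch below contrasts a basic slice with indexing by a list, which returns a new array; the names view, copy and shares are illustrative.

```python
import numpy as np

A = np.arange(12).reshape(3, 4)

view = A[:, 1]          # basic slice: a view onto A's data
view[0] = 100           # writes through to A
print(A[0, 1])          # 100

copy = A[:, [1, 3]]     # indexing with a list: a new array
copy[0, 0] = -1         # A is unaffected
print(A[0, 1])          # still 100

shares = (np.shares_memory(A, view), np.shares_memory(A, copy))
print(shares)
```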

Now for something that differs from Python indexing: you can use comma-separated indices to index along multiple axes at the same time.

print (A[:,1])
print (A[:,1].shape)

Output results

[1 5 9]
(3,)
print (M[:,1]); 
print (M[:,1].shape)

Output results

[[1]
 [5]
 [9]]
(3, 1)

Notice the difference between the last two results. Using a single colon for the two-dimensional array produces a one-dimensional array, while for a matrix it produces a two-dimensional matrix. A slice of a matrix always produces a matrix.
For example, the slice M[2,:] produces a matrix of shape (1,4). In contrast, a slice of an array always produces an array of the lowest possible dimension.
For example, if C is a three-dimensional array, C[...,1] produces a two-dimensional array while C[1,:,1] produces a one-dimensional array. From this point on, we will show results only for the array slice if the results for the corresponding matrix slice are identical.
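The shapes described above can be confirmed with a short sketch. The contents of C are arbitrary, and the last line shows one way to keep an axis on a plain array by using a length-1 slice instead of an integer.

```python
import numpy as np

C = np.arange(24).reshape(2, 3, 4)
print(C[..., 1].shape)      # (2, 3): the last axis is dropped
print(C[1, :, 1].shape)     # (3,): two integer indices drop two axes

A = np.arange(12).reshape(3, 4)
print(A[:, 1].shape)        # (3,)
print(A[:, 1:2].shape)      # (3, 1): a length-1 slice keeps the axis
```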

If we want the second and fourth columns of an array (indices 1 and 3), one way is to slice using a list:

A[:,[1,3]]

Output results

array([[ 1,  3],
       [ 5,  7],
       [ 9, 11]])

A slightly more complicated way is to use the take() method:

A[:,].take([1,3],axis=1)

Output results

array([[ 1,  3],
       [ 5,  7],
       [ 9, 11]])

If we want to skip the first row, we can do this:

A[1:,].take([1,3],axis=1)

Output results

array([[ 5,  7],
       [ 9, 11]])

Or we could simply use A[1:,[1,3]]. Yet another way is to use ix_, which forms the cross product of the row and column indices:

A[ix_((1,2),(1,3))]

Output results

array([[ 5,  7],
       [ 9, 11]])

Now let's do something a bit more complicated. Suppose we want to keep all columns whose entry in the first row is greater than 1. One way is to create a Boolean index:

A[0,:]>1

Output results

array([False, False,  True,  True])
A[:,A[0,:]>1]

Output results

array([[ 2,  3],
       [ 6,  7],
       [10, 11]])

That's what we want! But indexing the matrix is not so convenient.

M[0,:]>1

Output results

matrix([[False, False,  True,  True]])

The problem is that slicing the matrix produces a matrix. But matrices have a convenient A attribute whose value is the array representation, so we can do this instead:

M[:,M.A[0,:]>1]

Output results

matrix([[ 2,  3],
        [ 6,  7],
        [10, 11]])

If we want to slice the matrix conditionally in two directions, we must adjust our strategy slightly. Indexing with both masks at once pairs them element by element and does not return the full sub-block:

A[A[:,0]>2,A[0,:]>1]

Output results

array([ 6, 11])
M[M.A[:,0]>2,M.A[0,:]>1]

Output results

matrix([[ 6, 11]])

To get the full sub-block, we need to use the cross product ix_:

A[ix_(A[:,0]>2,A[0,:]>1)]

Output results

array([[ 6,  7],
       [10, 11]])
M[ix_(M.A[:,0]>2,M.A[0,:]>1)]

Output results

matrix([[ 6,  7],
        [10, 11]])
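For plain arrays, an equivalent result can be obtained by applying the two masks one after the other. This is only a sketch of an alternative; the names rows, cols, block and chained are illustrative.

```python
import numpy as np

A = np.arange(12).reshape(3, 4)
rows = A[:, 0] > 2            # rows whose first entry exceeds 2
cols = A[0, :] > 1            # columns whose first entry exceeds 1

block = A[np.ix_(rows, cols)]
chained = A[rows][:, cols]    # select rows first, then columns
print(block)
print(np.array_equal(block, chained))
```

The chained form creates an intermediate array for the selected rows, so ix_ is usually the more direct choice.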

4. Tips and hints

Change shape automatically

  • When changing the dimensions of an array, you can omit one of the sizes, which will then be deduced automatically.
a = arange(30)
a.shape = 2,-1,3  # -1 means "whatever is needed"
print(a.shape,"\n-------\n",a)

Output results

(2, 5, 3) 
-------
 [[[ 0  1  2]
  [ 3  4  5]
  [ 6  7  8]
  [ 9 10 11]
  [12 13 14]]

 [[15 16 17]
  [18 19 20]
  [21 22 23]
  [24 25 26]
  [27 28 29]]]
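The same -1 placeholder works with the reshape method, which returns a reshaped array instead of modifying the shape in place. The sketch also shows what happens when no size fits; the names b and raised are illustrative.

```python
import numpy as np

a = np.arange(30)
b = a.reshape(2, -1, 3)      # the middle size is deduced as 5
print(a.shape, b.shape)      # (30,) (2, 5, 3)

# 30 elements cannot be split into 4 equal rows, so this fails.
try:
    a.reshape(4, -1)
    raised = False
except ValueError:
    raised = True
print(raised)
```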

Vector stacking

  • How do we construct a two-dimensional array from a list of equally sized row vectors?
  • In MATLAB this is quite easy: if x and y are two vectors of the same length, you only need to do m=[x;y]. In NumPy this works via the functions column_stack, dstack, hstack and vstack, depending on the dimension in which the stacking is to be done.

For example:

x = arange(0,10,2)
x 

Output results

array([0, 2, 4, 6, 8])
y = arange(5)  
y

Output results

array([0, 1, 2, 3, 4])
m = vstack([x,y])           
m   

Output results

array([[0, 2, 4, 6, 8],
       [0, 1, 2, 3, 4]])
xy = hstack([x,y])     
xy

Output results

array([0, 2, 4, 6, 8, 0, 1, 2, 3, 4])
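The other two functions named above can be sketched with the same x and y. The names cols and deep are illustrative.

```python
import numpy as np

x = np.arange(0, 10, 2)
y = np.arange(5)

# column_stack turns each 1-D input into a column: shape (5, 2).
cols = np.column_stack([x, y])
print(cols.shape)
print(np.array_equal(cols, np.vstack([x, y]).T))

# dstack stacks along a third axis; 1-D inputs give shape (1, 5, 2).
deep = np.dstack([x, y])
print(deep.shape)
```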

Keywords: Python

Added by pagegen on Mon, 20 Dec 2021 06:11:59 +0200