Python NumPy learning notes 01


Technology notes; follow yyfilence00. If you have any questions or suggestions, please leave a message on the official account.

  • Organized knowledge and study notes
  • Journal entries and essays: what I see and what I think

1: About NumPy

NumPy is a powerful Python library used mainly for computation on multidimensional arrays. The name NumPy comes from two words: Numerical and Python. NumPy provides a large number of library functions and operations that make numerical computation easy for programmers.

# Import the numpy library
import numpy

After importing, all of NumPy's built-in functions are available, such as sum, mean, max and min:
numpy.max([3, 1, 2])
numpy.min([3, 1, 2])

import numpy as np

my_arr = np.arange(1000000)
my_list = list(range(1000000))
# Compare the efficiency of a NumPy array and a list: both hold 1,000,000 elements; double every element, repeat 10 times, and time it with the IPython %time magic.
%time for _ in range(10): my_arr2 = my_arr * 2

%time for _ in range(10): my_list2 = [x * 2 for x in my_list]
Wall time: 49 ms
Wall time: 1.68 s
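%time only works in IPython or Jupyter. In a plain Python script, a rough equivalent is to measure with time.perf_counter; the sketch below assumes that approach, and the function name time_doubling is hypothetical. Absolute timings depend on the machine, so only the relative gap is meaningful.

```python
import time

import numpy as np


def time_doubling(n=1_000_000, repeats=10):
    """Double n elements `repeats` times with a NumPy array and with a list.

    Returns (numpy_seconds, list_seconds, last_array_result, last_list_result).
    """
    my_arr = np.arange(n)
    my_list = list(range(n))

    start = time.perf_counter()
    for _ in range(repeats):
        my_arr2 = my_arr * 2  # vectorized: the loop runs inside NumPy's compiled code
    numpy_seconds = time.perf_counter() - start

    start = time.perf_counter()
    for _ in range(repeats):
        my_list2 = [x * 2 for x in my_list]  # interpreted loop, one element at a time
    list_seconds = time.perf_counter() - start

    return numpy_seconds, list_seconds, my_arr2, my_list2
```

Calling time_doubling() with the default arguments reproduces the comparison above; both approaches produce the same values, only the speed differs.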

1.1 Creating arrays

# Create with np.array()
arr = [1, 4, 3.2, 5]
np.array(arr)
array([1. , 4. , 3.2, 5. ])

Note that a NumPy array is displayed wrapped in array(), with its elements inside square brackets []. Because the list mixes integers and a float, every element is stored as float64, which is why 1 is shown as 1.

Creating with np.arange() and np.linspace()
# arange: fixed step between elements
# linspace: fixed number of elements
arr_ = np.arange(10)
arr_
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
print(arr_)
[0 1 2 3 4 5 6 7 8 9]

Note: if you print a NumPy array with print(), the array() wrapper disappears, only the contents are shown, and the elements are not separated by commas.

print( np.arange(10) )
print( np.arange(2,10) )
print( np.arange(2,11,2))
[0 1 2 3 4 5 6 7 8 9]
[2 3 4 5 6 7 8 9]
[ 2  4  6  8 10]

Function prototype:

np.arange(start, stop, step)

Only stop is required. If omitted, start defaults to 0 and step defaults to 1. stop itself is excluded from the result.
arr2 = np.linspace(2,8,3)
arr2
array([2., 5., 8.])
print(arr2)
[2. 5. 8.]


Function prototype:

np.linspace(start, stop, num)

start and stop are required; num is optional and defaults to 50. By default stop is included, so the step between elements is (stop - start) / (num - 1).
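A quick check of the step formula for np.linspace(2, 8, 3): (8 - 2) / (3 - 1) = 3, which gives 2, 5, 8. The sketch also uses two real linspace options not mentioned above: retstep=True returns the step as well, and endpoint=False excludes stop, which changes the step to (stop - start) / num.

```python
import numpy as np

start, stop, num = 2, 8, 3
step = (stop - start) / (num - 1)     # (8 - 2) / (3 - 1) = 3.0
arr2 = np.linspace(start, stop, num)  # [2., 5., 8.]

# retstep=True also returns the step linspace actually used
values, used_step = np.linspace(start, stop, num, retstep=True)

# endpoint=False excludes stop, so the step becomes (stop - start) / num = 2.0
no_end = np.linspace(start, stop, num, endpoint=False)  # [2., 4., 6.]
```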

Other ways NumPy can create arrays

  • Create an all-zero n-dimensional array with zeros()
  • Create an all-one n-dimensional array with ones()
  • Create an n-dimensional array of random numbers in [0, 1) with random()
  • Create a 2D array with ones on a diagonal (the identity matrix by default) with eye()

For the first three, the output is an n-dimensional array, so the argument is its shape, given either as a scalar or as a tuple. The following three examples are easy to understand:

print( np.zeros(6) ) # Scalar 6 represents shape (6,)
print( np.ones((2,5)) )
print( np.random.random((2,3,4)) )
[0. 0. 0. 0. 0. 0.]
[[1. 1. 1. 1. 1.]
 [1. 1. 1. 1. 1.]]
[[[0.96070029 0.13700133 0.86750795 0.60046375]
  [0.51925163 0.25111592 0.92045058 0.92544552]
  [0.31744394 0.46865624 0.6622886  0.77742483]]

 [[0.94793251 0.27383846 0.99977759 0.45765543]
  [0.70677173 0.07758811 0.78253172 0.46340084]
  [0.25709134 0.32701264 0.29887367 0.11625852]]]
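A small sketch confirming that a scalar and a one-element tuple give the same shape, and that zeros() and ones() produce float64 unless dtype is given. It also shows np.random.default_rng, the Generator-based interface that newer NumPy versions (1.17+) recommend; the seed 0 is an arbitrary choice for reproducibility.

```python
import numpy as np

a = np.zeros(6)                  # scalar shape -> (6,)
b = np.zeros((6,))               # the same shape written as a 1-tuple
c = np.ones((2, 5))              # tuple shape -> 2 rows, 5 columns
d = np.ones((2, 5), dtype=int)   # dtype overrides the float64 default

# Generator-based random numbers, uniform on [0, 1)
rng = np.random.default_rng(0)
e = rng.random((2, 3, 4))
```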
# For eye(), the argument is a scalar N that sets the number of rows (and, by default, the number of columns) of the square matrix:
np.eye(4)
array([[1., 0., 0., 0.],
       [0., 1., 0., 0.],
       [0., 0., 1., 0.],
       [0., 0., 0., 1.]])

In addition, you can set the parameter k in eye():

  • The default k = 0 puts the 1s on the main diagonal
  • k = 1 puts the 1s on the diagonal just above the main diagonal (toward the upper right)
  • k = -1 puts the 1s on the diagonal just below the main diagonal (toward the lower left)
np.eye(4, k=1)
array([[0., 1., 0., 0.],
       [0., 0., 1., 0.],
       [0., 0., 0., 1.],
       [0., 0., 0., 0.]])
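A short sketch for k = -1 and for a non-square matrix. eye() also accepts a second argument M for the number of columns; comparing against np.diag, which places a vector on diagonal k, is just one way to check the result.

```python
import numpy as np

lower = np.eye(4, k=-1)   # ones on the diagonal just below the main one
rect = np.eye(3, 5, k=2)  # 3 rows, 5 columns, ones starting at column 2
```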

1.2 Array attributes

Let's see what attributes and methods an array has.

One-dimensional array

arr3 = np.array([1, 2.1, 5, 4.3])
arr3
array([1. , 2.1, 5. , 4.3])
dir(arr3) 
['T',
 '__abs__',
 '__add__',
 '__and__',
 '__array__',
 '__array_finalize__',
 '__array_function__',
 '__array_interface__',
 '__array_prepare__',
 '__array_priority__',
 '__array_struct__',
 '__array_ufunc__',
 '__array_wrap__',
 '__bool__',
 '__class__',
 '__complex__',
 '__contains__',
 '__copy__',
 '__deepcopy__',
 '__delattr__',
 '__delitem__',
 '__dir__',
 '__divmod__',
 '__doc__',
 '__eq__',
 '__float__',
 '__floordiv__',
 '__format__',
 '__ge__',
 '__getattribute__',
 '__getitem__',
 '__gt__',
 '__hash__',
 '__iadd__',
 '__iand__',
 '__ifloordiv__',
 '__ilshift__',
 '__imatmul__',
 '__imod__',
 '__imul__',
 '__index__',
 '__init__',
 '__init_subclass__',
 '__int__',
 '__invert__',
 '__ior__',
 '__ipow__',
 '__irshift__',
 '__isub__',
 '__iter__',
 '__itruediv__',
 '__ixor__',
 '__le__',
 '__len__',
 '__lshift__',
 '__lt__',
 '__matmul__',
 '__mod__',
 '__mul__',
 '__ne__',
 '__neg__',
 '__new__',
 '__or__',
 '__pos__',
 '__pow__',
 '__radd__',
 '__rand__',
 '__rdivmod__',
 '__reduce__',
 '__reduce_ex__',
 '__repr__',
 '__rfloordiv__',
 '__rlshift__',
 '__rmatmul__',
 '__rmod__',
 '__rmul__',
 '__ror__',
 '__rpow__',
 '__rrshift__',
 '__rshift__',
 '__rsub__',
 '__rtruediv__',
 '__rxor__',
 '__setattr__',
 '__setitem__',
 '__setstate__',
 '__sizeof__',
 '__str__',
 '__sub__',
 '__subclasshook__',
 '__truediv__',
 '__xor__',
 'all',
 'any',
 'argmax',
 'argmin',
 'argpartition',
 'argsort',
 'astype',
 'base',
 'byteswap',
 'choose',
 'clip',
 'compress',
 'conj',
 'conjugate',
 'copy',
 'ctypes',
 'cumprod',
 'cumsum',
 'data',
 'diagonal',
 'dot',
 'dtype',
 'dump',
 'dumps',
 'fill',
 'flags',
 'flat',
 'flatten',
 'getfield',
 'imag',
 'item',
 'itemset',
 'itemsize',
 'max',
 'mean',
 'min',
 'nbytes',
 'ndim',
 'newbyteorder',
 'nonzero',
 'partition',
 'prod',
 'ptp',
 'put',
 'ravel',
 'real',
 'repeat',
 'reshape',
 'resize',
 'round',
 'searchsorted',
 'setfield',
 'setflags',
 'shape',
 'size',
 'sort',
 'squeeze',
 'std',
 'strides',
 'sum',
 'swapaxes',
 'take',
 'tobytes',
 'tofile',
 'tolist',
 'tostring',
 'trace',
 'transpose',
 'var',
 'view']
print( 'The type is', type(arr3) )
print( 'The dimension is', arr3.ndim )
print( 'The length of array is', len(arr3) )
print( 'The number of elements is', arr3.size )
print( 'The shape of array is', arr3.shape )
print( 'The strides of array is', arr3.strides )
print( 'The type of elements is', arr3.dtype )
The type is <class 'numpy.ndarray'>
The dimension is 1
The length of array is 4
The number of elements is 4
The shape of array is (4,)
The strides of array is (8,)
The type of elements is float64

From these results, the attributes mean:

  • type: the type of the array object, numpy.ndarray
  • ndim: the number of dimensions, 1
  • len(): the array length, 4 (len() counts only the first axis, so it equals the number of elements only for one-dimensional arrays)
  • size: the number of array elements, 4
  • shape: the number of elements along each dimension, as a tuple; there is one dimension with 4 elements, written (4,)
  • strides: the number of bytes to step over along each dimension to reach the next element, as a tuple; a float64 takes 8 bytes, so the stride is 8
  • dtype: the type of the array elements, here double-precision floating point (float64); do not confuse it with type()
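The stride of a contiguous one-dimensional array equals the size of one element, which NumPy exposes as itemsize; nbytes is the total size of the data. A minimal check on arr3:

```python
import numpy as np

arr3 = np.array([1, 2.1, 5, 4.3])
itemsize = arr3.itemsize  # bytes per element: 8 for float64
nbytes = arr3.nbytes      # total bytes of data: size * itemsize
```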

Function prototype:

numpy.array(object, dtype=None, copy=True, order='K', subok=False, ndmin=0)
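Of these parameters, dtype and ndmin are the easiest to see in action. A short sketch (the variable names are illustrative only):

```python
import numpy as np

ints_as_float = np.array([1, 2, 3], dtype=float)  # force float64 elements
row = np.array([1, 2, 3], ndmin=2)                # prepend axes until ndim >= 2
```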

Two-dimensional array

arr02 = [[1,2,3], [4,5,6]]
arr2d = np.array(arr02)
arr2d
array([[1, 2, 3],
       [4, 5, 6]])
dir(arr2d)
(The output is identical to the list shown for the one-dimensional array arr3 above.)
# Print its attributes:
print( 'The type is', type(arr2d) )
print( 'The dimension is', arr2d.ndim )
print( 'The length of array is', len(arr2d) )
print( 'The number of elements is', arr2d.size )
print( 'The shape of array is', arr2d.shape )
print( 'The stride of array is', arr2d.strides )
print( 'The type of elements is', arr2d.dtype )
The type is <class 'numpy.ndarray'>
The dimension is 2
The length of array is 2
The number of elements is 6
The shape of array is (2, 3)
The stride of array is (12, 4)
The type of elements is int32

Similarly, let's analyze these attributes:

  • type: the array type, numpy.ndarray
  • ndim: the number of dimensions, 2
  • len(): the array length, 2 (strictly, len() is the number of elements along axis 0)
  • size: the number of array elements, 6
  • shape: the number of elements along each dimension, as the tuple (2, 3)
  • strides: (12, 4), explained below
  • dtype: the element type, int32 (the default integer type on Windows in NumPy 1.x; on Linux and macOS it is usually int64, which makes the strides (24, 8))

NumPy arrays use row-major (C) order by default: the elements of each row are adjacent in the memory block. In column-major (Fortran) order, the elements of each column are adjacent instead.

Recall the definition of a stride: the number of bytes to step over along a dimension to reach the next element. Each int32 element is 4 bytes.

  • First dimension (axis 0): to reach the next element along it, you step over three elements (one whole row), i.e. 12 = 3 × 4 bytes
  • Second dimension (axis 1): to reach the next element along it, you step over one element, i.e. 4 = 1 × 4 bytes

Therefore, the strides of the two-dimensional array are (12, 4).
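The sketch below makes the row-major versus column-major difference visible through strides. It fixes dtype=np.int32 so the numbers match the (12, 4) above on any platform, and uses np.asfortranarray to get a column-major copy of the same values; ravel(order='K') reads the elements in the order they sit in memory.

```python
import numpy as np

c_arr = np.array([[1, 2, 3], [4, 5, 6]], dtype=np.int32)  # row-major (C order), the default
f_arr = np.asfortranarray(c_arr)                           # same values, column-major layout

c_memory = c_arr.ravel(order='K')  # memory order of the C array: row by row
f_memory = f_arr.ravel(order='K')  # memory order of the F array: column by column
```

In column-major order, stepping along axis 0 moves to the adjacent element (4 bytes), while stepping along axis 1 skips a whole column of 2 elements (8 bytes).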

n-dimensional array

Use np.random.random() to generate a multidimensional array:

arr4d = np.random.random( (2,2,2,3) )
arr4d
array([[[[0.50169052, 0.95593211, 0.01650688],
         [0.66450554, 0.89081254, 0.49317664]],

        [[0.94852445, 0.35366822, 0.67516144],
         [0.902386  , 0.40697895, 0.69716439]]],


       [[[0.31261512, 0.67417529, 0.22901764],
         [0.61775389, 0.89892861, 0.97588106]],

        [[0.07474652, 0.35708577, 0.5014601 ],
         [0.59584862, 0.91910878, 0.40630768]]]])
print( 'The type is', type(arr4d) )
print( 'The dimension is', arr4d.ndim )
print( 'The length of array is', len(arr4d) )
print( 'The number of elements is', arr4d.size )
print( 'The shape of array is', arr4d.shape )
print( 'The stride of array is', arr4d.strides )
print( 'The type of elements is', arr4d.dtype )
The type is <class 'numpy.ndarray'>
The dimension is 4
The length of array is 2
The number of elements is 24
The shape of array is (2, 2, 2, 3)
The stride of array is (96, 48, 24, 8)
The type of elements is float64

A question to think about: what is the relationship between strides and shape?

strides = (96, 48, 24, 8)

shape = (2, 2, 2, 3)

Answer: a stride is the number of bytes to step over along a dimension to reach the next element, and each float64 element is 8 bytes.

Along the fourth dimension (axis 3), neighboring elements are adjacent in memory, so its stride is the element size: strides[3] = 8. Each earlier stride is the next shape entry times the next stride:

The stride of the third dimension (axis 2) is strides[2] = shape[3] × strides[3] = 3 × 8 = 24.

The stride of the second dimension (axis 1) is strides[1] = shape[2] × strides[2] = 2 × 24 = 48.

The stride of the first dimension (axis 0) is strides[0] = shape[1] × strides[1] = 2 × 48 = 96.
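The same rule can be written as a small function: the last stride is the element size, and each earlier stride is the next shape entry times the next stride. The helper name c_order_strides is hypothetical, and the rule only holds for C-contiguous arrays (a transposed view or a slice with a step has different strides).

```python
import numpy as np


def c_order_strides(shape, itemsize):
    """Strides of a C-contiguous array with the given shape and element size."""
    if not shape:
        return ()  # a 0-d array has no axes, hence no strides
    strides = [itemsize]  # last axis: neighboring elements are adjacent
    for dim in reversed(shape[1:]):
        # each earlier axis steps over a whole block of the axes after it
        strides.insert(0, strides[0] * dim)
    return tuple(strides)


arr4d = np.random.random((2, 2, 2, 3))
```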

1.3 Array storage

Saving and loading arrays is simple but very important. Suppose you have trained a deep neural network described by a huge number of parameters, such as weights, all stored as NumPy arrays. Saving them in .npy or .csv format lets you reuse them for the next round of training.

NumPy's own .npy format

Use np.save to save a NumPy array in .npy format (the .npy extension is added automatically), and np.load to read it back:

#coding=utf-8
import numpy as np
arr_disk = np.arange(10)
np.save("arr_disk", arr_disk)
arr_disk
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
np.load("arr_disk.npy")
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
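For a model with several parameter arrays, np.savez writes them into one .npz archive keyed by name. A minimal sketch, assuming a temporary directory and hypothetical file names params and model, with illustrative arrays weights and bias:

```python
import os
import tempfile

import numpy as np

weights = np.arange(6).reshape(2, 3)  # illustrative parameter arrays
bias = np.zeros(3)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "params")       # hypothetical file name
    np.save(path, weights)                   # ".npy" is appended -> params.npy
    loaded_weights = np.load(path + ".npy")

    # np.savez stores several arrays in one uncompressed .npz archive, keyed by name
    np.savez(os.path.join(tmp, "model"), weights=weights, bias=bias)
    with np.load(os.path.join(tmp, "model.npz")) as archive:
        restored = {name: archive[name] for name in archive.files}
```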

Text .txt format

Use np.savetxt to save a NumPy array as a .txt file, and np.loadtxt to read it back:

arr_text = np.array([[1., 2., 3.], [4., 5., 6.]])
np.savetxt("arr_from_text.txt", arr_text)
np.loadtxt("arr_from_text.txt")
array([[1., 2., 3.],
       [4., 5., 6.]])

Text .csv format

Suppose the file arr_from_csv.csv already contains [[1,2,3], [4,5,6]], with the elements of each line separated by semicolons (;). np.genfromtxt reads it with the matching delimiter:

np.genfromtxt("arr_from_csv.csv", delimiter=";")

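Alternatively, read it row by row with Python's standard csv module:

#coding=utf-8
import csv
# Open the csv file in read mode; "with" closes it automatically
with open('arr_from_csv.csv', 'r', newline='') as csvfile:
    # The reader must use the same ";" delimiter as the file
    readCSV = csv.reader(csvfile, delimiter=';')
    for row in readCSV:
        print(row)  # each row is a list of strings, e.g. ['1', '2', '3']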
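A self-contained round trip for the semicolon-separated file, assuming it is first written with np.savetxt (fmt="%d" for integers, delimiter=";") into a temporary directory. It shows that np.genfromtxt returns float64 values while csv.reader returns strings:

```python
import csv
import os
import tempfile

import numpy as np

data = np.array([[1, 2, 3], [4, 5, 6]])

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "arr_from_csv.csv")
    # fmt="%d" writes integers; delimiter=";" matches the file described above
    np.savetxt(path, data, fmt="%d", delimiter=";")
    loaded = np.genfromtxt(path, delimiter=";")  # float64 by default

    with open(path, newline="") as f:
        rows = list(csv.reader(f, delimiter=";"))  # each row is a list of str
```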

1.4 Getting array elements

Elements are retrieved by indexing and slicing:

  • Indexing gets the element at a specific position

  • Slicing gets the elements in a specific range of positions

Indexing and slicing look almost identical:

  • Indexing is written arr[index]
  • Slicing is written arr[start:stop:step]

In principle, a slice could be assembled from individual index operations (index each position, then join the results), but that is unnecessary.

There are three kinds of indexing: regular indexing, Boolean indexing and fancy indexing.

Regular indexing

  • Indexing a single element returns a copy of its value: modifying that value does not change the original array (indexing a whole row of a multidimensional array, such as arr2d[2], is an exception and returns a view)

  • Slicing returns a view of the original array: modifying the slice changes the original array

Indexing

# One dimensional array index
arr = np.arange(10)
arr
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
# Index the fourth element with arr[3] (Python counts positions from 0)
arr[3]
3
# Assign it to a, then reassign 100 to a: the fourth element of arr is still 3, not 100 (rebinding the name a never touches arr)
a = arr[3]
a = 100
arr
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
# Slice the 6th to 8th elements with arr[5:8] (a Python slice includes the start but excludes the stop)
arr[5:8]
array([5, 6, 7])
# Assign the slice to b and set the second element of b to 18: the seventh element of arr has now become 18
b = arr[5:8]
b[1] = 18
arr
array([ 0,  1,  2,  3,  4,  5, 18,  7,  8,  9])

This shows that slicing returns a view of the original array, so changing the sliced data changes the original array, whereas indexing a single element returns a copy of its value, so changing it leaves the original array unchanged.
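np.shares_memory makes the view/copy distinction checkable, and .copy() turns a slice into independent data. The sketch also confirms that indexing a whole row of a 2D array with an integer returns a view, not a copy:

```python
import numpy as np

arr = np.arange(10)
view = arr[5:8]                # basic slice: a view on arr's buffer
independent = arr[5:8].copy()  # explicit copy: its own buffer

independent[1] = 99            # does not touch arr
view[1] = 18                   # writes through to arr[6]

arr2d = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
row = arr2d[2]                 # integer index on axis 0 of a 2D array: still a view
```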

# 2D array index
arr2d = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
arr2d
array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])
# Index the third row with arr2d[2]. More strictly, index the third element on axis 0.
arr2d[2]
array([7, 8, 9])
# Index the first row and the third column with arr2d[0][2]
arr2d[0][2]
3

Indexing a two-dimensional array with two pairs of brackets is clumsy; would a five-dimensional array need five? A simpler way is to put all the indices in one pair of brackets: arr2d[0, 2] indexes the first row and the third column.

arr2d[0,2]
3

Slicing

# Use arr2d[:2] to slice the first two rows. More strictly, slice the first two elements along axis 0.
arr2d[:2] 
array([[1, 2, 3],
       [4, 5, 6]])
# Select the first and third columns with arr2d[:, [0,2]] (the list [0,2] makes this fancy indexing, so the result is a copy)
arr2d[:,[0,2]] 
array([[1, 3],
       [4, 6],
       [7, 9]])
# Slice the first two elements of the second row with arr2d[1, :2]
arr2d[1, :2]
array([4, 5])
# Slice the first two elements of the third column with arr2d[:2, 2]
arr2d[:2, 2]
array([3, 6])
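Mixing an integer with a slice drops that axis, while a slice of length one keeps it; and a list index such as [0, 2] is fancy indexing, which returns a copy rather than a view. A short sketch with the same arr2d:

```python
import numpy as np

arr2d = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
a = arr2d[1, :2]         # integer index drops axis 0 -> shape (2,)
b = arr2d[1:2, :2]       # length-one slice keeps axis 0 -> shape (1, 2)
cols = arr2d[:, [0, 2]]  # list index on axis 1: fancy indexing, returns a copy
```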

Boolean indexing

Boolean indexing selects elements with an array of Boolean values.

Suppose we have data for Alibaba, Facebook and JD:

  • code: an array of stock tickers
  • price: an array of stock prices, where each row records the opening, highest and closing price of one day
code = np.array(['BABA', 'FB', 'JD', 'BABA', 'JD', 'FB'])
price = np.array([[170,177,169],[150,159,153],
                  [24,27,26],[165,170,167],
                  [22,23,20],[155,116,157]])
price
array([[170, 177, 169],
       [150, 159, 153],
       [ 24,  27,  26],
       [165, 170, 167],
       [ 22,  23,  20],
       [155, 116, 157]])

Suppose we want the share prices of BABA. First build the Boolean index for "BABA" in code, that is, a Boolean array of True and False values:

code == 'BABA'
array([ True, False, False,  True, False, False])
# Using this index, we can get the share price of BABA:
price[ code == 'BABA' ]
array([[170, 177, 169],
       [165, 170, 167]])
# Try again to get the share price of JD and FB:
price[ (code == 'FB')|(code == 'JD') ]
array([[150, 159, 153],
       [ 24,  27,  26],
       [ 22,  23,  20],
       [155, 116, 157]])
# The following has no practical meaning, but it shows assignment through a Boolean index: set every price below 25 to 0.

price[ price < 25 ] = 0
price
array([[170, 177, 169],
       [150, 159, 153],
       [  0,  27,  26],
       [165, 170, 167],
       [  0,   0,   0],
       [155, 116, 157]])
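Two related details: Python's and/or cannot combine Boolean arrays (they raise a ValueError because an array's truth value is ambiguous), so use & and | with parentheses around each comparison, since & and | bind more tightly than ==. np.isin builds a membership mask in one call. In the sketch below, the threshold 168 is an arbitrary example value:

```python
import numpy as np

code = np.array(['BABA', 'FB', 'JD', 'BABA', 'JD', 'FB'])
price = np.array([[170, 177, 169], [150, 159, 153],
                  [24, 27, 26], [165, 170, 167],
                  [22, 23, 20], [155, 116, 157]])

# Same mask as (code == 'FB') | (code == 'JD')
mask = np.isin(code, ['FB', 'JD'])
fb_jd = price[mask]

# BABA rows whose opening price (column 0) exceeds 168
high_baba = price[(code == 'BABA') & (price[:, 0] > 168)]
```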

Fancy indexing

Fancy indexing selects exactly the elements you want by passing an array (or list) of integer positions. Consider the following array:

arr = np.arange(32).reshape(8,4)
arr
array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11],
       [12, 13, 14, 15],
       [16, 17, 18, 19],
       [20, 21, 22, 23],
       [24, 25, 26, 27],
       [28, 29, 30, 31]])

Suppose you want rows 5, 4 and 7, in that order; use arr[[4, 3, 6]]:

arr[ [4,3,6] ]
array([[16, 17, 18, 19],
       [12, 13, 14, 15],
       [24, 25, 26, 27]])

Suppose you want the 4th, 3rd and 6th rows counting from the end, in that order (i.e. the rows at positive indices 4, 5 and 2); use arr[[-4, -3, -6]]:

arr[ [-4,-3,-6] ]
array([[16, 17, 18, 19],
       [20, 21, 22, 23],
       [ 8,  9, 10, 11]])

You can also pass separate index lists for rows and columns; corresponding entries are paired, so each (row, column) pair selects one element:

arr[ [1,5,7,2], [0,3,1,2] ]
array([ 4, 23, 29, 10])

Equivalent code:

np.array( [ arr[1,0], arr[5,3], 
            arr[7,1], arr[2,2] ] )
array([ 4, 23, 29, 10])
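Pairwise matching selects individual elements. To select a whole sub-block (every chosen row crossed with every chosen column), np.ix_ builds index arrays that broadcast against each other. A short sketch contrasting the two:

```python
import numpy as np

arr = np.arange(32).reshape(8, 4)
pairs = arr[[1, 5], [0, 3]]          # pairwise: elements (1, 0) and (5, 3)
block = arr[np.ix_([1, 5], [0, 3])]  # cross product: rows 1, 5 x columns 0, 3
```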

Finally, combining a full slice with a fancy index on the columns reorders the columns from [0,1,2,3] to [0,3,1,2]:

arr[:,[0,3,1,2]] 
array([[ 0,  3,  1,  2],
       [ 4,  7,  5,  6],
       [ 8, 11,  9, 10],
       [12, 15, 13, 14],
       [16, 19, 17, 18],
       [20, 23, 21, 22],
       [24, 27, 25, 26],
       [28, 31, 29, 30]])

Summary

This part covered the first three topics of NumPy: creating arrays, storing arrays and getting array elements. Treating a NumPy array as an object, we need to learn how to:

  • Create it: from a list with np.array(), with a fixed step using np.arange(), with a fixed number of elements using np.linspace(), and with zeros(), ones(), random() and eye()

  • Save it: store it in .npy, .txt or .csv format, and load it again next time

  • Get it: index single elements or slice ranges; indexing can be regular, Boolean or fancy

The next two parts discuss how to:

  • Reshape it: reshape and flatten, merge and split, repeat elements and repeat arrays

  • Compute with it: element-wise computation, linear algebra, and broadcasting

A final question: transposing an array

arr = np.arange(16).reshape((2, 2, 4))
arr
array([[[ 0,  1,  2,  3],
        [ 4,  5,  6,  7]],

       [[ 8,  9, 10, 11],
        [12, 13, 14, 15]]])

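Swap the first and second dimensions while keeping the third, i.e. reorder axes (0, 1, 2) as (1, 0, 2), for example with arr.transpose((1, 0, 2)).

The essence of transposing: the shape entries and the strides of the axes are permuted together; the data in memory is not moved, so the result is a view.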
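A sketch on the array above, with dtype=np.int64 fixed so the strides do not depend on the platform's default integer type: the transposed array shares the same data buffer, its strides are the original strides reordered, and t[i, j, k] equals arr[j, i, k].

```python
import numpy as np

arr = np.arange(16, dtype=np.int64).reshape((2, 2, 4))
t = arr.transpose((1, 0, 2))  # new axis 0 = old axis 1, new axis 1 = old axis 0
```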



Added by markmuir on Wed, 10 Jun 2020 07:57:14 +0300