1. Recognize NumPy array objects
One of NumPy's most important features is its N-dimensional array object, ndarray (also known by the alias array), which stores elements of a single data type and supports fast scientific computation on them.
The ndarray object defines several important attributes, such as ndim, shape, size, dtype, and itemsize.
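As a minimal sketch of the attributes an ndarray exposes, the snippet below inspects a small example array. The array contents and the variable name data are illustrative assumptions, not taken from the original text.

```python
import numpy as np

# A 3x4 array holding the integers 0..11 (illustrative data)
data = np.arange(12).reshape(3, 4)

print(data.ndim)      # number of dimensions: 2
print(data.shape)     # length of each dimension: (3, 4)
print(data.size)      # total number of elements: 12
print(data.dtype)     # element data type (platform-dependent for integers)
print(data.itemsize)  # size of one element in bytes
```

The integer dtype and item size are left without an expected value because they depend on the platform and NumPy version.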
2. Create NumPy array
The simplest way to create an ndarray object is to call the array() function, passing in a list or tuple.
import numpy as np
# Create a one-dimensional array
data1 = np.array([1, 2, 3])
data1
array([1, 2, 3])
# Create a two-dimensional array
data2 = np.array([[1, 2, 3], [4, 5, 6]])
data2
array([[1, 2, 3],
       [4, 5, 6]])
The zeros() function creates an array whose elements are all 0, and the ones() function creates an array whose elements are all 1.
# Create an array whose element values are all 0
np.zeros((3, 4))
array([[0., 0., 0., 0.],
       [0., 0., 0., 0.],
       [0., 0., 0., 0.]])
# Create an array whose element values are all 1
np.ones((3, 4))
array([[1., 1., 1., 1.],
       [1., 1., 1., 1.],
       [1., 1., 1., 1.]])
The empty() function creates a new array by allocating memory only. The memory is not initialized, so the elements hold whatever arbitrary values happen to be there.
# Create an uninitialized array; the values shown are arbitrary and will differ between runs
np.empty((5, 2))
array([[-2.00000000e+000, -2.00390463e+000],
       [ 2.37663529e-312,  2.56761491e-312],
       [ 8.48798317e-313,  9.33678148e-313],
       [ 8.70018275e-313,  2.12199581e-314],
       [ 0.00000000e+000,  6.95335581e-309]])
You can create an array of evenly spaced values (an arithmetic sequence) with the arange() function. It works like Python's built-in range(), but arange() returns an array rather than a list.
np.arange(1, 20, 5)
array([ 1,  6, 11, 16])
You may notice that some array elements are followed by a decimal point while others are not, for example "1." versus "1". This difference comes from the data types of the elements.
3. Data type of ndarray object
3.1 viewing data types
The ndarray.dtype attribute returns an object that describes the array's data type. To get the name of the data type, access that object's name attribute.
data_one = np.array([[1, 2, 3], [4, 5, 6]])
data_one.dtype.name
'int32'
The name of a NumPy data type consists of a type name followed by a number giving the bit length of each element, as in float64.
Arrays created with the zeros(), ones(), and empty() functions have the data type float64 by default. For integer data, the default depends on the platform: 64-bit Windows outputs int32, while 64-bit Linux or macOS outputs int64 (NumPy 2.0 and later use int64 on 64-bit Windows as well). You can also pass the dtype parameter to specify the data type and its length explicitly.
The data types commonly used in NumPy are shown in the chart.
Each NumPy built-in data type also has a character code that identifies it, such as i for signed integers and f for floating-point numbers.
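The following sketch shows one way to set the data type explicitly and to refer to a type by its character code. The variable names are hypothetical, and the codes 'i4' and 'f8' (kind character plus size in bytes) are just two common examples.

```python
import numpy as np

# Pass dtype explicitly instead of relying on the platform default
zeros_int = np.zeros((2, 3), dtype=np.int32)
ones_f32 = np.ones(4, dtype="float32")

print(zeros_int.dtype.name)  # int32
print(ones_f32.dtype.name)   # float32

# Character codes: 'i4' is a 4-byte signed integer, 'f8' an 8-byte float
int_type = np.dtype("i4")
float_type = np.dtype("f8")
print(int_type.name, int_type.kind)      # int32 i
print(float_type.name, float_type.kind)  # float64 f
```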
3.2 converting data types
The data type of the ndarray object can be converted through the astype() method.
data = np.array([[1, 2, 3], [4, 5, 6]])
data.dtype
dtype('int64')
# Convert the data type to float64
float_data = data.astype(np.float64)
float_data.dtype
dtype('float64')
4. Array operation
4.1 vectorization operation
Array operations can be divided into the following three types:
Any arithmetic operation between arrays of equal shape is applied element by element: each element is combined only with the element at the same position in the other array, and the results form a new array.
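A short sketch of element-wise arithmetic between two arrays of equal shape follows. The array values and names are illustrative assumptions.

```python
import numpy as np

data1 = np.array([[1, 2, 3], [4, 5, 6]])
data2 = np.array([[10, 20, 30], [40, 50, 60]])

total = data1 + data2       # [[11, 22, 33], [44, 55, 66]]
difference = data2 - data1  # [[ 9, 18, 27], [36, 45, 54]]
product = data1 * data2     # [[ 10,  40,  90], [160, 250, 360]]

print(total)
print(difference)
print(product)
```

Each result is a new array. Neither data1 nor data2 is modified.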
4.2 array broadcast
When arrays with different shapes take part in an arithmetic operation, NumPy applies its broadcasting mechanism, which conceptually expands the arrays to a common shape so that the vectorized operation can be carried out.
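Broadcasting compares the shapes of the two arrays dimension by dimension, starting from the trailing dimension, and each pair of dimensions must meet one of the following conditions:
(1) The two dimensions are equal in length.
(2) One of the two dimensions has length 1.
An array with fewer dimensions is treated as if its shape were padded with 1s on the left, which is why a one-dimensional array can be broadcast against a two-dimensional array whose last dimension has the same length.
Broadcasting expands the smaller array so that its shape matches that of the larger one, after which element-wise functions or operators can be applied.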
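As a hedged illustration of broadcasting, the sketch below adds an array of shape (4, 1) to an array of shape (3,), then shows a pair of shapes that cannot be broadcast. The values and variable names are assumptions chosen for the example.

```python
import numpy as np

column = np.array([[0], [1], [2], [3]])  # shape (4, 1)
row = np.array([1, 2, 3])                # shape (3,)

# Both operands are stretched to the common shape (4, 3)
result = column + row
print(result.shape)  # (4, 3)
print(result)
# [[1 2 3]
#  [2 3 4]
#  [3 4 5]
#  [4 5 6]]

# Shapes (4, 3) and (4,) are incompatible: the trailing lengths 3 and 4
# differ and neither of them is 1
broadcast_failed = False
try:
    np.ones((4, 3)) + np.ones(4)
except ValueError as err:
    broadcast_failed = True
    print("cannot broadcast:", err)
```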
4.3 operation between array and scalar
An operation between an array and a scalar produces a new array with the same shape as the original, in which the addition, subtraction, multiplication, or division has been applied between every element and the scalar.
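A small sketch of array-scalar arithmetic, using an illustrative array named data:

```python
import numpy as np

data = np.array([[1, 2, 3], [4, 5, 6]])

added = data + 10   # [[11, 12, 13], [14, 15, 16]]
doubled = data * 2  # [[ 2,  4,  6], [ 8, 10, 12]]
halved = data / 2   # [[0.5, 1. , 1.5], [2. , 2.5, 3. ]]

print(added)
print(doubled)
print(halved)
```

Note that true division with / yields a floating-point array even when the input holds integers.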
5. Index and slice of ndarray
5.1 basic use of integer index and slice
For a one-dimensional array, indexing and slicing look the same as they do for a Python list.
arr = np.arange(8)
# Get the element with index 5
arr[5]
5
# Get the elements with indexes 3 to 5, excluding 5
arr[3:5]
array([3, 4])
For multi-dimensional arrays, the use of indexes and slices is very different from that of lists. For example, the index method of two-dimensional arrays is as follows:
In a two-dimensional array, the element at each index position is no longer a scalar, but a one-dimensional array.
arr2d = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
# Get the element with index 1
arr2d[1]
array([4, 5, 6])
If you want to obtain a single element of a two-dimensional array, use an index of the form "arr[x, y]", where x is the row number and y is the column number.
# Get the element at row 0, column 1
arr2d[0, 1]
2
Slicing a multidimensional array selects elements along the rows or the columns. We can pass in one slice or several slices, and we can also mix slices with integer indexes.
Example using one slice:
arr2d[:2]
array([[1, 2, 3],
       [4, 5, 6]])
Example using two slices:
arr2d[0:2, 0:2]
array([[1, 2],
       [4, 5]])
Example mixing a slice with an integer index:
arr2d[1, :2]
array([4, 5])
5.2 basic use of fancy (array) index
Fancy indexing is a NumPy term for indexing with an integer array or list, where each element of that array or list is used as a subscript.
When the array being indexed is one-dimensional, the result is the elements at the given subscripts.
When the array being indexed is two-dimensional, the result is the rows at the given subscripts.
# Create a two-dimensional array
demo_arr = np.empty((4, 4))
for i in range(4):
    demo_arr[i] = np.arange(i, i + 4)
# Get the rows with indexes 0 and 2
demo_arr[[0, 2]]
array([[0., 1., 2., 3.],
       [2., 3., 4., 5.]])
If two fancy indexes are used, the first is taken as the row indexes and the second as the column indexes, and the elements at the paired (row, column) positions are selected.
# Get the elements at positions (1, 1) and (3, 2)
demo_arr[[1, 3], [1, 2]]
array([2., 5.])
5.3 basic use of Boolean index
Boolean index refers to taking a Boolean array as an array index, and the returned data is the value of the position corresponding to True in the Boolean array.
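The idea can be sketched as follows. The names and scores are made-up sample data, and the variable names are hypothetical.

```python
import numpy as np

# Hypothetical sample data: one row of scores per name
names = np.array(["Tony", "Jack", "Robin", "Jack"])
scores = np.array([[79, 88, 80],
                   [89, 90, 92],
                   [83, 78, 85],
                   [78, 76, 80]])

# Comparing an array with a scalar yields a Boolean array
mask = names == "Jack"
print(mask)          # [False  True False  True]

# Rows whose mask value is True are selected (rows 1 and 3)
jack_rows = scores[mask]
print(jack_rows)     # [[89 90 92]
                     #  [78 76 80]]

# A Boolean array of the same shape selects individual elements
high = scores[scores > 85]
print(high)          # [88 89 90 92]
```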
6. Transpose and axis swapping of arrays
Transpose of array refers to the position transformation of each element in the array according to certain rules.
NumPy provides two implementations:
A simple transpose can use the T attribute, which is in effect an axis swap.
When using the transpose() method to rearrange the axes of an array, the new order of the axis numbers is passed in as a tuple, such as (1, 0, 2).
If we call transpose() without any arguments, it reverses the order of the axes, which for a three-dimensional array is equivalent to transpose((2, 1, 0)).
Sometimes only two axes need to be exchanged. In that case, use the swapaxes() method, which accepts a pair of axis numbers, as in swapaxes(1, 0).
When performing operations such as transpose on high-dimensional data, you need to specify axis numbers, which start from 0 and increase by 1. For a two-dimensional array, the vertical axis (y axis) is numbered 0 and the horizontal axis (x axis) is numbered 1, and higher dimensions continue in the same way.
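The following sketch compares the T attribute, transpose(), and swapaxes() by looking at the resulting shapes. The example arrays are assumptions chosen so that every axis has a different length.

```python
import numpy as np

arr2d = np.arange(12).reshape(3, 4)
t_shape = arr2d.T.shape
print(t_shape)        # (4, 3)

arr3d = np.arange(24).reshape(2, 3, 4)

# Reorder the axes explicitly: new axis 0 is old axis 1, and so on
reordered = arr3d.transpose((1, 0, 2))
print(reordered.shape)  # (3, 2, 4)

# No arguments: the axis order is reversed
reversed_axes = arr3d.transpose()
print(reversed_axes.shape)  # (4, 3, 2)

# Exchange only axes 1 and 0
swapped = arr3d.swapaxes(1, 0)
print(swapped.shape)    # (3, 2, 4)
```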
7. NumPy universal functions
A universal function (ufunc) is a function that performs element-wise operations on the data in an ndarray and returns a new array.
A ufunc that takes one array argument is called a unary universal function, and one that takes two array arguments is called a binary universal function.
Common unary universal functions are as follows:
Common binary universal functions are as follows:
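As a brief sketch, a few widely used ufuncs of each kind are shown here. The selection is illustrative and not meant to reproduce the original tables, and the array values are assumptions.

```python
import numpy as np

# Unary ufuncs take one array
arr = np.array([4, 9, 16])
roots = np.sqrt(arr)                         # [2., 3., 4.]
squares = np.square(arr)                     # [ 16,  81, 256]
magnitudes = np.abs(np.array([-1, 2, -3]))   # [1, 2, 3]

# Binary ufuncs take two arrays
x = np.array([12, 9, 13, 15])
y = np.array([11, 10, 4, 8])
sums = np.add(x, y)            # [23, 19, 17, 23]
products = np.multiply(x, y)   # [132,  90,  52, 120]
larger = np.maximum(x, y)      # [12, 10, 13, 15]
is_greater = np.greater(x, y)  # [ True, False,  True,  True]

print(roots, squares, magnitudes)
print(sums, products, larger, is_greater)
```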
8. Use NumPy array for data processing
8.1 convert conditional logic into array operation
The where() function of NumPy is a vectorized version of the ternary expression x if condition else y.
arr_x = np.array([1, 5, 7])
arr_y = np.array([2, 6, 8])
arr_con = np.array([True, False, True])
result = np.where(arr_con, arr_x, arr_y)
result
array([1, 6, 7])
8.2 array statistical operation
The statistical methods of NumPy arrays make it easy to compute summary statistics of an array in Python.
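A minimal sketch of some common statistical methods follows. The arrays are illustrative, and the methods shown are only a sample of what ndarray offers.

```python
import numpy as np

arr = np.arange(10)          # [0, 1, ..., 9]

total = arr.sum()            # 45
average = arr.mean()         # 4.5
smallest = arr.min()         # 0
largest = arr.max()          # 9
index_of_max = arr.argmax()  # 9
running = arr.cumsum()       # [ 0,  1,  3,  6, 10, 15, 21, 28, 36, 45]

# For multidimensional arrays, axis selects the direction of the summary
table = np.array([[1, 2, 3], [4, 5, 6]])
column_sums = table.sum(axis=0)  # [5, 7, 9]
row_sums = table.sum(axis=1)     # [ 6, 15]

print(total, average, smallest, largest, index_of_max)
print(running, column_sums, row_sums)
```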
8.3 array sorting
To sort the elements of a NumPy array, use the sort() method, which sorts the array in place along the last axis by default.
arr = np.array([[6, 2, 7], [3, 6, 2], [4, 3, 2]])
arr.sort()
arr
array([[2, 6, 7],
       [2, 3, 6],
       [2, 3, 4]])
To sort along a different axis, pass the axis number as an argument to the sort() method.
arr = np.array([[6, 2, 7], [3, 6, 2], [4, 3, 2]])
# Sort the elements along the axis numbered 0
arr.sort(0)
arr
array([[3, 2, 2],
       [4, 3, 2],
       [6, 6, 7]])
8.4 retrieving array elements
The all() function checks whether every element in the array meets a condition. It returns True if they all do, and False otherwise.
arr = np.array([[1, -2, -7], [-3, 6, 2], [-4, 3, 2]])
# Are all elements of arr greater than 0?
np.all(arr > 0)
False
The any() function checks whether at least one element in the array meets a condition. It returns True if so, and False otherwise.
arr = np.array([[1, -2, -7], [-3, 6, 2], [-4, 3, 2]])
# Is at least one element of arr greater than 0?
np.any(arr > 0)
True
8.5 uniqueness and other set logic
For one-dimensional arrays, NumPy provides the unique() function, which finds the unique values in the array and returns them sorted.
arr = np.array([12, 11, 34, 23, 12, 8, 11])
np.unique(arr)
array([ 8, 11, 12, 23, 34])
The in1d() function tests whether each element of one array is present in another array and returns a Boolean array. In NumPy 2.0 and later, in1d() is deprecated in favor of isin().
arr = np.array([12, 11, 34, 23, 12, 8, 11])
np.in1d(arr, [11, 12])
array([ True,  True, False, False,  True, False,  True])
NumPy provides many functions related to sets. The common functions are shown in the table below.
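As a hedged sketch, a few of NumPy's set functions are applied to two small arrays below. The selection is illustrative and may not match the original table entry for entry, and the arrays a and b are assumptions.

```python
import numpy as np

a = np.array([1, 2, 3, 4])
b = np.array([3, 4, 5])

common = np.intersect1d(a, b)  # values in both arrays: [3, 4]
combined = np.union1d(a, b)    # values in either array: [1, 2, 3, 4, 5]
only_a = np.setdiff1d(a, b)    # values in a but not in b: [1, 2]
exclusive = np.setxor1d(a, b)  # values in exactly one array: [1, 2, 5]
membership = np.isin(a, b)     # [False, False,  True,  True]

print(common, combined, only_a, exclusive, membership)
```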
9. Linear algebra module
The numpy.linalg module provides a standard set of matrix decompositions, along with operations such as the inverse and the determinant.
Take matrix multiplication as an example: if we multiply two arrays with "*", we get an element-wise product, not a matrix product.
NumPy provides the dot() method for matrix multiplication.
arr_x = np.array([[1, 2, 3], [4, 5, 6]])
arr_y = np.array([[1, 2], [3, 4], [5, 6]])
# Equivalent to np.dot(arr_x, arr_y)
arr_x.dot(arr_y)
array([[22, 28],
       [49, 64]])
Matrix multiplication requires that the number of columns of matrix A equal the number of rows of matrix B. Assuming that A is an m*p matrix and B is a p*n matrix, the product of A and B is an m*n matrix C, in which the element in row i and column j can be expressed as:
$$C_{ij} = \sum_{k=1}^{p} A_{ik} B_{kj}$$
In addition, there are many other useful functions in the linalg module.
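A short sketch of three such functions, det(), inv(), and solve(), is given below. The matrix and right-hand side are made-up examples, and floating-point results are compared with a tolerance rather than printed as exact values.

```python
import numpy as np

a = np.array([[1.0, 2.0],
              [3.0, 4.0]])

# Determinant: 1*4 - 2*3 = -2 (up to floating-point rounding)
determinant = np.linalg.det(a)

# Inverse: multiplying it by a gives the identity matrix
inverse = np.linalg.inv(a)
is_identity = np.allclose(a.dot(inverse), np.eye(2))

# Solve the linear system a.dot(x) = b for x
b = np.array([5.0, 6.0])
x = np.linalg.solve(a, b)

print(determinant)
print(inverse)
print(is_identity)  # True
print(x)
```

For solving linear systems, solve() is generally preferable to computing the inverse explicitly and multiplying.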
10. Random number module
Compared with Python's random module, NumPy's random module has more functions. It adds some functions that can efficiently generate sample values of multiple probability distributions.
# Randomly generate a two-dimensional array
np.random.rand(3, 3)
The rand() function belongs to the numpy.random module and generates an N-dimensional array of floating-point numbers drawn uniformly from the interval [0, 1).
In addition, the random module also includes other functions that can generate random numbers that obey a variety of probability distributions.
The seed() function makes the generated random numbers reproducible, that is, the same random numbers are generated every time.
The function takes a single parameter, seed, which specifies the integer value used to initialize the random number generation algorithm.
Note: if seed() is called with the same value, the random numbers generated afterward are the same each time.
If seed() is called with a different value or with no argument, the random numbers generated afterward differ each time, just as when rand() is used without seeding.
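The effect of seeding can be sketched as follows. The seed value 0 and the variable names are arbitrary choices for illustration, and no concrete random values are shown because they depend on the generator.

```python
import numpy as np

# Same seed, same sequence
np.random.seed(0)
first = np.random.rand(3)

np.random.seed(0)
second = np.random.rand(3)

print(np.array_equal(first, second))  # True

# Without reseeding, the generator simply continues its sequence,
# so the next draw differs from the previous one
third = np.random.rand(3)
print(np.array_equal(first, third))   # False
```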
This chapter introduced the scientific computing library NumPy, including the attributes and data types of array objects, array operations, indexing and slicing, transposing and swapping the axes of arrays, NumPy universal functions, the linear algebra module, the random number module, and data processing with arrays.
I hope that, after studying this chapter, you can use the NumPy package proficiently and that it lays a foundation for the chapters that follow.