Introduction to NumPy
NumPy is the fundamental package for scientific computing in Python. It is a Python library that provides a multidimensional array object, various derived objects (such as masked arrays and matrices), and an assortment of routines for fast operations on arrays, including mathematical, logical, shape manipulation, sorting, selecting, I/O, discrete Fourier transforms, basic linear algebra, basic statistical operations, random simulation and much more.
At the core of the NumPy package is the ndarray object. It encapsulates n-dimensional arrays of homogeneous data types, with many operations performed in compiled code for performance. There are several important differences between NumPy arrays and the standard Python sequences:
- NumPy arrays have a fixed size at creation, unlike Python lists (which can grow dynamically). Changing the size of an ndarray creates a new array and deletes the original.
- The elements in a NumPy array are all required to be of the same data type, and thus will be the same size in memory. The exception: one can have arrays of (Python, including NumPy) objects, thereby allowing for arrays of differently sized elements.
- NumPy arrays facilitate advanced mathematical and other types of operations on large amounts of data. Typically, such operations are executed more efficiently and with less code than is possible using Python's built-in sequences.
- A growing number of scientific and mathematical Python-based packages use NumPy arrays; although these typically support Python-sequence input, they convert such input to NumPy arrays prior to processing, and they often output NumPy arrays. In other words, in order to efficiently use much (perhaps even most) of today's scientific and mathematical Python-based software, just knowing how to use Python's built-in sequence types is insufficient; one also needs to know how to use NumPy arrays.
Sequence size and speed are particularly important in scientific computing. As a simple example, consider the case of multiplying each element in a 1-D sequence with the corresponding element in another sequence of the same length. If the data are stored in two Python lists, a and b, we could iterate over each element:
c = []
for i in range(len(a)):
    c.append(a[i]*b[i])
This produces the correct answer, but if a and b each contain millions of numbers, we will pay the price for the inefficiencies of looping in Python. We could accomplish the same task much more quickly in C by writing (for clarity we neglect variable declarations and initializations, memory allocation, etc.):
for (i = 0; i < rows; i++) {
    c[i] = a[i] * b[i];
}
This saves all the overhead involved in interpreting the Python code and manipulating Python objects, but at the expense of the benefits gained from coding in Python. Furthermore, the coding work required increases with the dimensionality of our data. In the case of a 2-D array, for example, the C code (abridged as before) expands to:
for (i = 0; i < rows; i++) {
    for (j = 0; j < columns; j++) {
        c[i][j] = a[i][j] * b[i][j];
    }
}
NumPy gives us the best of both worlds: element-by-element operations are the "default mode" when an ndarray is involved, but they are executed quickly by pre-compiled C code. In NumPy,
c = a * b
does what the earlier examples do, at near-C speeds, but with the code simplicity we expect from something based on Python. Indeed, the NumPy idiom is even simpler! This last example illustrates two of NumPy's features that are the basis of much of its power: vectorization and broadcasting.
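As a quick sanity check, the sketch below runs the Python loop from above and the vectorized expression side by side on a small, made-up pair of lists (the names a_list, b_list, a2 and b2 are illustrative only), and then shows that the same `*` expression handles the 2-D case without any nested loops:

```python
import numpy as np

# Hypothetical sample data: two equal-length lists of floats.
a_list = [1.0, 2.0, 3.0, 4.0]
b_list = [10.0, 20.0, 30.0, 40.0]

# Pure-Python version: explicit loop over indices.
c_loop = []
for i in range(len(a_list)):
    c_loop.append(a_list[i] * b_list[i])

# NumPy version: element-by-element multiplication with no visible loop.
a = np.array(a_list)
b = np.array(b_list)
c_vec = a * b

print(c_loop)          # [10.0, 40.0, 90.0, 160.0]
print(c_vec.tolist())  # [10.0, 40.0, 90.0, 160.0]

# The same expression works unchanged for 2-D arrays (no nested loops).
a2 = np.arange(6).reshape(2, 3)
b2 = np.full((2, 3), 2)
print(a2 * b2)
# [[ 0  2  4]
#  [ 6  8 10]]
```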
Why is NumPy fast?
Vectorization describes the absence of any explicit looping, indexing, etc., in the code. These things still take place, of course, but "behind the scenes" in optimized, pre-compiled C code. Vectorized code has many advantages, among which are:
- Vectorized code is simpler and easier to read
- Fewer lines of code usually mean fewer errors
- Vectorized code more closely resembles standard mathematical notation (making it easier, usually, to code mathematical constructs correctly)
- Vectorization results in more "Pythonic" code. Without vectorization, our code would be littered with inefficient and difficult-to-read for loops.
Broadcasting is the term used to describe the implicit element-by-element behavior of operations; generally speaking, in NumPy all operations, not just arithmetic operations but also logical, bit-wise, functional, etc., behave in this implicit element-by-element fashion, i.e., they broadcast. Moreover, in the example above, a and b could be multidimensional arrays of the same shape, or a scalar and an array, or even two arrays with different shapes, provided that the smaller array is "expandable" to the shape of the larger one in such a way that the resulting broadcast is unambiguous.
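The broadcasting cases described above can be tried directly. This is a small illustrative sketch (the array names are arbitrary); the last case shows that NumPy raises a ValueError when two shapes cannot be reconciled:

```python
import numpy as np

a = np.arange(4)                       # shape (4,): [0 1 2 3]

# Scalar and array: the scalar is applied to every element.
print(a * 10)                          # [ 0 10 20 30]

# Shapes (3, 1) and (4,) are compatible and broadcast to (3, 4).
col = np.array([[0], [100], [200]])    # shape (3, 1)
grid = col + a
print(grid.shape)                      # (3, 4)
print(grid)
# [[  0   1   2   3]
#  [100 101 102 103]
#  [200 201 202 203]]

# Comparisons broadcast as well and return a boolean array.
print(a > 1)                           # [False False  True  True]

# Shapes (4,) and (3,) cannot be reconciled, so NumPy raises ValueError.
mismatch_error = None
try:
    a + np.arange(3)
except ValueError as exc:
    mismatch_error = exc
print(type(mismatch_error).__name__)   # ValueError
```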
NumPy quick start
Prerequisites
You'll need to know a bit of Python. For a refresher, see the Python 3.10 tutorial.
To work through the examples, you'll need matplotlib installed in addition to NumPy.
For learners
This is a quick overview of arrays in NumPy. It demonstrates how n-dimensional (n >= 2) arrays are represented and can be manipulated. In particular, if you don't know how to apply common functions to n-dimensional arrays (without using for loops), or if you want to understand axis and shape properties for n-dimensional arrays, this article might be of help.
Learning objectives
After reading this article, you should be able to:
- Understand the difference between one-dimensional, two-dimensional and n-dimensional arrays in NumPy;
- Learn how to apply some linear algebra operations to n-dimensional arrays without using for loops;
- Understand the axis and shape properties of n-dimensional arrays.
Basic knowledge
NumPy's main object is the homogeneous multidimensional array. It is a table of elements (usually numbers), all of the same type, indexed by a tuple of non-negative integers. In NumPy, dimensions are called axes.
For example, the array for the coordinates of a point in 3D space, [1, 2, 1], has one axis. That axis has 3 elements in it, so we say it has a length of 3. In the example shown below, the array has 2 axes. The first axis has a length of 2, the second axis has a length of 3:
[[1. , 0. , 0.], [0. , 1. , 2.]]
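The two arrays from this paragraph can be inspected directly; the sketch below (variable names are illustrative) confirms the number of axes and their lengths, and shows that indexing takes one non-negative integer per axis:

```python
import numpy as np

point = np.array([1, 2, 1])        # one axis of length 3
print(point.ndim, point.shape)     # 1 (3,)

m = np.array([[1., 0., 0.],
              [0., 1., 2.]])       # two axes: lengths 2 and 3
print(m.ndim, m.shape)             # 2 (2, 3)

# One index per axis: row 1, column 2.
print(m[1, 2])                     # 2.0
```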
NumPy's array class is called ndarray. It is also known by the alias array. Note that numpy.array is not the same as the Standard Python Library class array.array, which only handles one-dimensional arrays and offers less functionality. The more important attributes of an ndarray object are:
ndarray.ndim
The number of axes (dimensions) of the array.
ndarray.shape
The dimensions of the array. This is a tuple of integers indicating the size of the array in each dimension. For a matrix with n rows and m columns, shape will be (n,m). The length of the shape tuple is therefore the number of axes, ndim.
ndarray.size
The total number of elements of the array. This is equal to the product of the elements of shape.
ndarray.dtype
An object describing the type of the elements in the array. One can create or specify dtypes using standard Python types. Additionally, NumPy provides types of its own; numpy.int32, numpy.int16, and numpy.float64 are some examples.
ndarray.itemsize
The size in bytes of each element of the array. For example, an array of elements of type float64 has itemsize 8 (= 64/8), while one of type int32 has itemsize 4 (= 32/8). It is equivalent to ndarray.dtype.itemsize.
ndarray.data
The buffer containing the actual elements of the array. Normally, we won't need to use this attribute because we will access the elements in an array using indexing facilities.
Examples
>>> import numpy as np
>>> a = np.arange(15).reshape(3, 5)
>>> a
array([[ 0,  1,  2,  3,  4],
       [ 5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14]])
>>> a.shape
(3, 5)
>>> a.ndim
2
>>> a.dtype.name
'int64'
>>> a.itemsize
8
>>> a.size
15
>>> type(a)
<class 'numpy.ndarray'>
>>> b = np.array([6, 7, 8])
>>> b
array([6, 7, 8])
>>> type(b)
<class 'numpy.ndarray'>
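The relationships among size, shape, itemsize and dtype.itemsize can be checked on a few dtypes. This sketch also uses ndarray.nbytes, an attribute not listed above, which reports the total buffer size, i.e. size * itemsize:

```python
import numpy as np

for dt in (np.int16, np.float32, np.float64, np.complex128):
    arr = np.zeros(3, dtype=dt)
    # itemsize is the number of bytes per element; it equals dtype.itemsize.
    print(arr.dtype.name, arr.itemsize, arr.dtype.itemsize)
# int16 2 2
# float32 4 4
# float64 8 8
# complex128 16 16

x = np.ones((2, 3))
# size is the product of the shape; nbytes is size * itemsize.
print(x.size, x.nbytes)   # 6 48
```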
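Array creation
There are several ways to create arrays.
For example, you can create an array from a regular Python list or tuple using the array function. The type of the resulting array is deduced from the type of the elements in the sequences. (The default integer type is platform dependent: you may see int32 instead of int64 in some outputs, for example on Windows with NumPy versions before 2.0.)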
>>> import numpy as np
>>> a = np.array([2, 3, 4])
>>> a
array([2, 3, 4])
>>> a.dtype
dtype('int64')
>>> b = np.array([1.2, 3.5, 5.1])
>>> b.dtype
dtype('float64')
A frequent error consists in calling array with multiple arguments, rather than providing a single sequence as an argument.
>>> b = np.array(1, 2, 3, 4)  # WRONG
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: array() takes from 1 to 2 positional arguments but 4 were given
>>> a = np.array([1, 2, 3, 4])  # RIGHT
array transforms sequences of sequences into two-dimensional arrays, sequences of sequences of sequences into three-dimensional arrays, and so on.
>>> b = np.array([(1.5, 2, 3), (4, 5, 6)])
>>> b
array([[1.5, 2. , 3. ],
       [4. , 5. , 6. ]])
The type of the array can also be explicitly specified at creation time:
>>> c = np.array([[1, 2], [3, 4]], dtype=complex)
>>> c
array([[1.+0.j, 2.+0.j],
       [3.+0.j, 4.+0.j]])
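A few more creation cases, as a sketch: dtype inference with mixed element types, one axis per level of nesting, and an explicit dtype overriding inference. Note that casting floats to an integer dtype truncates toward zero rather than rounding:

```python
import numpy as np

# Mixed ints and floats: the result is inferred as a floating-point array.
print(np.array([1, 2.5]).dtype)                  # float64

# Nested sequences of equal length give one axis per nesting level.
t = np.array([[[1, 2], [3, 4]], [[5, 6], [7, 8]]])
print(t.ndim, t.shape)                           # 3 (2, 2, 2)

# An explicit dtype overrides inference; float -> int truncates toward zero.
print(np.array([1.7, -2.2], dtype=np.int64))     # [ 1 -2]
```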
Often, the elements of an array are originally unknown, but its size is known. Hence, NumPy offers several functions to create arrays with initial placeholder content. These minimize the need to grow arrays, an expensive operation.
The function zeros creates an array full of zeros, the function ones creates an array full of ones, and the function empty creates an array whose initial content is arbitrary and depends on the state of the memory. By default, the dtype of the created array is float64, but it can be specified via the keyword argument dtype.
>>> np.zeros((3, 4))
array([[0., 0., 0., 0.],
       [0., 0., 0., 0.],
       [0., 0., 0., 0.]])
>>> np.ones((2, 3, 4), dtype=np.int16)
array([[[1, 1, 1, 1],
        [1, 1, 1, 1],
        [1, 1, 1, 1]],

       [[1, 1, 1, 1],
        [1, 1, 1, 1],
        [1, 1, 1, 1]]], dtype=int16)
>>> np.empty((2, 3))  # contents may vary
array([[1.5, 2. , 3. ],
       [4. , 5. , 6. ]])
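The sketch below checks the default dtype and the dtype keyword, and shows one safe way to use np.empty: since its contents are whatever happened to be in memory, it should be filled (here with ndarray.fill) or fully assigned before it is read. The value 7.0 is arbitrary:

```python
import numpy as np

z = np.zeros((2, 3))
print(z.dtype)                 # float64 (the default)

o = np.ones(4, dtype=np.int16)
print(o, o.dtype)              # [1 1 1 1] int16

# np.empty only allocates; fill every element before reading from it.
e = np.empty((2, 2))
e.fill(7.0)
print(e)
# [[7. 7.]
#  [7. 7.]]
```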
To create sequences of numbers, NumPy provides the arange function, which is analogous to the Python built-in range but returns an array.
>>> np.arange(10, 30, 5)
array([10, 15, 20, 25])
>>> np.arange(0, 2, 0.3)  # it accepts float arguments
array([0. , 0.3, 0.6, 0.9, 1.2, 1.5, 1.8])
When arange is used with floating point arguments, it is generally not possible to predict the number of elements obtained, due to finite floating point precision. For this reason, it is usually better to use the function linspace, which receives as an argument the number of elements that we want, instead of the step:
>>> from numpy import pi
>>> np.linspace(0, 3, 9)  # 9 numbers from 0 to 3
array([0.   , 0.375, 0.75 , 1.125, 1.5  , 1.875, 2.25 , 2.625, 3.   ])
>>> x = np.linspace(0, 2*pi, 100)  # useful to evaluate a function at lots of points
>>> f = np.sin(x)
>>> f
array([ 0.00000000e+00,  6.34239197e-02,  1.26592454e-01,  1.89251244e-01,
        2.51147987e-01,  3.12033446e-01,  3.71662456e-01,  4.29794912e-01,
        4.86196736e-01,  5.40640817e-01,  5.92907929e-01,  6.42787610e-01,
        6.90079011e-01,  7.34591709e-01,  7.76146464e-01,  8.14575952e-01,
        8.49725430e-01,  8.81453363e-01,  9.09631995e-01,  9.34147860e-01,
        9.54902241e-01,  9.71811568e-01,  9.84807753e-01,  9.93838464e-01,
        9.98867339e-01,  9.99874128e-01,  9.96854776e-01,  9.89821442e-01,
        9.78802446e-01,  9.63842159e-01,  9.45000819e-01,  9.22354294e-01,
        8.95993774e-01,  8.66025404e-01,  8.32569855e-01,  7.95761841e-01,
        7.55749574e-01,  7.12694171e-01,  6.66769001e-01,  6.18158986e-01,
        5.67059864e-01,  5.13677392e-01,  4.58226522e-01,  4.00930535e-01,
        3.42020143e-01,  2.81732557e-01,  2.20310533e-01,  1.58001396e-01,
        9.50560433e-02,  3.17279335e-02, -3.17279335e-02, -9.50560433e-02,
       -1.58001396e-01, -2.20310533e-01, -2.81732557e-01, -3.42020143e-01,
       -4.00930535e-01, -4.58226522e-01, -5.13677392e-01, -5.67059864e-01,
       -6.18158986e-01, -6.66769001e-01, -7.12694171e-01, -7.55749574e-01,
       -7.95761841e-01, -8.32569855e-01, -8.66025404e-01, -8.95993774e-01,
       -9.22354294e-01, -9.45000819e-01, -9.63842159e-01, -9.78802446e-01,
       -9.89821442e-01, -9.96854776e-01, -9.99874128e-01, -9.98867339e-01,
       -9.93838464e-01, -9.84807753e-01, -9.71811568e-01, -9.54902241e-01,
       -9.34147860e-01, -9.09631995e-01, -8.81453363e-01, -8.49725430e-01,
       -8.14575952e-01, -7.76146464e-01, -7.34591709e-01, -6.90079011e-01,
       -6.42787610e-01, -5.92907929e-01, -5.40640817e-01, -4.86196736e-01,
       -4.29794912e-01, -3.71662456e-01, -3.12033446e-01, -2.51147987e-01,
       -1.89251244e-01, -1.26592454e-01, -6.34239197e-02, -2.44929360e-16])
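A concrete instance of the floating-point pitfall, as a sketch assuming IEEE-754 double precision (what NumPy uses for Python floats): one might expect three values 1.0, 1.1, 1.2 from the arange call below, but the computed length comes out as 4, so a value of about 1.3 is included. linspace takes the count directly; endpoint=False excludes the stop value:

```python
import numpy as np

# (1.3 - 1) / 0.1 evaluates to slightly more than 3, so arange's
# length (the ceiling of that quotient) is 4 rather than 3.
r = np.arange(1, 1.3, 0.1)
print(len(r))        # 4

# linspace fixes the number of points explicitly.
s = np.linspace(1, 1.3, 3, endpoint=False)
print(len(s))        # 3
```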
Printing arrays
When you print an array, NumPy displays it in a similar way to nested lists, but with the following layout:
- the last axis is printed from left to right,
- the second-to-last is printed from top to bottom,
- the rest are also printed from top to bottom, with each slice separated from the next by an empty line.
One-dimensional arrays are then printed as rows, two-dimensional arrays as matrices, and three-dimensional arrays as lists of matrices.
>>> a = np.arange(6)                    # one-dimensional
>>> print(a)
[0 1 2 3 4 5]
>>>
>>> b = np.arange(12).reshape(4, 3)     # two-dimensional
>>> print(b)
[[ 0  1  2]
 [ 3  4  5]
 [ 6  7  8]
 [ 9 10 11]]
>>>
>>> c = np.arange(24).reshape(2, 3, 4)  # three-dimensional
>>> print(c)
[[[ 0  1  2  3]
  [ 4  5  6  7]
  [ 8  9 10 11]]

 [[12 13 14 15]
  [16 17 18 19]
  [20 21 22 23]]]
For more details on reshape, see below.
If an array is too large to be printed, NumPy automatically skips the central part of the array and only prints the corners:
>>> print(np.arange(10000))
[   0    1    2 ... 9997 9998 9999]
>>>
>>> print(np.arange(10000).reshape(100, 100))
[[   0    1    2 ...   97   98   99]
 [ 100  101  102 ...  197  198  199]
 [ 200  201  202 ...  297  298  299]
 ...
 [9700 9701 9702 ... 9797 9798 9799]
 [9800 9801 9802 ... 9897 9898 9899]
 [9900 9901 9902 ... 9997 9998 9999]]
To disable this behavior and force NumPy to print the entire array, you can change the printing options using set_printoptions.
>>> import sys
>>> import numpy as np
>>> print(sys.maxsize)
9223372036854775807
>>> np.set_printoptions(threshold=sys.maxsize)
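set_printoptions changes the settings globally for the rest of the session. If you only need the full output temporarily, NumPy also provides np.printoptions, a context manager that restores the previous settings on exit. A minimal sketch (the 2000-element array is arbitrary; 1000 is NumPy's default threshold):

```python
import numpy as np

big = np.arange(2000)

# With the default threshold (1000), a 2000-element array is summarized.
print("..." in str(big))          # True

# Inside the context, threshold=big.size disables summarization.
with np.printoptions(threshold=big.size):
    full_text = str(big)
print("..." in full_text)         # False

# On exit, the previous options are restored.
print("..." in str(big))          # True
```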
Basic operations
Arithmetic operators on arrays apply elementwise. A new array is created and filled with the result.
>>> a = np.array([20, 30, 40, 50])
>>> b = np.arange(4)
>>> b
array([0, 1, 2, 3])
>>> c = a - b
>>> c
array([20, 29, 38, 47])
>>> b**2
array([0, 1, 4, 9])
>>> 10 * np.sin(a)
array([ 9.12945251, -9.88031624,  7.4511316 , -2.62374854])
>>> a < 35
array([ True,  True, False, False])
Unlike in many matrix languages, the product operator * operates elementwise in NumPy arrays. The matrix product can be performed using the @ operator (in Python >= 3.5) or the dot function or method:
>>> A = np.array([[1, 1],
...               [0, 1]])
>>> B = np.array([[2, 0],
...               [3, 4]])
>>> A * B     # elementwise product
array([[2, 0],
       [0, 4]])
>>> A @ B     # matrix product
array([[5, 4],
       [3, 4]])
>>> A.dot(B)  # another matrix product
array([[5, 4],
       [3, 4]])
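To make the difference concrete, the sketch below rebuilds A @ B from its definition (entry (i, j) is the dot product of row i of A with column j of B), checks it against np.matmul, and shows that for two 1-D arrays @ returns the scalar inner product. The vectors v and w are arbitrary examples:

```python
import numpy as np

A = np.array([[1, 1], [0, 1]])
B = np.array([[2, 0], [3, 4]])

# Definition of the matrix product, written out with plain Python loops.
manual = np.array([[sum(A[i, k] * B[k, j] for k in range(2)) for j in range(2)]
                   for i in range(2)])
print(np.array_equal(A @ B, manual))           # True
print(np.array_equal(A @ B, np.matmul(A, B)))  # True

# For two 1-D arrays, @ gives the inner product: 1*4 + 2*5 + 3*6.
v = np.array([1, 2, 3])
w = np.array([4, 5, 6])
print(v @ w)                                   # 32
```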
Some operations, such as += and *=, act in place to modify an existing array rather than create a new one.
>>> rg = np.random.default_rng(1)  # create an instance of the default random number generator
>>> a = np.ones((2, 3), dtype=int)
>>> b = rg.random((2, 3))
>>> a *= 3
>>> a
array([[3, 3, 3],
       [3, 3, 3]])
>>> b += a
>>> b
array([[3.51182162, 3.9504637 , 3.14415961],
       [3.94864945, 3.31183145, 3.42332645]])
>>> a += b  # b is not automatically converted to integer type
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
numpy.core._exceptions.UFuncTypeError: Cannot cast ufunc 'add' output from dtype('float64') to dtype('int32') with casting rule 'same_kind'
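A sketch of the difference between in-place and ordinary operators, plus one way around the casting error above: cast explicitly with astype before the in-place add. This assumes truncating the floats to integers is acceptable for your data; if it is not, a = a + b instead binds a to a new float64 array:

```python
import numpy as np

a = np.ones((2, 3), dtype=np.int64)
a_id = id(a)

a *= 3                          # in place: same object, dtype stays int64
print(id(a) == a_id, a.dtype)   # True int64

b = a + 0.5                     # ordinary operator: a new float64 array
print(b.dtype)                  # float64

# a += b would fail (float64 -> int64 is not a 'same_kind' cast);
# casting b explicitly truncates 3.5 to 3, so the in-place add succeeds.
a += b.astype(np.int64)
print(a)
# [[6 6 6]
#  [6 6 6]]
print(id(a) == a_id)            # True: still the original array
```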
When operating with arrays of different types, the type of the resulting array corresponds to the more general or precise one (a behavior known as upcasting).
>>> import math
>>> a = np.ones(3, dtype=np.int32)
>>> b = np.linspace(0, math.pi, 3)
>>> b.dtype.name
'float64'
>>> c = a + b
>>> c
array([1.        , 2.57079633, 4.14159265])
>>> c.dtype.name
'float64'
>>> d = np.exp(c * 1j)
>>> d
array([ 0.54030231+0.84147098j, -0.84147098+0.54030231j,
       -0.54030231-0.84147098j])
>>> d.dtype.name
'complex128'
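If you want to know the upcast type without building arrays, np.result_type reports the dtype NumPy would choose when combining the given dtypes. The pairs below are arbitrary examples, and the last lines confirm that arithmetic follows the same rule:

```python
import numpy as np

print(np.result_type(np.int8, np.int16))          # int16
print(np.result_type(np.int32, np.float64))       # float64
print(np.result_type(np.float64, np.complex128))  # complex128

# Actual arithmetic on arrays of these dtypes gives the same result type.
x = np.ones(2, dtype=np.int8) + np.ones(2, dtype=np.int16)
print(x.dtype)                                    # int16
```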
Many unary operations, such as computing the sum of all the elements in the array, are implemented as methods of the ndarray class.
>>> a = rg.random((2, 3))
>>> a
array([[0.82770259, 0.40919914, 0.54959369],
       [0.02755911, 0.75351311, 0.53814331]])
>>> a.sum()
3.1057109529998157
>>> a.min()
0.027559113243068367
>>> a.max()
0.8277025938204418
By default, these operations apply to the array as though it were a list of numbers, regardless of its shape. However, by specifying the axis parameter you can apply an operation along the specified axis of an array:
>>> b = np.arange(12).reshape(3, 4)
>>> b
array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11]])
>>>
>>> b.sum(axis=0)     # sum of each column
array([12, 15, 18, 21])
>>>
>>> b.min(axis=1)     # min of each row
array([0, 4, 8])
>>>
>>> b.cumsum(axis=1)  # cumulative sum along each row
array([[ 0,  1,  3,  6],
       [ 4,  9, 15, 22],
       [ 8, 17, 27, 38]])
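A useful way to read axis=k is "the axis that disappears": reducing along it removes that axis from the shape. The sketch below checks this and uses the keepdims argument, which keeps the reduced axis with length 1 so the result broadcasts back against the original, here to center each row on its mean:

```python
import numpy as np

b = np.arange(12).reshape(3, 4)

print(b.sum(axis=0).shape)   # (4,): one result per column
print(b.sum(axis=1).shape)   # (3,): one result per row

# keepdims=True leaves shape (3, 1), which broadcasts against (3, 4).
row_means = b.mean(axis=1, keepdims=True)
centered = b - row_means
print(centered.sum(axis=1))  # [0. 0. 0.]
```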
Universal functions
NumPy provides familiar mathematical functions such as sin, cos, and exp. In NumPy, these are called "universal functions" (ufunc). Within NumPy, these functions operate elementwise on an array, producing an array as output.
>>> B = np.arange(3)
>>> B
array([0, 1, 2])
>>> np.exp(B)     # e raised to each element
array([1.        , 2.71828183, 7.3890561 ])
>>> np.sqrt(B)    # square root of each element
array([0.        , 1.        , 1.41421356])
>>> C = np.array([2., -1., 4.])
>>> np.add(B, C)  # elementwise sum
array([2., 0., 6.])
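ufuncs are objects of type np.ufunc and carry extra methods beyond plain elementwise calls. As a sketch: reduce applies the operation cumulatively to produce a single result, accumulate keeps the intermediate results, and most ufuncs accept an out argument to write into an existing array. The input values are arbitrary:

```python
import numpy as np

B = np.arange(1, 5)                    # [1 2 3 4]

print(isinstance(np.add, np.ufunc))    # True
print(np.add.reduce(B))                # 10, same as B.sum()
print(np.multiply.accumulate(B))       # [ 1  2  6 24], same as B.cumprod()

# Write the result into a preallocated float64 array instead of a new one.
out = np.empty(4)
np.sqrt(B, out=out)
print(out)
```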
Related functions (not all of which are ufuncs) include: exp, sqrt, add, all, any, apply_along_axis, argmax, argmin, argsort, average, bincount, ceil, clip, conj, corrcoef, cov, cross, cumprod, cumsum, diff, dot, floor, inner, invert, lexsort, max, maximum, mean, median, min, minimum, nonzero, outer, prod, re, round, sort, std, sum, trace, transpose, var, vdot, vectorize, where.
Indexing, slicing, and iteration
One-dimensional arrays can be indexed, sliced and iterated over, much like lists and other Python sequences.
>>> a = np.arange(10)**3
>>> a
array([  0,   1,   8,  27,  64, 125, 216, 343, 512, 729], dtype=int32)
>>> a[2]
8
>>> a[2:5]
array([ 8, 27, 64], dtype=int32)
>>> # equivalent to a[0:6:2] = 1000;
>>> # from start to position 6, exclusive, set every 2nd element to 1000
>>> a[:6:2] = 1000
>>> a
array([1000,    1, 1000,   27, 1000,  125,  216,  343,  512,  729], dtype=int32)
>>> a[::-1]  # reversed a
array([ 729,  512,  343,  216,  125, 1000,   27, 1000,    1, 1000], dtype=int32)
>>> for i in a:
...     print(i**(1 / 3.))
...
9.999999999999998
1.0
9.999999999999998
3.0
9.999999999999998
4.999999999999999
5.999999999999999
6.999999999999999
7.999999999999999
8.999999999999998
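One difference from Python lists is worth checking early: basic slicing of a NumPy array returns a view that shares memory with the original, so writing through the slice changes the original array. The sketch below shows this, how .copy() gives an independent array, and that negative indices count from the end as with lists:

```python
import numpy as np

a = np.arange(10)**3
s = a[2:5]              # basic slicing returns a view, not a copy
s[0] = -1
print(a[2])             # -1: the original array changed too

c = a[2:5].copy()       # an explicit copy is independent
c[0] = 99
print(a[2])             # still -1

print(a[-1])            # 729: negative indices count from the end
```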