NumPy: introduction to NumPy

brief introduction

NumPy is an open-source Python library used mainly for data analysis and scientific computing. It can be regarded as the foundation of numerical computing in Python, because many widely used data analysis and machine learning libraries are built on top of it, for example pandas, SciPy, Matplotlib, scikit-learn, and scikit-image.

The NumPy library centers on multi-dimensional array and matrix data structures. Its core object is ndarray, an n-dimensional array, together with methods that operate on it efficiently. NumPy can perform a wide range of mathematical operations on arrays, and it provides a large library of high-level mathematical functions that work on these arrays and matrices.

Installing NumPy

There are several ways to install NumPy. With pip:

pip install numpy

If you use conda, you can:

conda install numpy

Alternatively, install Anaconda, a distribution that bundles a collection of data analysis packages, NumPy included.

Array and List

Python has a built-in data type called list, which can hold objects of different types. That flexibility is fine in general application code, but in scientific computing we want every element of an array to have the same type, which is what the Array in NumPy provides.

NumPy can quickly create an Array and operate on the data in it.

An Array in NumPy is much faster than a List in Python and takes up less memory.
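As a rough sketch of both points, the snippet below shows NumPy converting mixed inputs to a single element type, and compares the memory used by an Array with that of a List holding the same values. The variable names are illustrative, and the List estimate assumes CPython, where every int is a separate object.

```python
import sys
import numpy as np

# A List keeps each element as its own Python object, so types can be mixed.
mixed_list = [1, 2.5, 3]

# An Array has one dtype for all elements; the ints are upcast to float64 here.
mixed_arr = np.array(mixed_list)
print(mixed_arr.dtype)  # float64

# Memory: the Array stores raw 8-byte integers in one contiguous buffer.
n = 1000
arr = np.arange(n, dtype=np.int64)
lst = list(range(n))

arr_bytes = arr.nbytes  # 8 bytes per element
# Rough List cost: the list's pointer table plus each int object (CPython assumption).
list_bytes = sys.getsizeof(lst) + sum(sys.getsizeof(x) for x in lst)
print(arr_bytes, list_bytes)
```

The exact List figure varies between Python versions, so only the comparison matters here.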

Look at the performance differences between the two:

In [1]: import numpy as np
   ...: my_arr = np.arange(1000000)
   ...: my_list = list(range(1000000))
   ...: %time for _ in range(10): my_arr2 = my_arr * 2
   ...: %time for _ in range(10): my_list2 = [x * 2 for x in my_list]
   ...:
CPU times: user 12.3 ms, sys: 7.88 ms, total: 20.2 ms
Wall time: 21.4 ms
CPU times: user 580 ms, sys: 172 ms, total: 752 ms
Wall time: 780 ms

The example above multiplies each of one million elements by two. The NumPy version is dozens of times faster than the pure Python one (about 21 ms versus 780 ms of wall time here), a difference that matters a great deal in large data projects.
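The %time magic only works inside IPython. A minimal sketch of the same comparison as a plain script, assuming the standard timeit module and a smaller array so that it runs quickly, might look like this (the absolute timings depend on the machine):

```python
import timeit
import numpy as np

n = 100_000  # smaller than the example above so the sketch runs quickly
my_arr = np.arange(n)
my_list = list(range(n))

# Time 10 repetitions of each approach.
t_numpy = timeit.timeit(lambda: my_arr * 2, number=10)
t_list = timeit.timeit(lambda: [x * 2 for x in my_list], number=10)
print(f"NumPy: {t_numpy:.4f} s, list comprehension: {t_list:.4f} s")

# Both approaches produce the same values.
same = np.array_equal(my_arr * 2, np.array([x * 2 for x in my_list]))
print(same)  # True
```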

Create Array

In the example above, we created an array with the np.arange function.

We can also create an Array through a List, which can be either a one-dimensional List or a multi-dimensional List:

>>> a = np.array([1, 2, 3, 4, 5, 6])

>>> a = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])

Like a List, an Array can be accessed by index. For the two-dimensional array a above, a[0] is the first row:

>>> print(a[0])
[1 2 3 4]

Next, let's introduce some common terms:

  • vector - represents a one-dimensional array
  • matrix - represents a two-dimensional array
  • tensor - represents an array of three dimensions or higher

In NumPy, dimensions are also called axes.
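To make the terms concrete, here is a small sketch that builds one array of each kind and prints its number of axes and its shape. The variable names are only for illustration.

```python
import numpy as np

vector = np.array([1, 2, 3])               # 1 axis
matrix = np.array([[1, 2, 3], [4, 5, 6]])  # 2 axes
tensor = np.zeros((2, 3, 4))               # 3 axes

for name, arr in [("vector", vector), ("matrix", matrix), ("tensor", tensor)]:
    print(name, arr.ndim, arr.shape)

# Many functions take an axis argument: for a matrix, axis=0 runs down the
# rows and axis=1 runs across the columns.
col_sums = matrix.sum(axis=0)  # array([5, 7, 9])
row_sums = matrix.sum(axis=1)  # array([ 6, 15])
```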
Let's look at several other ways to create an Array.

The simplest is np.array, as in the previous example.

If you want to quickly create an array filled with 0, you can use zeros:

>>> np.zeros(2)
array([0., 0.])

Or an array filled with 1:

>>> np.ones(2)
array([1., 1.])

You can also create an empty array:

In [2]: np.empty(2)
Out[2]: array([0.        , 2.00389455])

Note that an array created with empty is not necessarily empty: its memory is left uninitialized, so it contains whatever values happened to be there. Remember to overwrite the contents after creating an array with empty. The advantage of empty is that the array is created faster, because nothing is written to it.
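A minimal sketch of the intended pattern, allocating with empty and then overwriting every element before reading it:

```python
import numpy as np

buf = np.empty(4)  # contents are arbitrary at this point; do not read them
buf[:] = 7.0       # overwrite every element before use
print(buf)         # [7. 7. 7. 7.]

# Typical pattern: allocate once, then fill from a computation.
squares = np.empty(5)
for i in range(5):
    squares[i] = i ** 2
```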

You can also create an array from a range of values:

In [3]: np.arange(10)
Out[3]: array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

You can specify the start, stop, and step:

In [4]: np.arange(1,10,2)
Out[4]: array([1, 3, 5, 7, 9])

Using linspace, you can create an array of evenly spaced values over an interval:

In [5]: np.linspace(0, 10, num=5)
Out[5]: array([ 0. ,  2.5,  5. ,  7.5, 10. ])
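The difference between arange and linspace is easy to mix up, so here is a short side-by-side sketch. With arange you choose the step, with linspace you choose the number of points.

```python
import numpy as np

# linspace: you choose how many points; the stop value is included by default.
pts = np.linspace(0, 10, num=5)                       # 0, 2.5, 5, 7.5, 10
# endpoint=False leaves the stop value out, as arange does.
pts_open = np.linspace(0, 10, num=5, endpoint=False)  # 0, 2, 4, 6, 8
# arange: you choose the step; the stop value is never included.
steps = np.arange(0, 10, 2)                           # 0, 2, 4, 6, 8

# retstep=True also returns the spacing between points.
_, spacing = np.linspace(0, 10, num=5, retstep=True)
print(spacing)  # 2.5
```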

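By default, arrays created by functions such as zeros, ones, empty, and linspace have the element type np.float64. You can request a different type, such as the integer type np.int64, with the dtype parameter:
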
In [6]: x = np.ones(2, dtype=np.int64)

In [7]: x
Out[7]: array([1, 1])
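The element type can also be inspected and changed after creation. The sketch below uses the dtype attribute and the astype method. Note that converting floats to integers drops the fractional part.

```python
import numpy as np

floats = np.zeros(3)
print(floats.dtype)  # float64

ints = np.arange(3, dtype=np.int64)
print(ints.dtype)    # int64

# astype returns a converted copy; converting float to int drops the fraction.
values = np.array([1.9, 2.1, -0.5])
as_int = values.astype(np.int64)
print(as_int)        # [1 2 0]
```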

Array operation
sort
We can use sort to sort the array:

In [8]: arr = np.array([2, 1, 5, 3, 7, 4, 6, 8])

In [10]: np.sort(arr)
Out[10]: array([1, 2, 3, 4, 5, 6, 7, 8])

sort returns a sorted copy of the Array. There are other sorting functions besides sort.

You can also use argsort, which is an indirect sort: it returns the indices that would sort the original array:

In [11]: x = np.array([10, 5, 6])

In [12]: np.argsort(x)
Out[12]: array([1, 2, 0])

Above, we sorted the array indirectly with argsort. In sorted order the values are 5, 6, 10. The index of 5 in the original array is 1, the index of 6 is 2, and the index of 10 is 0, so 1, 2, 0 is returned.
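The indices returned by argsort become useful when they are used to index arrays. In the sketch below they sort the values themselves and reorder a second, parallel array in the same way. The names array is a hypothetical set of labels added for illustration.

```python
import numpy as np

scores = np.array([10, 5, 6])
names = np.array(["carol", "alice", "bob"])  # hypothetical labels, one per score

order = np.argsort(scores)  # array([1, 2, 0])

# Indexing with the result gives the sorted values ...
print(scores[order])        # [ 5  6 10]
# ... and reorders any parallel array the same way.
print(names[order])         # ['alice' 'bob' 'carol']
```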
Like argsort, lexsort is an indirect sort that returns the sorting indices. The difference is that lexsort can sort by multiple keys:

surnames =    ('Hertz',    'Galilei', 'Hertz')
first_names = ('Heinrich', 'Galileo', 'Gustav')
ind = np.lexsort((first_names, surnames))
ind
array([1, 2, 0])

The lexsort call above sorts by surnames first, and then by first_names.

lexsort reads its keys from back to front: the last key passed in is the primary sort key.
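The key order is easy to get backwards, so here is a small numeric sketch. The arrays primary and secondary are made-up data; the result is the same order you would get by sorting (primary, secondary) pairs.

```python
import numpy as np

primary = np.array([2, 1, 2, 1])
secondary = np.array([0, 9, 5, 3])

# The last key in the tuple is the primary key.
ind = np.lexsort((secondary, primary))
print(ind)  # [3 1 0 2]

# Rows in sorted order: (1, 3), (1, 9), (2, 0), (2, 5)
pairs = list(zip(primary[ind].tolist(), secondary[ind].tolist()))
print(pairs)
```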

searchsorted finds the index at which an element should be inserted into a sorted array to keep it sorted, for example:

np.searchsorted([1,2,3,4,5], 3)
2
np.searchsorted([1,2,3,4,5], 3, side='right')
3
np.searchsorted([1,2,3,4,5], [-10, 10, 2, 3])
array([0, 5, 1, 2])
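One typical use, sketched here with np.insert, is adding a value to a sorted array without sorting it again. The side parameter only makes a difference when the value is already present.

```python
import numpy as np

sorted_arr = np.array([1, 2, 4, 5])

idx = np.searchsorted(sorted_arr, 3)    # 2: first position where 3 fits
result = np.insert(sorted_arr, idx, 3)  # array([1, 2, 3, 4, 5])

# 'left' returns the position before equal elements, 'right' the position after.
left = np.searchsorted(sorted_arr, 4, side='left')    # 2
right = np.searchsorted(sorted_arr, 4, side='right')  # 3
```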

partition partially sorts an array around a chosen position, for example:

a = np.array([3, 4, 2, 1])
np.partition(a, 3)
array([2, 1, 3, 4])

The first parameter is an Array, and the second parameter, kth, is the index to partition by. The element at that index ends up where it would be in a fully sorted array. All smaller elements are placed before it and all elements equal to or greater than it are placed after it, in no guaranteed order.

You can also partition by multiple indices:

np.partition(a, (1, 3))
array([1, 2, 3, 4])
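A common reason to use partition instead of sort is to pick the k smallest values without sorting the whole array. A minimal sketch, with made-up data:

```python
import numpy as np

a = np.array([7, 2, 9, 4, 1, 8])
k = 3

# After partitioning at index k - 1, the first k entries are the k smallest
# values, in no particular order.
part = np.partition(a, k - 1)
k_smallest = np.sort(part[:k])  # sort only the small slice for display
print(k_smallest)               # [1 2 4]
```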

concatenate
Concatenate is used to concatenate multiple arrays.

>>> a = np.array([1, 2, 3, 4])
>>> b = np.array([5, 6, 7, 8])

>>> np.concatenate((a, b))
array([1, 2, 3, 4, 5, 6, 7, 8])

You can also join multidimensional arrays along an axis:

>>> x = np.array([[1, 2], [3, 4]])
>>> y = np.array([[5, 6]])
>>> np.concatenate((x, y), axis=0)
array([[1, 2],
       [3, 4],
       [5, 6]])
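The axis argument decides which dimension grows, and the other dimensions must match. The sketch below repeats the axis=0 case and adds axis=1, where y has to be transposed first so that the row counts agree.

```python
import numpy as np

x = np.array([[1, 2], [3, 4]])  # shape (2, 2)
y = np.array([[5, 6]])          # shape (1, 2)

# axis=0 stacks rows, so the column counts must match: (2, 2) + (1, 2) -> (3, 2)
rows = np.concatenate((x, y), axis=0)

# axis=1 stacks columns, so the row counts must match; y.T has shape (2, 1)
cols = np.concatenate((x, y.T), axis=1)
print(cols)
# [[1 2 5]
#  [3 4 6]]
```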

statistical information
ndarray.ndim gives the number of dimensions (axes) of the array:

>>> array_example = np.array([[[0, 1, 2, 3],
...                            [4, 5, 6, 7]],
...
...                           [[0, 1, 2, 3],
...                            [4, 5, 6, 7]],
...
...                           [[0 ,1 ,2, 3],
...                            [4, 5, 6, 7]]])
>>> array_example.ndim
3

ndarray.size gives the total number of elements in the array:

>>> array_example.size
24

ndarray.shape gives the shape of the array, that is, the number of elements along each dimension:

>>> array_example.shape
(3, 2, 4)

Note that the array above is a 3 * 2 * 4 array.
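The three attributes are related: ndim is the length of shape, and size is the product of the entries in shape. A quick check, using a zero-filled array of the same shape as a stand-in:

```python
import numpy as np

array_example = np.zeros((3, 2, 4))  # stand-in with the same shape as above

ndim_matches = array_example.ndim == len(array_example.shape)
size_matches = array_example.size == int(np.prod(array_example.shape))
print(ndim_matches, size_matches)  # True True

# len() gives only the length of the first axis.
first_axis_len = len(array_example)
print(first_axis_len)              # 3
```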
reshape
Using reshape, you can give an array a new shape without changing its data.

>>> a = np.arange(6)
>>> print(a)
[0 1 2 3 4 5]

>>> b = a.reshape(3, 2)
>>> print(b)
[[0 1]
 [2 3]
 [4 5]]

Above, we convert a one-dimensional array into a 3 * 2 array.
The function form np.reshape also accepts an order parameter:

>>> np.reshape(a, (1, 6), order='C')
array([[0, 1, 2, 3, 4, 5]])

The first parameter is the array to be reshaped. The second parameter is the new shape. order can take three values: C, F, or A.
C means the elements are read and written in C-like index order, and F means Fortran-like index order. A means Fortran-like order if the array is Fortran contiguous in memory, and C-like order otherwise.
In Fortran-like order, the first index changes the fastest, so a two-dimensional array is traversed one column at a time. In C-like order, the last index changes the fastest, so the array is traversed one row at a time.
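The effect of order is easiest to see by reshaping the same six values both ways, as in this short sketch:

```python
import numpy as np

a = np.arange(6)

c_order = np.reshape(a, (2, 3), order='C')
# [[0 1 2]
#  [3 4 5]]   last index changes fastest: rows are filled first

f_order = np.reshape(a, (2, 3), order='F')
# [[0 2 4]
#  [1 3 5]]   first index changes fastest: columns are filled first

print(c_order)
print(f_order)
```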
Add dimension
np.newaxis can add a dimension to an existing array:

>>> a = np.array([1, 2, 3, 4, 5, 6])
>>> a.shape
(6,)

>>> a2 = a[np.newaxis, :]
>>> a2.shape
(1, 6)

>>> col_vector = a[:, np.newaxis]
>>> col_vector.shape
(6, 1)

You can also use expand_dims to specify the position of the new axis:

>>> b = np.expand_dims(a, axis=1)
>>> b.shape
(6, 1)

>>> c = np.expand_dims(a, axis=0)
>>> c.shape
(1, 6)
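np.newaxis, expand_dims, and reshape are interchangeable for this purpose. The sketch below checks that the three spellings give arrays with the same shape and values.

```python
import numpy as np

a = np.array([1, 2, 3, 4, 5, 6])

# Three ways to get a row of shape (1, 6)
row_a = a[np.newaxis, :]
row_b = np.expand_dims(a, axis=0)
row_c = a.reshape(1, 6)

# Three ways to get a column of shape (6, 1)
col_a = a[:, np.newaxis]
col_b = np.expand_dims(a, axis=1)
col_c = a.reshape(6, 1)

print(row_a.shape, col_a.shape)  # (1, 6) (6, 1)
```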

index and slice
Indexing and slicing an array works much like indexing and slicing a list in Python:

>>> data = np.array([1, 2, 3])

>>> data[1]
2
>>> data[0:2]
array([1, 2])
>>> data[1:]
array([2, 3])
>>> data[-2:]
array([2, 3])
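For arrays with more than one dimension, an index or slice can be given per axis, separated by commas. The sketch below shows this on a two-dimensional array. It also shows one difference from a list: a basic slice of an array is a view, so writing to it changes the original.

```python
import numpy as np

a = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])

print(a[1, 2])    # 7: row 1, column 2
print(a[:, 1])    # [ 2  6 10]: column 1 of every row
print(a[1:, :2])  # rows 1 onward, first two columns
# [[ 5  6]
#  [ 9 10]]

# A slice is a view, not a copy: writing to it changes a.
first_row = a[0]
first_row[0] = 100
print(a[0, 0])    # 100
```

Call copy() on the slice if an independent array is needed.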

In addition, the array also supports more powerful index operations:

>>> a = np.array([[1 , 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])

>>> print(a[a < 5])
[1 2 3 4]

Above, we find the values of all elements in a that are less than 5.

In [20]: a<5
Out[20]:
array([[ True,  True,  True,  True],
       [False, False, False, False],
       [False, False, False, False]])

You can see that a < 5 actually returns an array. It has the same shape as the original array, but its values are True and False, indicating whether each element should be selected.

Similarly, we can pick out all elements greater than or equal to 5:

>>> five_up = (a >= 5)
>>> print(a[five_up])
[ 5  6  7  8  9 10 11 12]

Select all numbers that are divisible by 2:

>>> divisible_by_2 = a[a%2==0]
>>> print(divisible_by_2)
[ 2  4  6  8 10 12]

You can also combine conditions with the & (and) and | (or) operators:

>>> c = a[(a > 2) & (a < 11)]
>>> print(c)
[ 3  4  5  6  7  8  9 10]
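The example above uses only &. A matching sketch for | follows. In both cases each comparison needs its own parentheses, because & and | bind more tightly than the comparison operators.

```python
import numpy as np

a = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])

# | keeps elements that satisfy either condition.
outer = a[(a < 3) | (a > 10)]
print(outer)  # [ 1  2 11 12]

# A mask can be stored and reused like any other array.
mask = (a > 2) & (a < 11)
inner_count = int(mask.sum())  # number of True entries
print(inner_count)             # 8
```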

You can also use nonzero to get the indices of the elements that meet a condition:

In [23]: a = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])

In [24]: b = np.nonzero(a < 5)

In [25]: b
Out[25]: (array([0, 0, 0, 0]), array([0, 1, 2, 3]))
  
>>> print(a[b])
[1 2 3 4]

In the tuple returned above, the first array holds the row indices and the second array holds the column indices of the matching elements.
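If (row, column) pairs are easier to read than two separate arrays, the two arrays can be zipped together. A minimal sketch, using a different condition so that the rows and columns are not all in the first row:

```python
import numpy as np

a = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])

rows, cols = np.nonzero(a >= 11)

# Pair up the row and column indices of each match.
coords = list(zip(rows.tolist(), cols.tolist()))
print(coords)         # [(2, 2), (2, 3)]

# Indexing with both arrays returns the matching values.
print(a[rows, cols])  # [11 12]
```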

Keywords: Python

Added by anthony522 on Thu, 06 Jan 2022 10:04:30 +0200