Python Data Analysis: the numpy module

NumPy is an important foundation for Python's use in artificial intelligence and scientific computing. I will not repeat the general introduction to the library here; instead I mainly share a summary of how the numpy library is commonly used.

1. numpy array object

The multidimensional array in Numpy is called ndarray, which is the most common array object in Numpy. The ndarray object usually consists of two parts:

  • ndarray data itself
  • Metadata describing data

Advantages of Numpy arrays

  • The elements of a NumPy array all have the same type. Because the element type is known, the space needed to store the data can be determined quickly.
  • NumPy arrays support vectorized operations that process the whole array at once, which is fast; with Python lists you usually have to traverse the elements in an explicit loop, which is comparatively slow.
  • NumPy's core is implemented in optimized C, so operations run fast.
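As a small illustration of the vectorization point above, the sketch below computes the same squares once with an explicit Python loop and once with a single array expression. The variable names are made up for this example, and no timing is claimed here.

```python
import numpy as np

values = list(range(10))

# Pure-Python approach: visit every element in an explicit loop.
squares_loop = [v * v for v in values]

# NumPy approach: one vectorized expression applied to the whole array.
arr = np.array(values)
squares_vec = arr * arr

print(squares_loop)
print(squares_vec)
print(arr.dtype)  # a single element type shared by the whole array
```

Both approaches produce the same numbers; the vectorized form simply moves the loop out of Python and into NumPy.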

2. Creation of numpy array (ndarray)

2.1 array()

An array is created by passing a list (or nested lists) to array().

    import numpy as np
    
    array1 = np.array([1, 2, 3])
    array2 = np.array([[1, 2, 3],
                       [4, 5, 6],
                       [7, 8, 9]])
    
    print(array1)
    print(array2)
    
    [Running] ============
    [1 2 3]
    [[1 2 3]
     [4 5 6]
     [7 8 9]]
    

2.2 arange()

arange() creates an array from a range. In the example below, array1 is a row vector running from 0 to 1 in steps of 0.1, starting at 0 and excluding 1. array2 is a two-dimensional array produced by broadcasting a column vector against a row vector, and array3 reshapes 24 consecutive integers into a 2 x 3 x 4 array.

    import numpy as np
    
    array1 = np.arange(0, 1, 0.1)
    array2 = np.arange(1, 70, 10).reshape(-1, 1) + np.arange(0, 7)
    array3 = np.arange(24).reshape(2,3,4)
    
    print(array1)
    print(array2)
    print(array3)
    
    [Running]=======
    [0.  0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9]
    
    [[ 1  2  3  4  5  6  7]
     [11 12 13 14 15 16 17]
     [21 22 23 24 25 26 27]
     [31 32 33 34 35 36 37]
     [41 42 43 44 45 46 47]
     [51 52 53 54 55 56 57]
     [61 62 63 64 65 66 67]]
     
    [[[ 0  1  2  3]
      [ 4  5  6  7]
      [ 8  9 10 11]]
    
     [[12 13 14 15]
      [16 17 18 19]
      [20 21 22 23]]]

2.3 linspace() & logspace()

linspace() creates an evenly spaced array (an arithmetic sequence): the example below creates a row vector of 10 elements from 0 to 1 with a step of 1/9, starting at 0 and including 1.
logspace() creates a geometric sequence: the example below creates a row vector of 20 elements from 1 to 100. The first two arguments are exponents of 10, so 0 represents 10^0 = 1 and 2 represents 10^2 = 100; the sequence starts at 1 and includes 100.
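The relationship between the two functions can be checked with a short sketch: with the default base of 10, logspace(a, b, n) should match 10 raised to linspace(a, b, n). The variable names are only for illustration.

```python
import numpy as np

# linspace: constant difference between neighbours, (stop - start) / (num - 1).
lin = np.linspace(0, 1, 10)
step = lin[1] - lin[0]

# logspace: constant ratio between neighbours; the endpoints are exponents of 10.
log = np.logspace(0, 2, 20)
same = np.allclose(log, 10 ** np.linspace(0, 2, 20))
ratios = log[1:] / log[:-1]

print(step)
print(same)
print(np.allclose(ratios, ratios[0]))
```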

    import numpy as np
    
    array1 = np.linspace(0, 1, 10)
    array2 = np.logspace(0, 2, 20)
    
    print(array1)
    print(array2)
    
    [Running]===========
    [0.         0.11111111 0.22222222 0.33333333 0.44444444 0.55555556
     0.66666667 0.77777778 0.88888889 1.        ]
    [  1.           1.27427499   1.62377674   2.06913808   2.6366509
       3.35981829   4.2813324    5.45559478   6.95192796   8.8586679
      11.28837892  14.38449888  18.32980711  23.35721469  29.76351442
      37.92690191  48.32930239  61.58482111  78.47599704 100.        ]

2.4 generating special sequence

  • ones, ones_like: create an array filled with 1 of the given shape; the latter copies the shape of another array
  • zeros, zeros_like: the same, filled with 0
  • empty, empty_like: create a new array and only allocate the space, without initializing the values
  • eye, identity: create a matrix with 1 on the diagonal and 0 elsewhere

Note that ones, zeros and empty need the shape of the array (an int or a tuple). The element type is optional: if dtype is omitted it defaults to float64. These functions are relatively simple to use.
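A minimal sketch of the functions listed above follows. The shapes and names are arbitrary choices for illustration; the contents of the array returned by empty() are uninitialized, so they are not printed.

```python
import numpy as np

a = np.ones((2, 3))              # shape given as a tuple; dtype defaults to float64
b = np.zeros((2, 3), dtype=int)  # dtype can be given explicitly
c = np.ones_like(b)              # same shape and dtype as b, filled with 1
d = np.empty((2, 2))             # space is allocated, values are arbitrary
e = np.eye(3)                    # 3 x 3 matrix with 1 on the diagonal
f = np.identity(3)               # same result as eye(3)

print(a)
print(b)
print(c)
print(e)
```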

2.5 generating random arrays

Random arrays are created in much the same way as arrays built with array(): generate the numbers, then reshape them as needed. numpy.random offers many more generators, which are described in detail in the NumPy documentation.

    import numpy as np
    from numpy.random import randn
    
    arr = randn(12).reshape(3, 4)
    print(arr)
    
    [Running]========
     [[ 0.98655235  1.20830283 -0.72135183  0.40292924]
      [-0.05059849 -0.02714873 -0.62775486  0.83222997]
      [-0.84826071 -0.29484606 -0.76984902  0.09025059]]
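As one of the other generation methods, newer NumPy versions provide a Generator object through np.random.default_rng(). The sketch below is only an illustration; the seed value 42 is an arbitrary assumption made so that the run is reproducible.

```python
import numpy as np

# Seed chosen arbitrarily so that repeated runs give the same numbers.
rng = np.random.default_rng(seed=42)

normal = rng.standard_normal((3, 4))     # standard normal samples, like randn
uniform = rng.random((3, 4))             # uniform samples in [0, 1)
ints = rng.integers(0, 10, size=(3, 4))  # integers in [0, 10)

print(normal)
print(uniform)
print(ints)
```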
     

3. dtype

All data types of NumPy are as follows:
[Figure: table of NumPy data types]
Each data type has a corresponding conversion function, and the dtype argument specifies the element type when an array is created.

    import numpy as np
    
    array1 = np.ones((2, 3), dtype = 'float')
    array2 = np.array([1. ,2. ,3. ], dtype = 'int32')
    
    print(array1)
    print(array2)
    
    [Running] =========
    [[1. 1. 1.]
     [1. 1. 1.]]
     
    [1 2 3]

4. ndarray attributes

4.1 dtype attribute

The dtype attribute gives the element type of the ndarray, as described above.
Note that the imaginary part of a complex number is written with j, and a complex array cannot be converted to a float type without losing its imaginary part.

4.2 ndim attribute

ndim returns the number of dimensions (axes) of an array: 2 for a two-dimensional array, 3 for a three-dimensional array, and so on. It reports only how many axes there are, not their lengths.

    import numpy as np
    
    array1 = np.ones(24).reshape(2,3,4)
    array2 = np.ones(8).reshape(2,4)
    
    print(array1.ndim)
    print(array2.ndim)
    
    [Running]========
    3
    2
    

4.3 shape attribute

shape is the size of the array along each dimension, returned as a tuple. For a matrix with n rows and m columns it is (n, m). It carries more information than ndim, whose value equals the length of shape.

    import numpy as np
    
    array1 = np.ones(24).reshape(2,3,4)
    array2 = np.ones(8).reshape(2,4)
    
    print(array1.shape)
    print(array2.shape)
    
    [Running]=========
    (2, 3, 4)
    (2, 4)

4.4 size attribute

The size attribute is the number of elements held in the array

    import numpy as np
    
    array1 = np.ones(24).reshape(2,3,4)
    array2 = np.ones(8).reshape(2,4)
    
    print(array1.size)
    print(array2.size)
    
    [Running]=======
    24
    8

4.5 itemsize attribute

The itemsize attribute returns the size in bytes of each element in the array.

    import numpy as np
    
    array1 = np.ones(24).reshape(2,3,4)
    array2 = np.ones(8).reshape(2,4)
    
    print(array1.itemsize)
    print(array2.itemsize)
    
    [Running]==========
    8
    8

4.6 nbytes attribute

If you want the number of bytes occupied by the whole array, use the nbytes attribute. Its value equals the array's size multiplied by its itemsize.

    import numpy as np
    
    array1 = np.ones(24).reshape(2,3,4)
    array2 = np.ones(8).reshape(2,4)
    
    print(array1.nbytes)
    print(array2.nbytes)
    
    [Running]===========
    192
    64

4.7 T attribute

The T attribute returns the transpose of the array.

    import numpy as np
    
    array1 = np.arange(24).reshape(4,6)
    
    print(array1)
    print(array1.T)
    
    [Running]=======
    [[ 0  1  2  3  4  5]
     [ 6  7  8  9 10 11]
     [12 13 14 15 16 17]
     [18 19 20 21 22 23]]
    [[ 0  6 12 18]
     [ 1  7 13 19]
     [ 2  8 14 20]
     [ 3  9 15 21]
     [ 4 10 16 22]
     [ 5 11 17 23]]

4.8 real & imag attribute

real and imag return the real part and the imaginary part of the array, respectively.
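For illustration, a minimal sketch of the two attributes on a small complex array (the values are made up for this example):

```python
import numpy as np

z = np.array([1 + 2j, 3 - 4j])

print(z.dtype)  # a complex dtype
print(z.real)   # real parts as a float array
print(z.imag)   # imaginary parts as a float array
```

Taking z.real is also one explicit way to get a float array from a complex one when the imaginary part is not needed.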

4.9 flat attribute

The flat attribute returns a numpy.flatiter object, which is an iterator over the elements of the array.

    import numpy as np
    
    array1 = np.arange(6).reshape(2,3)
    f = array1.flat
    
    print(array1)
    for i in f:
        print(i)
    
    [Running]==========
    [[0 1 2]
     [3 4 5]]
    0
    1
    2
    3
    4
    5

The iterator returned by flat also supports assignment and indexing. Assigning to flat overwrites every element:

    import numpy as np
    
    array1 = np.array([[11, 12, 13, 14],
                       [21, 22, 23, 24],
                       [31, 32, 33, 34]])
    array1.flat = 7
    print(array1)
    
    [Running]=======
    [[7 7 7 7]
     [7 7 7 7]
     [7 7 7 7]]

Indexing the iterator with a list of positions returns the corresponding elements:

    import numpy as np
    
    array1 = np.array([[11, 12, 13, 14],
                       [21, 22, 23, 24],
                       [31, 32, 33, 34]])
    f = array1.flat
    print(f[[1,4]])
    
    [Running]==========
    [12 21]

Assigning through the same kind of index changes the original array:

    import numpy as np
    
    array1 = np.array([[11, 12, 13, 14],
                       [21, 22, 23, 24],
                       [31, 32, 33, 34]])
    f = array1.flat
    f[[1,5]] = 7
    print(array1)
    
    [Running]=======
    [[11  7 13 14]
     [21  7 23 24]
     [31 32 33 34]]

5. ndarray slice and index

Indexing an ndarray is similar to indexing a list. An array with several dimensions is indexed with one index per dimension.

    import numpy as np
    
    array1 = np.array([1, 2, 3, 4])
    
    array2 = np.array([[11, 12, 13, 14],
                       [21, 22, 23, 24],
                       [31, 32, 33, 34]])
    
    print(array1[2])     # One dimensional array index
    print(array2[0][0])  # Two dimensional array index
    print(array2[0])     # Two dimensional array index row
    print(array2.T[0])   # Two dimensional array index column
    
    [Running]===========
    3
    11
    [11 12 13 14]
    [11 21 31]
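Slicing is not shown in the example above, so here is a minimal sketch of it on the same data. The comma form array2[i, j] is an equivalent, more common spelling of array2[i][j]; the variable names are only for illustration.

```python
import numpy as np

array2 = np.array([[11, 12, 13, 14],
                   [21, 22, 23, 24],
                   [31, 32, 33, 34]])

elem = array2[0, 0]           # same element as array2[0][0]
col = array2[:, 0]            # first column, without transposing
block = array2[0:2, 1:3]      # rows 0-1 and columns 1-2
every_other = array2[:, ::2]  # every second column
large = array2[array2 > 30]   # boolean indexing: elements greater than 30

print(elem)
print(col)
print(block)
print(every_other)
print(large)
```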

6. shape change

6.1 shape transformation

6.1.1 reshape & resize

Both reshape and resize can change the shape of an array. resize modifies the original array in place, while reshape leaves the original unchanged and returns a reshaped view of the same data.

    import numpy as np
    
    array1 = np.array([[11, 12, 13, 14],
                       [21, 22, 23, 24],
                       [31, 32, 33, 34]])
    
    print(array1.reshape(2,6))
    array1.resize(4,3)
    print(array1)
    
    [Running]============
    [[11 12 13 14 21 22]
     [23 24 31 32 33 34]]
    [[11 12 13]
     [14 21 22]
     [23 24 31]
     [32 33 34]]

6.1.2 ravel() & flatten()

ravel() and flatten() both convert a multidimensional array into a one-dimensional array. The difference is whether a copy or a view is returned. flatten() returns a copy and has to allocate new memory, so modifications to the result do not affect the original array. ravel() returns a view whenever possible, so modifications to the result do affect the original array.
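The copy-versus-view distinction can be checked directly with np.shares_memory(). The sketch below is only an illustration with made-up names; it also shows a case where ravel() has to fall back to a copy because the data is not laid out contiguously in the requested order.

```python
import numpy as np

array1 = np.arange(12).reshape(3, 4)

flat_copy = array1.flatten()  # always a copy
flat_view = array1.ravel()    # a view here, because array1 is contiguous

copy_shares = np.shares_memory(array1, flat_copy)
view_shares = np.shares_memory(array1, flat_view)

# Writing through the view changes array1; the copy is unaffected.
flat_view[0] = 99

# The transpose is not C-contiguous, so ravel() must copy in this case.
transposed_shares = np.shares_memory(array1, array1.T.ravel())

print(copy_shares, view_shares, transposed_shares)
print(array1[0, 0], flat_copy[0])
```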

    import numpy as np
    
    array1 = np.array([[11, 12, 13, 14],
                       [21, 22, 23, 24],
                       [31, 32, 33, 34]])
    
    print(array1.flatten())
    array1.flatten()[0] = 0
    print(array1)
    
    print(array1.ravel())
    array1.ravel()[0] = 0
    print(array1)
    
    [Running]===========
    [11 12 13 14 21 22 23 24 31 32 33 34]
    [[11 12 13 14]
     [21 22 23 24]
     [31 32 33 34]]
    [11 12 13 14 21 22 23 24 31 32 33 34]
    [[ 0 12 13 14]
     [21 22 23 24]
     [31 32 33 34]]

6.1.3 transpose

The transpose obtained earlier with the T attribute can also be obtained with the transpose() function

    import numpy as np
    
    array1 = np.array([[11, 12, 13, 14],
                       [21, 22, 23, 24],
                       [31, 32, 33, 34]])
    
    print(array1.transpose())
    
    [Running]============
    [[11 21 31]
     [12 22 32]
     [13 23 33]
     [14 24 34]]

6.2 stacked arrays

6.2.1 hstack() & column_stack()

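hstack() stacks arrays horizontally, that is, it joins two arrays side by side. The arrays must have the same number of rows. For two-dimensional arrays column_stack() gives the same result; it differs only in that it treats one-dimensional arrays as columns.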
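The difference between the two functions only appears with one-dimensional input, which the following sketch illustrates (the arrays a and b are made up for this example):

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([4, 5, 6])

h = np.hstack((a, b))        # joins end to end: shape (6,)
c = np.column_stack((a, b))  # each 1-D array becomes a column: shape (3, 2)

print(h)
print(c)
```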

    import numpy as np
    
    array1 = np.array([[11, 12, 13, 14],
                       [21, 22, 23, 24],
                       [31, 32, 33, 34]])
    array2 = array1 * 2
    print(np.hstack((array1,array2)))
    
    [Running]=================
    [[11 12 13 14 22 24 26 28]
     [21 22 23 24 42 44 46 48]
     [31 32 33 34 62 64 66 68]]

6.2.2 vstack() & row_stack()

vstack() stacks arrays vertically, so the arrays must have the same number of columns. row_stack() is an alias of vstack().

    import numpy as np
    
    array1 = np.array([[11, 12, 13, 14],
                       [21, 22, 23, 24],
                       [31, 32, 33, 34]])
    array2 = array1 * 2
    print(np.vstack((array1,array2)))
    
    [Running]=========
    [[11 12 13 14]
     [21 22 23 24]
     [31 32 33 34]
     [22 24 26 28]
     [42 44 46 48]
     [62 64 66 68]]

6.2.3 concatenate()

concatenate() selects the stacking direction with the axis argument. With axis=1 the arrays are joined horizontally; with axis=0 they are joined vertically.

    import numpy as np
    
    array1 = np.array([[11, 12, 13, 14],
                       [21, 22, 23, 24],
                       [31, 32, 33, 34]])
    array2 = array1 * 2
    print(np.concatenate((array1,array2),axis=1))
    print(np.concatenate((array1,array2),axis=0))
    
    [Running]===========
    [[11 12 13 14 22 24 26 28]
     [21 22 23 24 42 44 46 48]
     [31 32 33 34 62 64 66 68]]
    [[11 12 13 14]
     [21 22 23 24]
     [31 32 33 34]
     [22 24 26 28]
     [42 44 46 48]
     [62 64 66 68]]

6.2.4 dstack()

dstack() stacks arrays along the depth, that is, along a new third axis. For example, depth-stacking two arrays of shape (2, 3) gives an array of shape (2, 3, 2).

    import numpy as np
    
    array1 = np.array([[11, 12, 13, 14],
                       [21, 22, 23, 24],
                       [31, 32, 33, 34]])
    array2 = array1 * 2
    print(np.dstack((array1,array2)))
    print(np.dstack((array1,array2)).shape)
    
    [Running]=================
    [[[11 22]
      [12 24]
      [13 26]
      [14 28]]
    
     [[21 42]
      [22 44]
      [23 46]
      [24 48]]
    
     [[31 62]
      [32 64]
      [33 66]
      [34 68]]]
    (3, 4, 2)

6.3 array splitting

Splitting is the inverse of stacking and is done with hsplit(), vsplit(), dsplit() and split(). The first three split horizontally, vertically and along the depth, respectively. split() takes an axis argument: with axis=1 it splits horizontally; with axis=0 it splits vertically.
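A minimal sketch of the four functions, using arrays whose sizes divide evenly into the requested number of parts (the shapes and names are chosen only for illustration):

```python
import numpy as np

array1 = np.arange(24).reshape(4, 6)

left, right = np.hsplit(array1, 2)   # two arrays of shape (4, 3)
top, bottom = np.vsplit(array1, 2)   # two arrays of shape (2, 6)
parts = np.split(array1, 3, axis=1)  # three arrays of shape (4, 2)

cube = np.arange(24).reshape(2, 3, 4)
front, back = np.dsplit(cube, 2)     # two arrays of shape (2, 3, 2)

print(left.shape, right.shape)
print(top.shape, bottom.shape)
print([p.shape for p in parts])
print(front.shape, back.shape)
```

Stacking the pieces again with the matching function, for example np.hstack((left, right)), gives back the original array.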

7. Array type conversion

7.1 tolist()

tolist() can convert an array into a list

    import numpy as np
    
    array1 = np.array([[11, 12, 13, 14],
                       [21, 22, 23, 24],
                       [31, 32, 33, 34]])
    
    print(array1.tolist())
    
    [Running]================
    [[11, 12, 13, 14], [21, 22, 23, 24], [31, 32, 33, 34]]

7.2 astype()

astype() can convert an array to a specified type

    import numpy as np
    
    array1 = np.array([[11, 12, 13, 14],
                       [21, 22, 23, 24],
                       [31, 32, 33, 34]])
    
    print(array1.astype(float))
    
    [Running]=================
    [[11. 12. 13. 14.]
     [21. 22. 23. 24.]
     [31. 32. 33. 34.]]

8. numpy common statistical functions

These functions accept an optional axis argument that selects the direction along which the statistic is computed. If axis is not specified, the statistic is computed over the whole array by default.

  • np.sum(), returns the sum
  • np.mean(), returns the mean value
  • np.max(), returns the maximum value
  • np.min(), returns the minimum value
  • np.ptp(), returns the maximum value minus the minimum value along the specified axis, i.e. (max - min)
  • np.std(), returns the standard deviation
  • np.var(), returns the variance
  • np.cumsum(), returns the cumulative sum
  • np.cumprod(), returns the cumulative product
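The effect of the axis argument can be sketched on the small matrix used throughout this post: axis=0 works down the columns, axis=1 works along the rows. The result names are only for illustration.

```python
import numpy as np

array1 = np.array([[11, 12, 13, 14],
                   [21, 22, 23, 24],
                   [31, 32, 33, 34]])

total = np.sum(array1)               # no axis: the whole array
col_sums = np.sum(array1, axis=0)    # one sum per column
row_means = np.mean(array1, axis=1)  # one mean per row
row_ranges = np.ptp(array1, axis=1)  # max - min within each row
running = np.cumsum(array1, axis=1)  # cumulative sum along each row

print(total)
print(col_sums)
print(row_means)
print(row_ranges)
print(running)
```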

9. Broadcast of array

When an array is combined with a scalar in a mathematical operation, the scalar is conceptually extended to the shape of the array before the operation is performed. This extension is called "broadcasting".
Addition and multiplication by a scalar, as used in the earlier examples, both rely on it:

    import numpy as np
    
    array1 = np.array([[11, 12, 13, 14],
                       [21, 22, 23, 24],
                       [31, 32, 33, 34]])
    array2 = array1 + 5
    array3 = array1 * 2
    print(array2)
    print(array3)
    
    [Running]=============
    [[16 17 18 19]
     [26 27 28 29]
     [36 37 38 39]]
    [[22 24 26 28]
     [42 44 46 48]
     [62 64 66 68]]
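Broadcasting also applies between two arrays of different shapes, which is what the arange() example in section 2.2 relies on. The sketch below repeats that case and adds a second one; the names col, row, matrix and offsets are made up for illustration.

```python
import numpy as np

# A (7, 1) column combined with a (7,) row is broadcast to a (7, 7) grid.
col = np.arange(1, 70, 10).reshape(-1, 1)
row = np.arange(0, 7)
grid = col + row

# A (4,) vector is broadcast across every row of a (3, 4) matrix.
matrix = np.ones((3, 4))
offsets = np.array([0, 10, 20, 30])
shifted = matrix + offsets

print(grid.shape)
print(grid)
print(shifted)
```

Shapes are compared from the last dimension backwards, and two dimensions are compatible when they are equal or when one of them is 1.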

Keywords: Python Data Analysis

Added by DrJonesAC2 on Sat, 22 Jan 2022 21:36:08 +0200