NumPy - Statistical Functions

Import the module first: import numpy as np

1. numpy.sum(a, axis=None) / a.sum(axis=None)

Computes the sum of the elements of array a along the given axis; axis may be an integer or a tuple of integers. If no axis is specified, all elements are summed.

If a has shape (d0, d1, ..., dn) and axis=(m1, m2, ..., mi), the result has the shape of a with dimensions dm1, dm2, ..., dmi removed, and each element is the sum of the elements along axes m1, m2, ..., mi.


a = np.arange(24).reshape((2, 3, 4))
print("array a:\n", a)
print("np.sum(a):", np.sum(a))                               # Sum of all elements
print("np.sum(a, axis=0):\n", np.sum(a, axis=0))             # Sum along axis 0 (outermost)
print("np.sum(a, axis=1):\n", np.sum(a, axis=1))             # Sum along axis 1
print("np.sum(a, axis=(0, 1)):\n", np.sum(a, axis=(0, 1)))   # Sum over axes 0 and 1
print("np.sum(a, axis=(0, 2)):\n", np.sum(a, axis=(0, 2)))   # Sum over axes 0 and 2


array a:
 [[[ 0  1  2  3]
  [ 4  5  6  7]
  [ 8  9 10 11]]

 [[12 13 14 15]
  [16 17 18 19]
  [20 21 22 23]]]
np.sum(a): 276
np.sum(a, axis=0):
 [[12 14 16 18]         # 0+12=12 1+13=14 ...
 [20 22 24 26]          # 4+16=20 5+17=22
 [28 30 32 34]]
np.sum(a, axis=1):
 [[12 15 18 21]         # 0+4+8=12 1+5+9=15 ...
 [48 51 54 57]]         # 12+16+20=48 13+17+21=51
np.sum(a, axis=(0, 1)):
 [60 66 72 78]          # 0+4+8+12+16+20=60 1+5+9+13+17+21=66...
np.sum(a, axis=(0, 2)):
 [ 60  92 124]          # 0+1+2+3+12+13+14+15=60 4+5+6+7+16+17+18+19=92....
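The shape rule above can be checked directly. The minimal sketch below reuses the same a = np.arange(24).reshape((2, 3, 4)), inspects the shape of a tuple-axis sum, and confirms that it matches summing the axes one at a time.

```python
import numpy as np

a = np.arange(24).reshape((2, 3, 4))

# Reducing over axes (0, 2) drops the dimensions of size 2 and 4 from (2, 3, 4)
s = np.sum(a, axis=(0, 2))
print(s.shape)  # (3,)
print(s)        # [ 60  92 124]

# The same result, one axis at a time: reduce axis 2 first so that
# axis 0 still refers to the original outermost axis
step = a.sum(axis=2).sum(axis=0)
print(np.array_equal(s, step))  # True
```

Reducing the higher-numbered axis first matters: after a.sum(axis=0), the old axis 2 would become axis 1.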

2. numpy.mean(a, axis=None) / a.mean(axis=None)

Computes the arithmetic mean of the elements of array a along the given axis; axis may be an integer or a tuple of integers.

If no axis is specified, all elements are averaged. If axis is specified, the mean is taken along that axis.

If a has shape (d0, d1, ..., dn) and axis=(m1, m2, ..., mi), the result has the shape of a with dimensions dm1, dm2, ..., dmi removed, and each element is the mean of the elements along axes m1, m2, ..., mi.


print("array a:\n", a)
print("np.mean(a):", np.mean(a))                              # Mean of all elements
print("np.mean(a, axis=0):\n", np.mean(a, axis=0))            # Mean along axis 0
print("np.mean(a, axis=(0, 2)):\n", np.mean(a, axis=(0, 2)))  # Mean over axes 0 and 2


array a:
 [[[ 0  1  2  3]
  [ 4  5  6  7]
  [ 8  9 10 11]]

 [[12 13 14 15]
  [16 17 18 19]
  [20 21 22 23]]]
np.mean(a): 11.5
np.mean(a, axis=0):
 [[ 6.  7.  8.  9.]         # (0+12)/2=6 (1+13)/2=7...
 [10. 11. 12. 13.]          # (4+16)/2=10 (5+17)/2=11...
 [14. 15. 16. 17.]]         # (8+20)/2=14 (9+21)/2=15..
np.mean(a, axis=(0, 2)):
 [ 7.5 11.5 15.5]           # (0+1+2+3+12+13+14+15)/8=7.5 ...
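Since the mean over a set of axes is just the sum over those axes divided by the number of elements reduced into each output entry, the result can be reproduced by hand. A minimal sketch with the same a:

```python
import numpy as np

a = np.arange(24).reshape((2, 3, 4))

# Each output entry of a mean over axes (0, 2) combines 2 * 4 = 8 elements
m = np.mean(a, axis=(0, 2))
manual = np.sum(a, axis=(0, 2)) / (a.shape[0] * a.shape[2])
print(m)                       # [ 7.5 11.5 15.5]
print(np.allclose(m, manual))  # True

# Integer input still produces a floating-point mean
print(np.mean(a).dtype)  # float64
```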


3. numpy.average(a, axis=None, weights=None)

Computes the weighted average of the elements of array a along the given axis.

That is: \(\overline{x}_w=\frac{\sum_{i=1}^{N} w_i x_i}{\sum_{i=1}^{N} w_i}\)

weights is an array of weights. It should either have the same shape as a (weights.shape == a.shape), or, when axis is specified, be a one-dimensional array whose length equals the size of a along that axis.

If weights is not specified, every element gets the same weight and the result is the same as mean.


print("array a:\n", a)
print("np.average(a, axis=0):\n", np.average(a, axis=0))
print("np.average(a, axis=0, weights=[10, 1]):\n", np.average(a, axis=0, weights=[10, 1]))
wei = np.random.randint(1, 60, (2, 3, 4 ))
print("The weight array is:", wei)
print("np.average(a, axis=(0, 2), weights=wei):\n", np.average(a, axis=(0, 2), weights=wei))


array a:
 [[[ 0  1  2  3]
  [ 4  5  6  7]
  [ 8  9 10 11]]

 [[12 13 14 15]
  [16 17 18 19]
  [20 21 22 23]]]
np.average(a, axis=0):
 [[ 6.  7.  8.  9.]
 [10. 11. 12. 13.]
 [14. 15. 16. 17.]]
np.average(a, axis=0, weights=[10, 1]):
 [[ 1.09090909  2.09090909  3.09090909  4.09090909] # (0*10+12*1)/(10+1)=1.0909
 [ 5.09090909  6.09090909  7.09090909  8.09090909]  # (4*10+16*1)/(10+1)=5.0909
 [ 9.09090909 10.09090909 11.09090909 12.09090909]]
The weight array is: [[[37  5 50  9]
  [ 9 40 17 42]
  [45  4 41 29]]

 [[17 24 29 37]
  [20  8 14 37]
  [ 3  1 48 14]]]
np.average(a, axis=(0, 2), weights=wei):
 [ 7.73557692 10.92513369 13.96756757]  # (0*37+1*5+2*50+3*9+12*17+13*24+14*29+15*37)/(37+5+50+9+17+24+29+37)=7.7355
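To make the weighting explicit, the sketch below recomputes np.average(a, axis=0, weights=[10, 1]) from the formula sum(w_i * x_i) / sum(w_i). It assumes the 1-D weights are reshaped by hand to (2, 1, 1) so they broadcast over a; it also shows the returned=True option, which additionally gives the sum of the weights.

```python
import numpy as np

a = np.arange(24).reshape((2, 3, 4))

# 1-D weights along axis 0: one weight per slice a[0] and a[1]
w = np.array([10, 1])
avg = np.average(a, axis=0, weights=w)

# The same thing by hand: w[:, None, None] has shape (2, 1, 1),
# so it broadcasts against a of shape (2, 3, 4)
manual = np.sum(a * w[:, None, None], axis=0) / w.sum()
print(np.allclose(avg, manual))  # True
print(avg[0, 0])                 # 12/11, about 1.0909

# returned=True also returns the sum of the weights for each result
avg2, wsum = np.average(a, axis=0, weights=w, returned=True)
print(wsum[0, 0])  # 11.0
```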

4. numpy.std(a, axis=None) / a.std(axis=None) and numpy.var(a, axis=None) / a.var(axis=None)

std computes the population standard deviation of the elements of array a along the given axis (not to be confused with the sample standard deviation; pass ddof=1 to get the sample version)

That is: \(\sigma=\sqrt{\frac{1}{N}\sum_{i=1}^{N}(x_i-\overline{x})^2}\)

std: standard deviation, also called the root-mean-square deviation

var computes the population variance of the elements of array a along the given axis

That is: \(\sigma^2=\frac{1}{N}\sum_{i=1}^{N}(x_i-\overline{x})^2\)

var: variance


b = np.random.randint(1, 30, (2, 3, 4))
print("array b:\n", b)
print("np.std(b, axis=2):\n", np.std(b, axis=2))    # standard deviation
print("np.var(b, axis=2):\n", np.var(b, axis=2))    # variance


array b:
 [[[16  8 27 24]
  [12 15 25  8]
  [11 19 15 26]]

 [[29 15 18 24]
  [17  8  4 15]
  [ 2 28 10 21]]]
np.std(b, axis=2):
 [[7.39509973 6.28490254 5.53962995]
 [5.40832691 5.24404424 9.98436277]]
np.var(b, axis=2):
 [[54.6875 39.5    30.6875]
 [29.25   27.5    99.6875]]

For example, take the data 12, 15, 25, 8 along axis 2 (the second row of the first block):

The mean is: \(\overline{x}=\frac{12+15+25+8}{4}=15\)

The population standard deviation is: \(\sigma=\sqrt{\frac{(12-15)^2+(15-15)^2+(25-15)^2+(8-15)^2}{4}}=\sqrt{39.5}\approx6.2849\)

The variance is: \(\sigma^2=39.5\)
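The worked example can be reproduced in a few lines. This sketch checks the population variance against the formula, confirms that std is the square root of var, and shows ddof=1, which divides by N - 1 to give the sample variance instead.

```python
import numpy as np

x = np.array([12, 15, 25, 8])

# Population variance and standard deviation (NumPy's default, ddof=0): divide by N
var_pop = np.var(x)
std_pop = np.std(x)
print(var_pop)                                # 39.5
print(np.isclose(std_pop, np.sqrt(var_pop)))  # True

# The same formula written out by hand
manual_var = np.sum((x - x.mean()) ** 2) / x.size
print(manual_var)  # 39.5

# Sample variance: ddof=1 divides by N - 1 instead, i.e. 158 / 3
print(np.var(x, ddof=1))
```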

5. Minimum and Maximum Functions


numpy.min(a, axis=None) / numpy.amin(a, axis=None) / a.min(axis=None)

Returns the minimum value along the given axis, or the minimum of all elements if no axis is specified


numpy.max(a, axis=None) / numpy.amax(a, axis=None) / a.max(axis=None)

Returns the maximum value along the given axis, or the maximum of all elements if no axis is specified


c = np.random.randint(1, 60, (2, 3, 4))
print("array c:\n", c)
print("np.min(c):     ", np.min(c))
print("np.amin(c, axis=1):\n", np.amin(c, axis=1))
print("c.min(axis=2): \n", c.min(axis=2))
print("-"*20 + 'Split Line' + '-'*20)
print("np.max(c):    ", np.max(c))
print("np.amax(c, axis=1):\n", np.amax(c, axis=1))
print("c.max(axis=2):\n", c.max(axis=2))


array c:
 [[[15 50 24  6]
  [ 2  8 27 53]
  [52 23  9 35]]

 [[17 38 42 20]
  [ 4 32  9 17]
  [48 39 17 40]]]
np.min(c):      2
np.amin(c, axis=1):
 [[ 2  8  9  6]
 [ 4 32  9 17]]
c.min(axis=2): 
 [[ 6  2  9]
 [17  4 17]]
--------------------Split Line--------------------
np.max(c):     53
np.amax(c, axis=1):
 [[52 50 27 53]
 [48 39 42 40]]
c.max(axis=2):
 [[50 53 52]
 [42 32 48]]

Strictly speaking, a.min and the like are methods of the ndarray object rather than module-level functions of the NumPy library.
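Because c is generated with np.random.randint, each run prints different numbers. The sketch below hard-codes the array c from the output above so the results can be reproduced, and checks that the function form and the method form agree.

```python
import numpy as np

# Fixed copy of the random array c shown in the output above
c = np.array([[[15, 50, 24,  6],
               [ 2,  8, 27, 53],
               [52, 23,  9, 35]],
              [[17, 38, 42, 20],
               [ 4, 32,  9, 17],
               [48, 39, 17, 40]]])

print(np.min(c), np.max(c))  # 2 53
print(np.min(c, axis=1))     # minimum of each column within each block
print(c.max(axis=2))         # maximum of each row

# Function and method forms give the same result
print(np.array_equal(np.min(c, axis=1), c.min(axis=1)))  # True
```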

6. Indices of the Minimum and Maximum


numpy.argmin(a, axis=None) / a.argmin(axis=None)

Returns the index of the minimum value along the given axis. If no axis is specified, the index refers to the array flattened to one dimension


numpy.argmax(a, axis=None) / a.argmax(axis=None)

Returns the index of the maximum value along the given axis. If no axis is specified, the index refers to the array flattened to one dimension


print("array c:\n", c)
print("c.argmax():  ", c.argmax())
print("np.argmax(c, axis=2):\n", np.argmax(c, axis=2))
print("-"*20 + 'Split Line' + '-'*20)
print("np.argmin(c):  ", np.argmin(c))
print("c.argmin(axis=1):\n", c.argmin(axis=1))


array c:
 [[[50 44 13 16]
  [26 23 31 35]
  [ 5 21 42  8]]

 [[ 6 53 10 57]
  [14  5 18 38]
  [40 31  4 55]]]
c.argmax():   15        # 57 is at index 15 of the flattened array
np.argmax(c, axis=2):
 [[0 3 2]               # Along axis 2: 50 at 0, 35 at 3, 42 at 2, 57 at 3, 38 at 3, 55 at 3
 [3 3 3]]
--------------------Split Line--------------------
np.argmin(c):   22
c.argmin(axis=1):
 [[2 2 0 2]
 [0 1 2 1]]
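The indices returned with an axis can be used to pick the extreme values back out of the array. This sketch hard-codes the array c from the output above and uses np.take_along_axis; np.expand_dims adds back the reduced axis so the index array has the shape take_along_axis expects.

```python
import numpy as np

# Fixed copy of the array c shown in the output above
c = np.array([[[50, 44, 13, 16],
               [26, 23, 31, 35],
               [ 5, 21, 42,  8]],
              [[ 6, 53, 10, 57],
               [14,  5, 18, 38],
               [40, 31,  4, 55]]])

# Without an axis, the index refers to the flattened array
flat_idx = c.argmax()
print(flat_idx, c.ravel()[flat_idx])  # 15 57

# With an axis, restore the reduced axis (shape (2, 3, 1)) and gather the values
idx = np.expand_dims(np.argmax(c, axis=2), axis=2)
vals = np.take_along_axis(c, idx, axis=2).squeeze(axis=2)
print(np.array_equal(vals, c.max(axis=2)))  # True
```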

7. numpy.unravel_index(index, shape)

Converts a flat (one-dimensional) index into a tuple of multidimensional indices for an array of the given shape. It pairs naturally with argmax and argmin from section 6.


print("array c:\n", c)
print(np.unravel_index(np.argmax(c), c.shape))


array c:
 [[[22  4 28 56]
  [45 34  3 22]
  [59 43 43 27]]

 [[32 35 47 53]
  [ 7 27 41 18]
  [40 32 30 43]]]
(0, 2, 0)    # 59 is the maximum value of the array; its coordinates are (0, 2, 0)
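The conversion can be run in both directions. This sketch hard-codes the array c from the output above; np.ravel_multi_index performs the inverse of np.unravel_index, turning the coordinate tuple back into the flat index.

```python
import numpy as np

# Fixed copy of the array c shown in the output above
c = np.array([[[22,  4, 28, 56],
               [45, 34,  3, 22],
               [59, 43, 43, 27]],
              [[32, 35, 47, 53],
               [ 7, 27, 41, 18],
               [40, 32, 30, 43]]])

flat = np.argmax(c)                      # index into c.ravel()
coords = np.unravel_index(flat, c.shape)
print(flat)       # 8
print(c[coords])  # 59

# np.ravel_multi_index converts the coordinates back into the flat index
print(np.ravel_multi_index(coords, c.shape))  # 8
```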


8. numpy.median(a, axis=None)

Returns the median of the array along the given axis, or the median of all elements if no axis is specified


print("array c:\n", c)
print("np.median(c):  ", np.median(c))


array c:
 [[[17 59 14 23]
  [27 59  6 12]
  [43 16 27 17]]

 [[12 10  5 17]
  [21 55 18 42]
  [41 36 40  5]]]
np.median(c):   19.5
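With an even number of elements (24 here), the median is the mean of the two middle values of the sorted data. This sketch hard-codes the array c from the output above and reproduces 19.5 by hand.

```python
import numpy as np

# Fixed copy of the array c shown in the output above
c = np.array([[[17, 59, 14, 23],
               [27, 59,  6, 12],
               [43, 16, 27, 17]],
              [[12, 10,  5, 17],
               [21, 55, 18, 42],
               [41, 36, 40,  5]]])

s = np.sort(c, axis=None)  # axis=None sorts the flattened array
n = s.size                 # 24, an even count
manual = (s[n // 2 - 1] + s[n // 2]) / 2
print(s[n // 2 - 1], s[n // 2])  # 18 21
print(manual, np.median(c))      # 19.5 19.5
```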

9. Other Functions


numpy.ptp(a, axis=None)

Calculates the range, that is, the difference between the maximum and minimum values, along the given axis, or over all elements if no axis is specified


print("array c:\n", c)
print("np.ptp(c):  ", np.ptp(c))
print("np.ptp(c, axis=1):\n", np.ptp(c, axis=1))   # use the function: the ndarray.ptp method was removed in NumPy 2.0


array c:
 [[[35 28 18 38]
  [44 56  7 24]
  [ 4 59  2 24]]

 [[55 56  5 27]
  [18 44 22  1]
  [ 3 30 20 43]]]
np.ptp(c):   58     # 59-1=58
np.ptp(c, axis=1):
 [[40 31 16 14]     # 44-4=40 59-28=31 ...
 [52 26 17 42]]
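As the name "peak to peak" suggests, ptp is simply max minus min along the same axis. This sketch hard-codes the array c from the output above and checks that identity.

```python
import numpy as np

# Fixed copy of the array c shown in the output above
c = np.array([[[35, 28, 18, 38],
               [44, 56,  7, 24],
               [ 4, 59,  2, 24]],
              [[55, 56,  5, 27],
               [18, 44, 22,  1],
               [ 3, 30, 20, 43]]])

print(np.ptp(c))  # 58
# ptp along an axis equals max minus min along that axis
print(np.array_equal(np.ptp(c, axis=1), c.max(axis=1) - c.min(axis=1)))  # True
```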

numpy.percentile(a, q, axis=None)

  1. a: input array
  2. q: percentile to compute, between 0 and 100 inclusive
  3. axis: axis along which to compute the percentile

Returns a value such that at least q% of the data are less than or equal to it and at least (100-q)% of the data are greater than or equal to it. By default the result is linearly interpolated between the two nearest data points, so it need not be one of the array's elements.


d = np.random.randint(1, 40, (2, 5))
print("array d:\n", d)
print("np.percentile(d, 40):    ", np.percentile(d, 40))
print("np.percentile(d, 40, axis=1):\n", np.percentile(d, 40, axis=1))


array d:
 [[39 15 35 17 39]
 [20 12 36 19 10]]
np.percentile(d, 40):     18.200000000000003    
np.percentile(d, 40, axis=1):
 [27.8 16.2]              
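The value 18.2 comes from linear interpolation: with n sorted values, the q-th percentile sits at fractional position q/100 * (n - 1). The helper percentile_linear below is hypothetical, written only to illustrate that default behaviour on the array d from the output above; it is a sketch, not NumPy's implementation.

```python
import numpy as np

# Fixed copy of the random array d shown in the output above
d = np.array([[39, 15, 35, 17, 39],
              [20, 12, 36, 19, 10]])

def percentile_linear(x, q):
    """Hypothetical helper: q-th percentile of the flattened x, linearly
    interpolated between the two nearest sorted values."""
    s = np.sort(x, axis=None)
    pos = q / 100 * (s.size - 1)  # fractional position in the sorted data
    lo = int(np.floor(pos))
    hi = min(lo + 1, s.size - 1)
    frac = pos - lo
    return s[lo] + frac * (s[hi] - s[lo])

# Sorted d is 10 12 15 17 19 20 35 36 39 39; position 3.6 lies between 17 and 19
print(percentile_linear(d, 40))  # about 18.2, i.e. 17 + 0.6 * (19 - 17)
print(np.percentile(d, 40))      # about 18.2
```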

Many of these functions also accept a keepdims parameter (default False). With keepdims=True, the reduced axes are kept in the result as dimensions of length 1, so the result has the same number of dimensions as the input and broadcasts correctly against it.
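A typical use of keepdims is subtracting a per-row statistic from the original array. This sketch compares the result shapes with and without keepdims and then centres each row of a by its mean.

```python
import numpy as np

a = np.arange(24).reshape((2, 3, 4))

print(np.mean(a, axis=2).shape)                 # (2, 3)
print(np.mean(a, axis=2, keepdims=True).shape)  # (2, 3, 1)

# The kept length-1 axis lets the mean broadcast against a of shape (2, 3, 4)
centered = a - np.mean(a, axis=2, keepdims=True)
print(centered[0, 0])  # [-1.5 -0.5  0.5  1.5]
```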


Added by manitoon on Tue, 03 Mar 2020 18:12:50 +0200