Chapter 5 Error Backpropagation

Chapter 4 introduced the implementation of a two-layer neural network in which the gradients were obtained by numerical differentiation. Numerical differentiation is simple to implement, but it is slow, which severely limits the performance of the network. This chapter introduces a faster way to obtain the gradients: the error backpropagation method, explained here with the help of computational graphs.

There are two ways to understand error backpropagation: through mathematical formulas or through computational graphs. This book uses computational graphs, which are easier to follow.

5.1 Computational graphs

5.1.1 Solving problems with computational graphs

First, two examples:

Question 1: Xiao Ming buys two apples at 100 yen each in the supermarket, and the consumption tax is 10%. Calculate the amount Xiao Ming pays.

Figure 1: Solving problem 1 with a computational graph; the number of apples and the consumption tax are placed outside the nodes as variables
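As a quick check of the graph, the same forward computation can be written directly in Python (a minimal illustrative sketch; the variable names are not from the book's code):

# Forward pass of problem 1: 100 yen * 2 apples * 1.1 tax
apple = 100
apple_num = 2
tax = 1.1

apple_price = apple * apple_num      # 200
price = apple_price * tax            # 220.0, the amount paid
print(price)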

Question 2: Xiao Ming buys two 100-yen apples and three 150-yen oranges in the supermarket. The consumption tax is 10%. Calculate the amount Xiao Ming pays.

Figure 2: Solving problem 2 with a computational graph; the numbers of apples and oranges and the consumption tax are placed outside the nodes as variables
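The corresponding straight-line computation for problem 2 (again just a sanity check of the graph):

# Forward pass of problem 2: (100*2 + 150*3) * 1.1
total = (100 * 2 + 150 * 3) * 1.1    # (200 + 450) * 1.1 = 715.0
print(total)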

From the solutions above, the workflow of a computational graph is:

  1. Build the computational graph
  2. On the graph, compute from left to right

Forward propagation: the left-to-right computation is called forward propagation.
Back propagation: it is also possible to compute from right to left; this is called back propagation and plays an important role in computing derivatives.

5.1.2 Local computation

For example, if you buy two apples plus various other things in the supermarket, the computational graph looks like this:

Figure 3: Buying two apples and other things

As the figure shows, a computational graph can concentrate on local computation: no matter how complex the overall calculation is, each node only has to perform the local computation involving its own inputs.

5.1.3 Why solve problems with computational graphs

With a computational graph, gradients can be computed efficiently by back propagation.
Example: in problem 1, find the derivative of the total amount paid with respect to the apple price.

Figure 4: Buying two apples; the derivative of the amount paid with respect to the apple price

According to the figure, the derivative of the amount paid with respect to the apple price is 2.2; that is, for every unit the apple price changes, the amount paid changes by 2.2 units. While computing the derivative with respect to the apple price, the other derivatives are obtained along the way and the intermediate results can be shared, for example the derivative of the final amount with respect to the pre-tax total.
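The value 2.2 can also be checked numerically by nudging the apple price and watching the total change (an illustrative check, not code from the book):

# Numerical check: d(price)/d(apple) should be about num * tax = 2 * 1.1 = 2.2
def total(apple, num=2, tax=1.1):
    return apple * num * tax

h = 1e-4
d_apple = (total(100 + h) - total(100 - h)) / (2 * h)
print(d_apple)   # approximately 2.2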

5.2 Chain rule

The chain rule is the core of back propagation.

5.2.1 Back propagation on a computational graph

Figure 5: Back propagation on a computational graph

5.2.2 What is the chain rule

The chain rule is a property of the derivatives of composite functions:

  • If a function is a composite function, its derivative can be expressed as the product of the derivatives of the functions that make up the composition.

Figure 6: Steps for differentiating a composite function

5.2.3 Chain rule and computational graphs

Figure 7: Computational graph for the derivative of a composite function

Figure 8: Computational graph for the derivative of a composite function (with concrete values substituted)

The two figures above show how the chain rule propagates gradient information through the graph.
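A common worked example is z = t^2 with t = x + y. By the chain rule,

    dz/dx = (dz/dt) * (dt/dx) = 2t * 1 = 2(x + y)

so at (x, y) = (1, 1) the derivative is 4. On the graph this is exactly the product of the local derivatives (2t and 1) encountered as the signal flows from the output back toward x.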

5.3 Back propagation (on the computational graph)

Back propagation is based on the chain rule. This section introduces the structure of back propagation using the + and × operations as examples.

5.3.1 Back propagation at an addition node

Take z = x + y as an example.

Figure 9: Back propagation at an addition node. In the figure, the left graph is forward propagation and the right graph is back propagation. For the back propagation of an addition node, the gradient passed downstream equals the gradient received from upstream, unchanged.

Back propagation at an addition node as a local computation:

Figure 10: Back propagation at an addition node as a local computation; in a large network, the back propagation result at any individual addition node still holds.

A concrete example: 10 + 5 = 15, and during back propagation the gradient arriving from upstream is 1.3.

Figure 11: Example of back propagation at an addition node as a local computation

5.3.2 Back propagation at a multiplication node

Take z = x * y as an example.

Figure 12: Back propagation at a multiplication node; the left graph is forward propagation and the right graph is back propagation

A concrete example:

Figure 13: The gradient arriving from upstream is 1.3

  • In the back propagation of a multiplication node, the upstream gradient is multiplied by the other ("flipped") input of the forward pass: the derivative with respect to 10 is 1.3 × 5 = 6.5, and the derivative with respect to 5 is 1.3 × 10 = 13 (see the quick check below).
  • To implement the back propagation of a multiplication node, the input signals of the forward pass therefore have to be stored.
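A two-line check of these numbers (illustrative only):

# Multiplication node: f = x * y, upstream gradient dout = 1.3
x, y, dout = 10, 5, 1.3
dx = dout * y    # 6.5  (gradient for x is dout times the other input)
dy = dout * x    # 13.0
print(dx, dy)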

Hands-on exercise:
After understanding back propagation at addition and multiplication nodes, try the following problem and fill in the results:

Figure 14: Exercise

5.4 Code implementation of back propagation (on the computational graph)

As shown above, computing gradients on the computational graph comes down to two cases: multiplication nodes and addition nodes. Here we define two layers, a multiplication layer and an addition layer, to implement the derivatives in these two cases. Many problems are simply combinations of addition and multiplication layers.

5.4.1 Implementation of the multiplication layer

class MulLayer:
    def __init__(self):
        self.x = None
        self.y = None

    # Forward propagation: save the inputs and return their product
    def forward(self, x, y):
        self.x = x
        self.y = y
        out = x * y
        return out

    # Back propagation: dout is the derivative passed down from the upper layer
    def backward(self, dout):
        # Multiply dout by the "flipped" inputs
        dx = dout * self.y
        dy = dout * self.x
        return dx, dy

5.4.2 Implementation of the addition layer

class AddLayer:
    def __init__(self):
        pass  # Nothing to store: addition needs no state

    # Forward propagation
    def forward(self, x, y):
        out = x + y
        return out

    # Back propagation: the upstream gradient passes through unchanged to both inputs
    def backward(self, dout):
        dx = dout
        dy = dout
        return dx, dy
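A quick usage check of the two layers together (illustrative; assumes the MulLayer and AddLayer classes defined above):

# Forward: (3 + 4) * 2 = 14; backward with upstream gradient 1.0
add_layer = AddLayer()
mul_layer = MulLayer()

s = add_layer.forward(3, 4)         # 7
out = mul_layer.forward(s, 2)       # 14

ds, dtwo = mul_layer.backward(1.0)  # ds = 2.0, dtwo = 7.0
dx, dy = add_layer.backward(ds)     # dx = dy = 2.0
print(out, dx, dy, dtwo)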

5.4.3 Solution to problem 1

Restating question 1: Xiao Ming buys two apples at 100 yen each in the supermarket, and the consumption tax is 10%. Calculate the amount Xiao Ming pays, and the gradients of the payment amount with respect to the apple unit price, the number of apples, and the tax rate.

# Error back propagation for buying two apples
# Simple layers for deep learning (multiplication layer and addition layer)
class MulLayer:
    def __init__(self):
        self.x = None
        self.y = None

    # Forward propagation
    def forward(self, x, y):
        self.x = x
        self.y = y
        out = x * y
        return out

    # Back propagation: dout is the derivative passed down from the upper layer
    def backward(self, dout):
        # Multiply dout by the flipped inputs
        dx = dout * self.y
        dy = dout * self.x
        return dx, dy

class AddLayer:
    def __init__(self):
        pass  # No state needed

    # Forward propagation
    def forward(self, x, y):
        out = x + y
        return out

    # Back propagation
    def backward(self, dout):
        dx = dout
        dy = dout
        return dx, dy

# Define initial values
apple = 100
apple_num = 2
tax = 1.1

# Define the layers
mul_apple_layer = MulLayer()
mul_tax_layer = MulLayer()

# forward
apple_price = mul_apple_layer.forward(apple, apple_num)
price = mul_tax_layer.forward(apple_price, tax)
print('price=', '%.1f' % price)

# backward
dprice = 1
dapple_price, dtax = mul_tax_layer.backward(dprice)
dapple, dapple_num = mul_apple_layer.backward(dapple_price)

# Output results
print('dapple=', dapple)
print('dapple_num=', int(dapple_num))
print('dtax=', dtax)
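Running this script should print roughly the following (the exact trailing digits depend on floating-point rounding):

price= 220.0
dapple= 2.2
dapple_num= 110
dtax= 200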

5.4.4 Solution to problem 2

Restating question 2: Xiao Ming buys two apples at 100 yen each and three oranges at 150 yen each in the supermarket. The consumption tax is 10%. Calculate the amount Xiao Ming pays, and the gradients of the payment amount with respect to the apple unit price, the number of apples, the orange unit price, the number of oranges, and the tax rate.

# Error back propagation for buying two apples and three oranges
# (uses both the addition layer and the multiplication layer)
class MulLayer:
    def __init__(self):
        self.x = None
        self.y = None

    # Forward propagation
    def forward(self, x, y):
        self.x = x
        self.y = y
        out = x * y
        return out

    # Back propagation: dout is the derivative passed down from the upper layer
    def backward(self, dout):
        # Multiply dout by the flipped inputs
        dx = dout * self.y
        dy = dout * self.x
        return dx, dy

class AddLayer:
    def __init__(self):
        pass  # No state needed

    # Forward propagation
    def forward(self, x, y):
        out = x + y
        return out

    # Back propagation
    def backward(self, dout):
        dx = dout
        dy = dout
        return dx, dy

# Define initial values
apple = 100
orange = 150
apple_num = 2
orange_num = 3
tax = 1.1

# Define the layers
mul_apple_layer = MulLayer()
mul_orange_layer = MulLayer()
mul_tax_layer = MulLayer()
add_orange_apple_layer = AddLayer()

# forward
apple_price = mul_apple_layer.forward(apple, apple_num)
orange_price = mul_orange_layer.forward(orange, orange_num)
sum_price = add_orange_apple_layer.forward(apple_price, orange_price)
price = mul_tax_layer.forward(sum_price, tax)
print('sum_price=', '%.1f' % price)

# backward
dprice = 1
dsum_price, dtax = mul_tax_layer.backward(dprice)
dapple_price, dorange_price = add_orange_apple_layer.backward(dsum_price)
dorange, dorange_num = mul_orange_layer.backward(dorange_price)
dapple, dapple_num = mul_apple_layer.backward(dapple_price)

# Output results
print('dapple=', '%.1f' % dapple, 'dapple_num=', int(dapple_num))
print('dorange=', '%.1f' % dorange, 'dorange_num=', int(dorange_num))

print('dtax=', dtax, 'dapple_price=', dapple_price)
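With these values the script should print approximately:

sum_price= 715.0
dapple= 2.2 dapple_num= 110
dorange= 3.3 dorange_num= 165
dtax= 650 dapple_price= 1.1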

5.5 Implementation of the activation function layers (on the computational graph)

Applying the idea of computational graphs to neural networks, each layer of the network is implemented as a class with forward and backward methods, in the same style as the addition and multiplication layers above. The activation function layers covered here are the ReLU layer and the Sigmoid layer.

5.5.1 ReLU layer

Figure 15: Function expression and derivative of the ReLU activation function
Figure 16: Computational graph of the ReLU layer
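For reference, the standard formulas shown in the figures are:

    y = x   (x > 0)        dy/dx = 1   (x > 0)
    y = 0   (x <= 0)       dy/dx = 0   (x <= 0)

so during back propagation the upstream gradient either passes through unchanged or is blocked entirely.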

Implementation code:

class ReLU:
    def __init__(self):
        self.mask = None

    # Forward propagation
    def forward(self, x):
        self.mask = (x <= 0)   # boolean array; True where x <= 0
        out = x.copy()         # copy of x
        out[self.mask] = 0     # set the masked (x <= 0) positions to 0
                               # this is exactly ReLU: 0 for x <= 0, x itself for x > 0
        return out

    # Back propagation
    def backward(self, dout):
        dout[self.mask] = 0    # the gradient is blocked where the input was <= 0
        dx = dout
        return dx
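A quick check with a small NumPy array (illustrative; assumes the ReLU class above and NumPy imported as np):

import numpy as np

relu = ReLU()
x = np.array([[1.0, -0.5], [-2.0, 3.0]])
out = relu.forward(x)                 # [[1. 0.] [0. 3.]]
dx = relu.backward(np.ones_like(x))   # [[1. 0.] [0. 1.]]
print(out)
print(dx)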

5.5.2 Sigmoid layer

By definition, the sigmoid function is y = 1 / (1 + exp(-x)).

Figure 17: Computational graph of the sigmoid activation function
Figure 18: Simplified computational graph of the sigmoid activation function

Code implementation:

class Sigmoid:
    def __init__(self):
        self.out = None

    def forward(self, x):
        out = 1 / (1 + np.exp(-x))
        self.out = out          # save the output; it is all that backward needs
        return out

    def backward(self, dout):
        return dout * self.out * (1.0 - self.out)
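The backward line follows from differentiating the sigmoid directly: with y = 1 / (1 + e^(-x)),

    dy/dx = e^(-x) / (1 + e^(-x))^2 = y * (1 - y)

so the layer only needs its own output y to compute the local gradient, which is why forward stores self.out.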

5.5.3 Affine layer

In the forward propagation of a neural network, the weighted sum of the input signals has to be computed. The Affine layer implements this operation.

The Affine layer computes (X, W and B are matrices): Y = X · W + B

Figure 19: Computational graph of the batch-version Affine layer

Code implementation:

class Affine:                        # keep the important values as instance variables
    def __init__(self, W, b):
        self.W = W
        self.b = b
        self.x = None
        self.dW = None
        self.db = None

    def forward(self, x):
        self.x = x
        out = np.dot(self.x, self.W) + self.b
        return out

    def backward(self, dout):
        dx = np.dot(dout, self.W.T)
        self.dW = np.dot(self.x.T, dout)     # gradient of the weights
        self.db = np.sum(dout, axis=0)       # gradient of the bias, summed over the batch

        return dx
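A small shape check (illustrative; any small matrices will do, assuming the Affine class above and NumPy):

import numpy as np

W = np.random.randn(3, 4)     # 3 inputs -> 4 outputs
b = np.zeros(4)
affine = Affine(W, b)

x = np.random.randn(2, 3)     # batch of 2 samples
out = affine.forward(x)       # shape (2, 4)
dx = affine.backward(np.ones_like(out))
print(out.shape, dx.shape, affine.dW.shape, affine.db.shape)  # (2, 4) (2, 3) (3, 4) (4,)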

5.5.4 Softmax-with-Loss layer

When a neural network is used for classification, the last layer is a Softmax layer. Because training requires comparing the output with labelled data, the last layer of the training network also needs a loss function layer to evaluate how good the current parameters are.

Figure 20: Computational graph of the Softmax-with-Loss layer

Figure 21: Simplified computational graph of the Softmax-with-Loss layer

In Figure 21, the softmax layer and the cross-entropy error layer are drawn as a single box for the forward pass, which makes the structure easier to see.
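The property that makes this combined layer convenient is the standard result that, when the softmax output y is fed into the cross-entropy error against a one-hot label t, the gradient flowing back into the scores a is simply

    dL/da_k = y_k - t_k

which is exactly what the backward method below returns, divided by the batch size.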

Code implementation:

class SoftmaxWithLoss:
    def __init__(self):
        self.loss = None   # loss value
        self.y = None      # output of softmax
        self.t = None      # labels (one-hot vectors)

    def forward(self, x, t):
        self.t = t
        self.y = softmax(x)
        # softmax and cross_entropy_error are the functions defined earlier
        self.loss = cross_entropy_error(self.y, self.t)
        return self.loss

    def backward(self, dout=1):
        batch_size = self.t.shape[0]
        # Divide by batch_size because the forward loss is averaged over the batch,
        # so the value passed to the previous layer is the per-sample error (y - t) / batch_size.
        dx = (self.y - self.t) / batch_size
        return dx

5.6 Error backpropagation (code implementation based on the computational graph)

There are two ways to implement the code for this chapter:

  1. The first: directly modify the sub-function that computes the gradient, keeping everything else the same as in the previous chapter.
  2. The second: modify the two-layer neural network class itself and split the network into layers.

Both methods have advantages and disadvantages:

  1. The first method only modifies the gradient function. The change is small, but the function itself is complicated; if the network gets deeper, writing the corresponding gradient function by hand becomes difficult.
  2. The second method changes more code, but the layered structure is simple and is well suited to extending to deeper networks.

5.6.1 Implementation of method 1

This method only changes the sub-function that computes the gradient. Taking the two-layer network as an example, the sub-function before and after the modification is listed below.
(Before modification) gradient computation by numerical differentiation:

 def numerical_gradient(self,x,t):
        loss_W=lambda W:self.loss(x,t)      #A function of the weights W that returns the loss on (x, t)
        
        grads={}  #Dictionary holding the gradient of each weight parameter
        
        #The gradients of the four weight parameters are computed and stored in grads
        grads['W1']=numerical_gradient(loss_W,self.params['W1'])
        grads['b1']=numerical_gradient(loss_W,self.params['b1'])
        grads['W2']=numerical_gradient(loss_W,self.params['W2'])
        grads['b2']=numerical_gradient(loss_W,self.params['b2'])
        
        return grads

(After modification) gradient computation by error backpropagation:

def gradient(self, x, t):
        W1, W2 = self.params['W1'], self.params['W2']
        b1, b2 = self.params['b1'], self.params['b2']
        
        grads = {}
        
        batch_num = x.shape[0]
        
        # forward
        a1 = np.dot(x, W1) + b1
        z1 = sigmoid(a1)
        a2 = np.dot(z1, W2) + b2
        y = softmax(a2)              #Forward pass written out by hand; a deeper network would need more such lines
        
        # backward
        dy = (y - t) / batch_num
        grads['W2'] = np.dot(z1.T, dy)
        grads['b2'] = np.sum(dy, axis=0)
        
        da1 = np.dot(dy, W2.T)
        dz1 = sigmoid_grad(a1) * da1
        grads['W1'] = np.dot(x.T, dz1)
        grads['b1'] = np.sum(dz1, axis=0)

        return grads       

5.6.2 Implementation of method 2

In this method the two-layer neural network class is modified substantially. Several layer classes are defined, the prediction is expressed through the forward passes of the layers, and the gradients are obtained from the results of back propagation through those same layers. Building the network directly out of layers makes the structure clear at a glance and easy to extend to deeper networks.

Compared with the code of the previous post (the MNIST neural network trained with numerical differentiation), there are five modifications:

  1. An additional file, Layer.py, is written to hold the layer classes so the neural network class can call them.
  2. The __init__() sub-function defines the layers used in the network.
  3. In the predict() sub-function, the forward value is obtained by passing the data through each layer in turn, instead of writing out the computation step by step.
  4. In the loss() sub-function, the loss is no longer obtained with the cross_entropy_error() function but is computed directly with the SoftmaxWithLoss layer.
  5. The last one is the gradient function, which changes the most.

5.6.3 Implementation code of method 2

Four files are involved in this implementation:

  1. TLN_main.py: the main script, essentially the same as in the previous post
  2. TLN_function.py: stores the activation and helper functions, unchanged
  3. Layer.py: a newly added file that stores the layer classes (based on the computational graph)
  4. two_layer_net1.py: changed substantially; the changes are the five just described (they are also marked in the code below)

5.6.3.1 File 1: TLN_main.py

File 1: TLN_main.py

#Mini-batch training of the two-layer network (error backpropagation version)
#Call of related Library
import time
import numpy as np
import matplotlib.pyplot as plt
import sys,os
sys.path.append(os.pardir)   #These two lines allow importing modules from the parent folder
from dataset.mnist import load_mnist        #Call the function that loads the mnist dataset
from two_layer_net1 import TwoLayerNet       #Call the class composed of edited two-layer neural network
#from two_layer_net import TwoLayerNet


#Load the MNIST handwritten-digit dataset
(x_train,t_train),(x_test,t_test)=load_mnist(flatten=True,normalize=True,one_hot_label=True) 
#load_mnist returns the dataset as (training images, training labels), (test images, test labels)
#normalize=True: scale the input pixel values from 0-255 to 0.0-1.0
#flatten=True: flatten each input image into a one-dimensional array
#one_hot_label=True: labels are one-hot arrays such as [0,0,0,1,0,0,...]; if False, only the label value such as 2 or 7 is stored

 

#Hyperparameter definitions
iters_num=10000
train_size=x_train.shape[0]   #Size of the whole training set
batch_size=100                #mini-batch size
learning_rate=0.5             #The learning rate, i.e. the step size

network= TwoLayerNet(input_size=784,hidden_size=100,output_size=10)
#Define the basic parameters of the two-layer neural network:
#two layers: 784 inputs, 100 hidden neurons and 10 outputs

#Lists for recording the loss values and accuracy values
train_loss_list=[]  #Training loss values
train_acc_list=[]   #Accuracy on the training data
test_acc_list=[]    #Accuracy on the test data
iter_per_epoch=max(train_size/batch_size,1)   #Average number of iterations per epoch


#Main training loop
start = time.perf_counter()    #Timing start

for i in range(iters_num):
        
    #Get random mini batch
    batch_mask=np.random.choice(train_size,batch_size)  #Randomly pick batch_size indices from the training set
    x_batch=x_train[batch_mask]
    t_batch=t_train[batch_mask]

    #Calculated gradient
    #grad=network.numerical_gradient(x_batch,t_batch)
    #Using the error back propagation method:
    grad=network.gradient(x_batch,t_batch)
    
    #Update parameters
    for key in ('W1','b1','W2','b2'):
        network.params[key] -= learning_rate*grad[key]
        
    #Record the learning process
    loss=network.loss(x_batch,t_batch)
    train_loss_list.append(loss)         #Store the loss of this mini-batch for later inspection


    #After every epoch (parameter update), the recognition accuracy is calculated for all training data and test data
    if i % iter_per_epoch==0:
        #Record of training data and test data
        train_acc=network.accuracy(x_train,t_train)
        test_acc=network.accuracy(x_test,t_test)
        
        train_acc_list.append(train_acc)
        test_acc_list.append(test_acc)
        
       # print('train_acc,test_acc| '+str(train_acc)+' , '+test_acc)
        print('train_acc,test_acc|', train_acc,',',test_acc)
        
#Add a timing function
end = time.perf_counter()           #End of timing
print ('Running time:',str(end-start))   #Displays the elapsed time
        
#Draw graphics

x = np.arange(len(train_acc_list))                            #x axis: one point per epoch
plt.plot(x, train_acc_list, label='train acc', marker='o')
plt.plot(x, test_acc_list, label='test acc', marker='x',linestyle='--')

plt.xlabel("epochs")            #Axis labels
plt.ylabel("accuracy")
plt.ylim(0, 1.0)
plt.legend(loc='lower right')  #The legend is displayed in the lower right corner

plt.savefig('./test2.jpg')     #Save the displayed picture
plt.show()
5.6.3.2 File 2: TLN_function.py

File 2: TLN_function.py

#Subfunctions in TwoLayerNet
import numpy as np
#Find gradient function
def numerical_gradient(f, x):
    h = 1e-4 # 0.0001
    grad = np.zeros_like(x)
    
    it = np.nditer(x, flags=['multi_index'], op_flags=['readwrite'])
    while not it.finished:
        idx = it.multi_index
        tmp_val = x[idx]
        x[idx] = float(tmp_val) + h
        fxh1 = f(x) # f(x+h)
        
        x[idx] = tmp_val - h 
        fxh2 = f(x) # f(x-h)
        grad[idx] = (fxh1 - fxh2) / (2*h)
        
        x[idx] = tmp_val # Restore value
        it.iternext()   
        
    return grad

#Activation function (used between layers)
def sigmoid(x):
    return 1 / (1 + np.exp(-x))  

#Activation function (used by output layer)
def softmax(x):
    if x.ndim == 2:
        x = x.T
        x = x - np.max(x, axis=0)
        y = np.exp(x) / np.sum(np.exp(x), axis=0)
        return y.T 

    x = x - np.max(x) # overflow countermeasure
    return np.exp(x) / np.sum(np.exp(x))

#Calculation of loss function
def cross_entropy_error(y, t):
    if y.ndim == 1:
        t = t.reshape(1, t.size)
        y = y.reshape(1, y.size)
        
    # If the labels t are one-hot vectors, convert them to the indices of the correct classes
    if t.size == y.size:
        t = t.argmax(axis=1)
             
    batch_size = y.shape[0]
    return -np.sum(np.log(y[np.arange(batch_size), t] + 1e-7)) / batch_size

#Subfunction to be used in error back propagation algorithm
def sigmoid_grad(x):
    return (1.0 - sigmoid(x)) * sigmoid(x)
5.6.3.3 File 3: Layer.py

File 3: Layer.py

#Implementation of the four layers (ReLU, Sigmoid, Affine, SoftmaxWithLoss)

import numpy as np
from TLN_function import *


class ReLU:
    def __init__(self):
        self.mask=None
     
    #Forward propagation
    def forward(self,x):
        self.mask=(x<=0)  #boolean array; True where x <= 0
        out=x.copy()      #copy of x
        out[self.mask]=0  #set the masked (x <= 0) positions to 0
                          #this is exactly ReLU: 0 for x <= 0, x itself for x > 0
        return out
    #Back propagation
    def backward(self,dout):
        dout[self.mask]=0
        df=dout
        return df
  
class Sigmoid():
    def __init__(self):
        self.out=None
        
    def forward(self,x):
        out=1 / (1 + np.exp(-x)) 
        self.out=out
        return out
        
    def backward(self,dout):
        return dout*(1.0-self.out)*self.out

class Affine():                     #Change all the important parameters into instance variables
    def __init__(self,W,b):
        self.W=W
        self.b=b
        self.x=None
        self.original_x_shape=None             ####The original shape of x (needed for tensor inputs)
        self.dW=None
        self.db=None
    
    def forward(self,x):
        self.original_x_shape =x.shape        #####Save the original shape of x
        x=x.reshape(x.shape[0],-1)            #####Flatten x to two dimensions (batch_size, features)
        self.x=x
        out=np.dot(x,self.W)+self.b
    
        return out
    
    def backward(self,dout):
        dx=np.dot(dout,self.W.T)
        self.dW=np.dot(self.x.T,dout)
        self.db=np.sum(dout,axis=0)
        
        dx=dx.reshape(*self.original_x_shape)   #Restore the shape of the input data (corresponding tensor)
        
        return dx

class SoftmaxWithLoss():
    def __init__(self):
        self.loss=None   #loss
        self.y=None      #Output of softmax
        self.t=None      #one hot vector
        
    def forward(self,x,t):
        self.t=t
        self.y=softmax(x)
        self.loss=cross_entropy_error(self.y,self.t)  #Use the softmax and cross_entropy_error functions defined above to compute the loss
        
        return self.loss
    
    def backward(self,dout=1):
        batch_size=self.t.shape[0]
        if self.t.size ==self.y.size:    ####Case where the labels are one-hot vectors
            dx=(self.y-self.t)/batch_size   #Divide by batch_size because the forward loss
                                            #is averaged over the batch, so what is passed
                                            #to the previous layer is the per-sample error
        else:                                      ###
            dx=self.y.copy()                        ###
            dx[np.arange(batch_size),self.t] -= 1
            dx= dx/batch_size
        return dx
5.6.3.4 File 4: two_layer_net1.py

File 4: two_layer_net1.py

#Two layer network structure using error back propagation

#Import of libraries and functions

import sys,os
import numpy as np
sys.path.append(os.pardir)          #These two lines allow importing modules from the parent folder
from TLN_function import *          #Import all helper functions from TLN_function.py
from Layer import *                 #Import all layer classes from Layer.py                 (newly added)
from collections import OrderedDict  #OrderedDict: an ordered dictionary from the standard library   (newly added)


class TwoLayerNet:
    
    
    def __init__(self,input_size,hidden_size,output_size,weight_init_std=0.01):
        
        #Initialize the weight and define several instance variables (that is, local variables in the class)
        self.params={}   #Initialize the instance variable params, which contains four variables: W1, W2, B1 and B2
        
        #W1 and W2 are initialized with random numbers conforming to Gaussian distribution
        self.params['W1']=weight_init_std*np.random.randn(input_size,hidden_size)  
        self.params['W2']=weight_init_std*np.random.randn(hidden_size, output_size)
        #b1 and b2 are initialized with 0
        self.params['b1']=np.zeros(hidden_size)    #The initial values are all set to 0
        self.params['b2']=np.zeros(output_size)    #The initial values are all set to 0
    
        #Create the layers                                                               (newly added)
        self.layers=OrderedDict()
        
        self.layers['Affine1'] = Affine(self.params['W1'],self.params['b1'])
        self.layers['Relu1'] = ReLU()
        self.layers['Affine2'] = Affine(self.params['W2'],self.params['b2'])
        
        self.lastLayer = SoftmaxWithLoss()
        
        
    def predict(self,x):                                                     #modified
        for layer in self.layers.values():       #.values() returns the layers in insertion order; pass x forward through each
            x=layer.forward(x)
            
        return x
    '''
    def predict(self,x):
        
        #Assignment variable
        W1,W2=self.params['W1'],self.params['W2']
        b1,b2=self.params['b1'],self.params['b2']
        
        #Forward propagation algorithm for two-layer networks
        a1=np.dot(x,W1)+b1
        z1=sigmoid(a1)
        a2=np.dot(z1,W2)+b2
        y=softmax(a2)
        
        return y 
    '''
    
    #x: image data; t: correct labels (one-hot)
    #Loss function (cross-entropy error; returns the loss value)
    def loss(self,x,t):                                                       #modified
        y=self.predict(x)
        #loss_x=cross_entropy_error(y,t)                                      #replaced: the SoftmaxWithLoss layer computes the same value
        loss_x=self.lastLayer.forward(y,t)
        
        return loss_x
    
    #Calculation accuracy function
    def accuracy(self,x,t):
        y=self.predict(x)
        y=np.argmax(y,axis=1)   #Index of the largest output for each sample (row)
        t=np.argmax(t,axis=1)   #Index of the correct label for each sample (row)
        
        accuracy=np.sum(y==t)/float(x.shape[0])   #Fraction of correctly classified samples
        return accuracy
    
    #x: image data; t: correct labels (one-hot)
    #Compute the gradients of the weight parameters
    
    
    #Two algorithms for computing the gradients
    #Compute the gradients of the weight parameters by numerical differentiation 
    def numerical_gradient(self,x,t):
        loss_W=lambda W:self.loss(x,t)      #A function of the weights W that returns the loss on (x, t)
        
        grads={}  #Define the gradient information of parameters and access the gradient information value of weight parameters
        
        #The gradient information values of the four weight parameters are obtained and stored in grad
        grads['W1']=numerical_gradient(loss_W,self.params['W1'])
        grads['b1']=numerical_gradient(loss_W,self.params['b1'])
        grads['W2']=numerical_gradient(loss_W,self.params['W2'])
        grads['b2']=numerical_gradient(loss_W,self.params['b2'])
        
        return grads
    
    #Solving gradient value by error back propagation method
    def gradient(self, x, t):                                              #Major modifications
        
        #forward
        self.loss(x,t)
        
        #backward
        dout=1
        dout=self.lastLayer.backward(dout)
        
        layers=list(self.layers.values())
        layers.reverse()                                #Reverse the list of layers so back propagation runs from the last layer to the first
        for layer in layers:
            dout=layer.backward(dout)
       
        #Collect the gradients
        grads={}  #Dictionary holding the gradient of each weight parameter
        
        #The gradients of the four weight parameters are read from the layers and stored in grads
        #(each Affine layer saved its dW and db during the backward pass)
        grads['W1']=self.layers['Affine1'].dW
        grads['b1']=self.layers['Affine1'].db
        grads['W2']=self.layers['Affine2'].dW
        grads['b2']=self.layers['Affine2'].db
        
        return grads
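A common sanity check (not part of the original post) is to compare the backpropagation gradients with the numerical ones on a few samples; if the two agree to within roughly 1e-8 to 1e-4 on average, the layered implementation is almost certainly correct. A minimal sketch, assuming the files above are on the path:

# gradient_check.py -- hypothetical helper script using the classes defined above
import sys, os
sys.path.append(os.pardir)
import numpy as np
from dataset.mnist import load_mnist
from two_layer_net1 import TwoLayerNet

(x_train, t_train), _ = load_mnist(flatten=True, normalize=True, one_hot_label=True)

network = TwoLayerNet(input_size=784, hidden_size=100, output_size=10)
x_batch, t_batch = x_train[:3], t_train[:3]     # only a few samples: numerical gradients are slow

grad_numerical = network.numerical_gradient(x_batch, t_batch)
grad_backprop = network.gradient(x_batch, t_batch)

for key in grad_numerical.keys():
    diff = np.average(np.abs(grad_backprop[key] - grad_numerical[key]))
    print(key + ':', diff)    # should be a very small number for every parameter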
5.6.3.5 Run results
train_acc,test_acc| 0.16121666666666667 , 0.1615
train_acc,test_acc| 0.9480833333333333 , 0.9461
train_acc,test_acc| 0.9716 , 0.9669
train_acc,test_acc| 0.9767333333333333 , 0.972
train_acc,test_acc| 0.9813333333333333 , 0.9727
train_acc,test_acc| 0.9871333333333333 , 0.973
train_acc,test_acc| 0.9903833333333333 , 0.9748
train_acc,test_acc| 0.9923666666666666 , 0.9782
train_acc,test_acc| 0.9934833333333334 , 0.9767
train_acc,test_acc| 0.99525 , 0.9784
train_acc,test_acc| 0.9958666666666667 , 0.978
train_acc,test_acc| 0.9983833333333333 , 0.9794
Running time: 44.66675650000002

Image output:

Figure: accuracy on the training set and the test set over epochs. The two curves stay close to each other, indicating that the model has not overfitted and that training works well.
