Parameter acquisition (training the weights and biases) was covered in the previous blog post; please refer to that post for the relevant links.
1, Analyze the input and output
1. The handwritten input is a 28x28 black-and-white picture, so there are 784 inputs x0 ~ x783
2. The outputs are the probabilities of the digits 0-9, so there are 10 outputs
3. The inputs should be decimals in the range -1 ~ 1, mainly to prevent calculation overflow, as sketched below
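As a minimal sketch of this normalization (assuming the raw pixels are 8-bit grayscale values 0-255; the helper name normalize_img is mine, and the exact scaling must match whatever was used during training):

    //Hypothetical helper: map 8-bit pixels 0..255 into the range -1..1
    void normalize_img(const unsigned char *raw, float *img)
    {
        for (int i = 0; i < 784; i++) {
            img[i] = raw[i] / 127.5f - 1.0f;  //0 -> -1.0, 255 -> 1.0
        }
    }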
2, Analyze the number of network layers
If there were only one layer (784 inputs, 10 outputs), there could only be 10 neurons in the middle to record features, and it would be difficult for them to recognize the ten digits 0-9, so two layers are more appropriate
3, Analyze the number of neurons
There is no exact formula for the number of neurons in the hidden layer; it mainly comes down to empirical testing. Of course, too many neurons lead to too many parameters and make the network hard to train, while too few fail to achieve the required recognition accuracy
1. Layer 1: there is no absolute rule for the number of layer-1 neurons; 64 is used here, mainly because it is convenient for the later FPGA implementation
2. Layer 2: a neuron has only one output, and the outputs are the probabilities of the digits 0-9, so the number of neurons in layer 2 is 10
4, Analyze the number of parameters
1. Layer 1: the input is a 28x28 picture, so one neuron has 784 weights w and 1 bias b; the 64 neurons correspond to 64x784 weights and 64 biases
2. Layer 2: the 64 neurons of layer 1 produce 64 outputs, which become the 64 inputs of layer 2, so one layer-2 neuron has 64 weights and 1 bias; the 10 neurons correspond to 64x10 weights and 10 biases
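Adding these up, the whole network has 64x784 + 64 + 64x10 + 10 = 50176 + 64 + 640 + 10 = 50890 parameters in total, roughly 199 KB when stored as 32-bit floats.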
5, Analyze the connections in the forward pass
1. Layer 1: a single neuron computes y = w0*x0 + w1*x1 + ... + w783*x783 + b over all the pixels and passes it through the activation function to output a1; the same calculation is then extended to all the neurons in layer 1 (sketched in code after this list)
2. Layer 2: layer 1 outputs a0, a1 ... a63. A single neuron computes y = w0*a0 + w1*a1 + ... + w63*a63 + b and passes it through the activation function to output a2; the same calculation is then extended to all the neurons in layer 2
Note: inference generally only needs the predicted result, and we don't care about the probability itself. Since softmax is monotonically increasing, the largest raw output and the largest probability share the same index, so to save running time we omit the activation function of the last layer
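As a minimal sketch of the layer-1 single-neuron calculation (the helper name layer1_neuron is mine; the indexing w1[j*64 + i] matches the complete code in section 7, which stores the weight from input j to neuron i at that position):

    //One layer-1 neuron i: weighted sum over all 784 pixels, plus bias, then ReLU
    float layer1_neuron(const float *img, const float *w1, const float *b1, int i)
    {
        float y = 0.0f;
        for (int j = 0; j < 784; j++) {
            y += w1[j*64 + i] * img[j];  //w0*x0 + w1*x1 + ... + w783*x783
        }
        y += b1[i];                      //add the bias
        return y > 0 ? y : 0;            //ReLU activation
    }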
6, Code framework
Code framework:
Input: image array, layer-1 weights, layer-1 biases, layer-2 weights, layer-2 biases
Output: the inference result
Implementation logic:
First layer neural network calculation
Second layer neural network calculation
Find the output with the maximum probability and return the result
//Fully connected inference
//Incoming picture img, size 28*28
//The first layer has 784*64 weight parameters, 64 bias parameters and 64 outputs
//The second layer has 64*10 weight parameters and 10 bias parameters
//Find the maximum result and return 0-9
int my_predict(float *img, float *w1, float *b1, float *w2, float *b2)
{
    //Layer 1: 64 neurons, each connected to the 28x28 inputs
    //w: 784*64, b: 64, outputs: 64
    //y = w1*x1 + w2*x2 + ... + wn*xn + b
    //Layer 2: 10 neurons for the 10 digits 0-9, each connected to the 64 layer-1 outputs
    //w: 64*10, b: 10, outputs: 10
    //y = w1*x1 + w2*x2 + ... + wn*xn + b
    //Query which output value is the highest and return its index
}
7, Complete code implementation:
#include <stdio.h>
#include <time.h>
//Import pictures
#include "input_0.h"
#include "input_1.h"
#include "input_2.h"
#include "input_3.h"
#include "input_4.h"
#include "input_5.h"
#include "input_6.h"
#include "input_7.h"
#include "input_8.h"
#include "input_9.h"
#include <windows.h>
//Import weights w and biases b
#include "layer1_weight.h"
#include "layer1_bais.h"
#include "layer2_weight.h"
#include "layer2_bais.h"

//Input picture 28*28
//layer1: 64 neurons, W: 784*64, B: 64
//layer2: 10 neurons, W: 64*10, B: 10
//Compute the outputs and return the index of the maximum
int predict(float *img, float *w1, float *b1, float *w2, float *b2)
{
    int i, j;
    float y;
    float a1[64], a2[10];
    int ret = 0;

    //First layer calculation, neuron by neuron
    for (i = 0; i < 64; i++) {
        //Single neuron: y = w0*x0 + w1*x1 + w2*x2 + ... + w783*x783 + b
        y = 0.0;
        for (j = 0; j < 784; j++) {
            y = y + w1[j*64 + i] * img[j];
        }
        //Add the bias
        y = y + b1[i];
        //Apply the ReLU activation to turn the linear sum into a nonlinearity
        y = y > 0 ? y : 0;
        //Save this neuron's output
        a1[i] = y;
    }

    //Second layer calculation, neuron by neuron
    for (i = 0; i < 10; i++) {
        //Single neuron: y = w0*a0 + w1*a1 + w2*a2 + ... + w63*a63 + b
        y = 0.0;
        for (j = 0; j < 64; j++) {
            y = y + w2[i + 10*j] * a1[j];
        }
        //Add the bias (no activation on the last layer, see the note in section 5)
        y = y + b2[i];
        a2[i] = y;
    }

    //Find the output with the maximum value
    y = a2[0];
    for (i = 1; i < 10; i++) {
        if (a2[i] > y) {
            y = a2[i];
            ret = i;
        }
    }
    return ret;
}

void full_connect_test()
{
    int ret;
    float *imgx[10] = {
        input_0, input_1, input_2, input_3, input_4,
        input_5, input_6, input_7, input_8, input_9
    };
    double run_time;
    LARGE_INTEGER time_start;  //Start time
    LARGE_INTEGER time_over;   //End time
    double dqFreq;             //Timer frequency
    LARGE_INTEGER f;           //Timer frequency
    QueryPerformanceFrequency(&f);
    dqFreq = (double)f.QuadPart;
    for (int i = 0; i < 10; i++) {
        QueryPerformanceCounter(&time_start);  //Timing start
        ret = predict(imgx[i], layer1_weight, layer1_bais, layer2_weight, layer2_bais);
        QueryPerformanceCounter(&time_over);   //Timing end
        //Multiply by 1000000 to change the unit from seconds to microseconds,
        //with a resolution of 1000000/(CPU frequency) microseconds
        run_time = 1000000 * (time_over.QuadPart - time_start.QuadPart) / dqFreq;
        printf("\nrun_time: %fus\n", run_time);
        //Portable alternative using clock() from <time.h>:
        //clock_t start = clock();
        //ret = predict(imgx[i], layer1_weight, layer1_bais, layer2_weight, layer2_bais);
        //clock_t end = clock();
        //double runtime = (double)(end - start) / CLOCKS_PER_SEC;
        //printf("runtime:%f s ", runtime);
        printf("input is %d ,predict:%d\n", i, ret);
    }
}

int main()
{
    full_connect_test();
    return 0;
}
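The input_*.h headers are not shown here; presumably each one defines a 784-element float array holding one normalized sample image, along the lines of this hypothetical sketch:

    //input_0.h (hypothetical layout): one 28x28 image flattened to 784 floats
    float input_0[784] = {
        -1.0f, -1.0f, -1.0f, /* ...the remaining 781 normalized pixel values... */
    };

Note also that QueryPerformanceFrequency/QueryPerformanceCounter come from windows.h, so the timing code is Windows-only; the commented-out clock() variant from time.h is the portable fallback, at coarser resolution.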
The run results are as follows: