About Caffe
Caffe was originally designed for image data only, without considering text, speech, or time-series data. As a result, Caffe supports convolutional neural networks very well, but it offers little support for time-series models such as RNN and LSTM. The models folder of the Caffe project contains many common network models, such as LeNet, AlexNet, ZFNet, VGGNet, GoogLeNet, ResNet, etc.
Module structure
From the lowest level to the highest, Caffe is organized as follows:
Blob: a four-dimensional contiguous array, usually written as (n, k, h, w). It is the basic data structure and can hold input and output data or parameter data.

A Blob represents the data in the network, including the training data and the parameters of each layer; data passed between layers is also carried by Blobs. Blob data can be stored on both the CPU and the GPU, and can be synchronized between the two.
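As a rough illustration of what "four-dimensional contiguous array" means, the sketch below stores a blob in one flat Python list and computes the flat index of an element. The class is a toy stand-in, not Caffe's C++ Blob; it assumes the (n, k, h, w) order with the width index varying fastest, and a diff buffer for gradients alongside the data buffer.

```python
class Blob:
    """Toy stand-in for a Caffe Blob: a 4-D array stored in one flat list."""

    def __init__(self, num, channels, height, width):
        self.shape = (num, channels, height, width)
        self.data = [0.0] * (num * channels * height * width)   # values
        self.diff = [0.0] * len(self.data)                      # gradients

    def offset(self, n, k, h, w):
        """Flat index of element (n, k, h, w) in row-major order."""
        _, K, H, W = self.shape
        return ((n * K + k) * H + h) * W + w


blob = Blob(num=2, channels=3, height=4, width=5)
blob.data[blob.offset(1, 2, 3, 4)] = 1.0
print(len(blob.data))            # 120 values in total
print(blob.offset(1, 2, 3, 4))   # 119, the last element
```

Because the whole array is one contiguous buffer, copying it between CPU and GPU memory is a single block transfer.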

Each layer of the network is abstracted as a Layer

Layer is the abstraction of all kinds of layers in a neural network, including convolution layers, subsampling (pooling) layers, fully connected layers, and activation function layers.

Each Layer implements forward propagation and backward propagation, and passes data through Blobs.

The Layer is the basic unit of the network. Each layer type defines three kinds of computation: 1. setup, which initializes the layer and its parameters; 2. forward propagation; 3. backward propagation.
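The three computations can be sketched as a small Python interface. The class and method names below are hypothetical and only mirror the idea; Caffe's real layers are C++ classes that read and write Blobs rather than Python lists.

```python
class Layer:
    """Hypothetical base class mirroring the three computations of a layer."""

    def setup(self, bottom):
        """Initialize parameters from the input (nothing to do by default)."""

    def forward(self, bottom):
        raise NotImplementedError

    def backward(self, top_diff):
        raise NotImplementedError


class ReLULayer(Layer):
    def forward(self, bottom):
        self.bottom = list(bottom)              # remember the input for backward
        return [max(0.0, x) for x in bottom]

    def backward(self, top_diff):
        # The gradient passes through only where the input was positive.
        return [d if x > 0 else 0.0 for x, d in zip(self.bottom, top_diff)]


relu = ReLULayer()
relu.setup([-1.0, 2.0, 3.0])
top = relu.forward([-1.0, 2.0, 3.0])            # [0.0, 2.0, 3.0]
bottom_diff = relu.backward([1.0, 1.0, 1.0])    # [0.0, 1.0, 1.0]
```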


The whole network is abstracted as a Net

Net represents the whole network: it is composed of layers connected one after another, and it is the network model that gets constructed.

A Net is a directed acyclic graph. Its initialization function does two main things:
 Create the blobs and layers.
 Call the setup function of each layer to initialize it.
A Net also has two functions, forward and backward, which call the forward and backward functions of the layers respectively.
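A minimal sketch of this structure, assuming the simplest possible graph (a linear chain of layers working on single numbers). Scale and Net here are hypothetical toy classes, not Caffe's API.

```python
class Scale:
    """Toy layer: multiplies its input by a constant factor."""

    def __init__(self, factor):
        self.factor = factor

    def setup(self):
        self.ready = True                       # stands in for parameter init

    def forward(self, x):
        return x * self.factor

    def backward(self, top_diff):
        return top_diff * self.factor           # d(out)/d(in) = factor


class Net:
    def __init__(self, layers):
        # layers: list of (name, layer) pairs, already in topological order
        self.layers = layers
        self.blobs = {"data": 0.0}              # 1. create the blobs ...
        for name, layer in layers:
            self.blobs[name] = 0.0
            layer.setup()                       # 2. ... and set up each layer

    def forward(self, data):
        self.blobs["data"] = data
        x = data
        for name, layer in self.layers:         # first layer to last
            x = layer.forward(x)
            self.blobs[name] = x
        return x

    def backward(self, top_diff=1.0):
        diff = top_diff
        for _, layer in reversed(self.layers):  # last layer to first
            diff = layer.backward(diff)
        return diff


net = Net([("scale1", Scale(2.0)), ("scale2", Scale(3.0))])
out = net.forward(1.5)      # 1.5 * 2 * 3
grad = net.backward()       # d(out)/d(data) = 2 * 3
```

The point of the sketch is the ordering: forward visits the layers from input to output, and backward visits them in reverse.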


The solving of the network model is abstracted as a Solver
Solver defines how a Net is optimized: it records the training process, saves the model parameters, and can interrupt and resume training. A custom solver can implement a different optimization method.
Its functions are as follows:
 Create the training network used for learning and the test network used for evaluation;
 Evaluate the test network periodically;
 Optimize iteratively and update the parameters by calling the forward and backward functions.
In every iteration, the solver computes the output and the loss with the forward pass and the gradients with the backward pass.
It then applies the parameter update, adjusting the learning rate and other solver state according to the chosen method.
The trained Caffe model is used to save and restore the network parameters, with the suffix .caffemodel;
the solver saves and restores its running state with the suffix .solverstate
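The loop described above can be sketched on a one-parameter problem. ToySolver is a hypothetical class that runs plain SGD on the loss (w - target)^2; the snapshot list merely stands in for writing the .caffemodel and .solverstate files.

```python
class ToySolver:
    """Hypothetical solver: plain SGD on the loss (w - target)^2."""

    def __init__(self, base_lr, max_iter, test_interval, snapshot):
        self.base_lr = base_lr
        self.max_iter = max_iter
        self.test_interval = test_interval
        self.snapshot = snapshot
        self.w, self.target, self.iter = 0.0, 3.0, 0
        self.test_log, self.snapshots = [], []

    def forward(self):
        return (self.w - self.target) ** 2          # loss

    def backward(self):
        return 2.0 * (self.w - self.target)         # d(loss)/d(w)

    def step(self):
        self.forward()
        grad = self.backward()
        self.w -= self.base_lr * grad               # parameter update
        self.iter += 1

    def solve(self):
        while self.iter < self.max_iter:
            if self.iter % self.test_interval == 0:
                self.test_log.append((self.iter, self.forward()))
            self.step()
            if self.iter % self.snapshot == 0:
                # Stands in for saving the model and solver state to disk.
                self.snapshots.append({"iter": self.iter, "w": self.w})


solver = ToySolver(base_lr=0.1, max_iter=100, test_interval=25, snapshot=50)
solver.solve()
print(round(solver.w, 4))                       # close to the target 3.0
print([s["iter"] for s in solver.snapshots])    # iterations with a snapshot
```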
Process of building a neural network with Caffe
1. Data format processing
Convert the data into a format that Caffe supports, including LevelDB, LMDB, in-memory data, HDF5 data, image data, window data, dummy data, etc.
2. Prepare the network structure file
Define the network structure, for example which layers the network contains and what each layer does. This is the most troublesome step in using Caffe. For the exact format, refer to Caffe's own handwritten digit recognition example: caffe/examples/mnist/lenet_train_test.prototxt
3. Write the network solver file
This file defines the parameters that need to be set for training the network model, such as the learning rate, weight decay coefficient, number of iterations, GPU or CPU, etc. The usual naming convention is xx_solver.prototxt; refer to: caffe/examples/mnist/lenet_solver.prototxt
4. Training
Training is run from the command line, for example: caffe train -solver examples/mnist/lenet_solver.prototxt
solver.prototxt is the network solver file, which defines the training parameters and the path to the network structure file.
# Train (argument: solver file)
caffe train -solver examples/mnist/lenet_solver.prototxt
# Resume training from a saved snapshot (arguments: solver file, snapshot)
caffe train -solver examples/mnist/lenet_solver.prototxt -snapshot examples/mnist/lenet_iter_5000.solversta
# Fine-tune from another trained model (arguments: solver file, trained model weights)
caffe train -solver examples/finetuning_on_flickr_style/solver.prototxt \
    -weights models/bvlc_reference_caffenet/bvlc_reference_caffenet.caffemodel
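When training is launched from a script, the three command variants can be assembled by one small helper. This is only a sketch: caffe_train_command is a hypothetical function, and it assumes the caffe binary is on the PATH and accepts the -solver, -snapshot and -weights flags.

```python
import shlex


def caffe_train_command(solver, snapshot=None, weights=None):
    """Build the argument list for `caffe train` (hypothetical helper)."""
    if snapshot and weights:
        # Resuming and fine-tuning are alternatives, so reject both at once.
        raise ValueError("give either snapshot or weights, not both")
    args = ["caffe", "train", "-solver", solver]
    if snapshot:
        args += ["-snapshot", snapshot]     # resume interrupted training
    if weights:
        args += ["-weights", weights]       # fine-tune from a trained model
    return args


cmd = caffe_train_command(
    "examples/finetuning_on_flickr_style/solver.prototxt",
    weights="models/bvlc_reference_caffenet/bvlc_reference_caffenet.caffemodel",
)
print(shlex.join(cmd))
# To actually run it: subprocess.run(cmd, check=True)
```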
5. Testing
# Score the learned LeNet model on the validation set as defined in the
# model architecture lenet_train_test.prototxt
# Test (arguments: model structure file, trained model weights)
caffe test -model examples/mnist/lenet_train_test.prototxt \
    -weights examples/mnist/lenet_iter_10000.caffemodel -gpu 0 -iterations 100
Some important Caffe files

solver.prototxt
The solver file mainly stores the hyperparameters used in model training:
 net: = the structure file of the model to be trained, i.e. train_val.prototxt
 test_interval: = test interval, i.e. how many training iterations to run between tests
 test_initialization: = whether to run an initial test, i.e. test before the model has been trained
 test_iter: = number of iterations to run in each test
 base_lr: = base learning rate
 lr_policy: = learning rate change policy
 gamma: = parameter required by the learning rate change policy
 power: = same as above
 stepsize: = step interval of the "step" learning rate policy (fixed step)
 stepvalue: = step points of the "multistep" learning rate policy (variable step)
 max_iter: = maximum number of iterations for model training
 momentum: = momentum, a parameter used by the optimization strategy (Adam, SGD, ...)
 momentum2: = parameter used by the Adam optimization strategy
 weight_decay: = weight decay rate
 clip_gradients: = gradient clipping threshold, which limits the range of the gradient
 display: = show results every this many iterations
 snapshot: = snapshot interval, i.e. save the model parameters every this many iterations
 snapshot_prefix: = prefix for the saved model files, which can include a path
 type: = solver optimization strategy, i.e. SGD, Adam, AdaGrad, RMSProp, Nesterov, AdaDelta, etc.
 solver_mode: = training mode, i.e. GPU/CPU
 debug_info: = whether to print debugging information
 device_id: = device number (when using GPU mode), default is 0
Users can set these parameters to suit their own situation. Some of the parameters are required and the others are optional (choose according to the situation).
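To make the interplay of base_lr, lr_policy, gamma, power, stepsize and stepvalue concrete, here is a sketch of the learning rate schedules as they are commonly documented for Caffe. The function is a hypothetical re-implementation in Python, and the numbers in the usage lines are illustrative values, not taken from any particular solver file.

```python
import math


def learning_rate(policy, it, base_lr, gamma=0.0, power=0.0,
                  stepsize=1, stepvalues=(), max_iter=1):
    """Learning rate at iteration `it` for the named lr_policy."""
    if policy == "fixed":
        return base_lr
    if policy == "step":
        return base_lr * gamma ** (it // stepsize)
    if policy == "exp":
        return base_lr * gamma ** it
    if policy == "inv":
        return base_lr * (1.0 + gamma * it) ** (-power)
    if policy == "multistep":
        passed = sum(1 for v in stepvalues if it >= v)   # steps already taken
        return base_lr * gamma ** passed
    if policy == "poly":
        return base_lr * (1.0 - it / max_iter) ** power
    if policy == "sigmoid":
        return base_lr / (1.0 + math.exp(-gamma * (it - stepsize)))
    raise ValueError(f"unknown lr_policy: {policy}")


# "step": the rate drops by a factor of gamma every stepsize iterations.
print(learning_rate("step", 2500, base_lr=0.1, gamma=0.1, stepsize=1000))
# "multistep": the rate drops at the listed stepvalue iterations instead.
print(learning_rate("multistep", 2500, base_lr=0.1, gamma=0.1,
                    stepvalues=(1000, 4000)))
```

With "step" the drops are evenly spaced, while "multistep" lets each drop be placed by hand, which is why the first takes one stepsize and the second takes a list of stepvalue entries.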

train_val.prototxt
The train_val file stores the network model structure. The model is built from layers as its units. Using LeNet as an example, the basic composition of the network layers is introduced below:
name: "LeNet"
layer {
  name: "mnist"    # Network layer name
  type: "Data"     # Network layer type: data layer
  top: "data"      # Output of this layer: data
  top: "label"     # Output of this layer: label
  include { phase: TRAIN }    # TRAIN: = for training, TEST: = for testing
  transform_param { scale: 0.00390625 }    # Scale the data
  data_param {     # Data layer configuration
    source: "examples/mnist/mnist_train_lmdb"    # Data storage path
    batch_size: 64    # Batch size
    backend: LMDB     # Database format, LMDB/LEVELDB
  }
}
layer {
  name: "mnist"
  type: "Data"
  top: "data"
  top: "label"
  include { phase: TEST }
  transform_param { scale: 0.00390625 }
  data_param {
    source: "examples/mnist/mnist_test_lmdb"
    batch_size: 100
    backend: LMDB
  }
}
layer {
  name: "conv1"
  type: "Convolution"    # Convolution layer
  bottom: "data"         # Output of the previous layer as input
  top: "conv1"
  # Name, learning rate multiplier and decay multiplier of the weights w
  # (multiples of base_lr and weight_decay)
  param { name: "conv1_w" lr_mult: 1 decay_mult: 1 }
  # Name, learning rate multiplier and decay multiplier of the bias b
  param { name: "conv1_b" lr_mult: 2 decay_mult: 0 }
  convolution_param {
    num_output: 20    # Number of feature maps output by the convolution layer
    kernel_size: 5    # Kernel size of the convolution
    pad: 0            # Padding size of the convolution
    stride: 1         # Stride of the convolution
    weight_filler { type: "xavier" }    # Initialization strategy of parameter w
    bias_filler { type: "constant" value: 0.1 }    # Initialization strategy of parameter b
  }
}
layer {    # BatchNorm layer: batch-normalizes the feature maps
  name: "bn1"
  type: "BatchNorm"
  bottom: "conv1"
  top: "conv1"
  batch_norm_param { use_global_stats: false }    # false during training and true during testing
}
layer {    # Pooling layer, i.e. downsampling layer
  name: "pool1"
  type: "Pooling"
  bottom: "conv1"
  top: "pool1"
  pooling_param {
    pool: MAX    # MAX: = max pooling, AVE: = average pooling
    kernel_size: 2
    stride: 2
  }
}
layer {
  name: "conv2"
  type: "Convolution"
  bottom: "pool1"
  top: "conv2"
  param { lr_mult: 1 }
  param { lr_mult: 2 }
  convolution_param {
    num_output: 50
    kernel_size: 5
    stride: 1
    weight_filler { type: "xavier" }
    bias_filler { type: "constant" }
  }
}
layer {
  name: "bn2"
  type: "BatchNorm"
  bottom: "conv2"
  top: "conv2"
  batch_norm_param { use_global_stats: false }
}
layer {
  name: "pool2"
  type: "Pooling"
  bottom: "conv2"
  top: "pool2"
  pooling_param {
    pool: MAX
    kernel_size: 2
    stride: 2
  }
}
layer {    # Fully connected layer
  name: "ip1"
  type: "InnerProduct"
  bottom: "pool2"
  top: "ip1"
  param { lr_mult: 1 }
  param { lr_mult: 2 }
  inner_product_param {
    num_output: 500
    weight_filler { type: "xavier" }
    bias_filler { type: "constant" }
  }
}
layer {    # Activation function layer, provides the nonlinearity
  name: "relu1"
  type: "ReLU"
  bottom: "ip1"
  top: "ip1"
}
layer {
  name: "ip2"
  type: "InnerProduct"
  bottom: "ip1"
  top: "ip2"
  param { lr_mult: 1 }
  param { lr_mult: 2 }
  inner_product_param {
    num_output: 10
    weight_filler { type: "xavier" }
    bias_filler { type: "constant" }
  }
}
layer {    # Loss function layer
  name: "prob"
  type: "SoftmaxWithLoss"
  bottom: "ip2"
  bottom: "label"
  top: "prob"
}
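A quick way to sanity-check such a definition is to trace the feature map size through the layers by hand. The sketch below does this for the conv1, pool1, conv2 and pool2 settings above, assuming 28 x 28 MNIST inputs (the input size is not stated in the file itself). The helper output_size is hypothetical and only valid when the window divides the input evenly, as it does here.

```python
def output_size(size, kernel_size, stride=1, pad=0):
    """Output width/height of a sliding window that divides the input evenly."""
    return (size + 2 * pad - kernel_size) // stride + 1


size = 28                       # assumption: MNIST inputs are 28 x 28
sizes = []
for name, kernel, stride in [("conv1", 5, 1), ("pool1", 2, 2),
                             ("conv2", 5, 1), ("pool2", 2, 2)]:
    size = output_size(size, kernel, stride)
    sizes.append(size)
    print(name, size)

ip1_inputs = 50 * size * size   # conv2 has num_output: 50
print("ip1 inputs:", ip1_inputs)
```

Under these assumptions the spatial size goes 28, 24, 12, 8, 4, so the first fully connected layer ip1 sees 50 x 4 x 4 = 800 values per image. The BatchNorm and ReLU layers leave the shape unchanged.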