About Caffe
Caffe was originally designed for images only, without considering text, speech, or other time-series data. It therefore supports convolutional neural networks very well, but offers little support for time-series models such as RNNs and LSTMs. The models folder of the Caffe project contains many common network models, such as LeNet, AlexNet, ZFNet, VGGNet, GoogLeNet, and ResNet.
Module structure
Caffe's abstractions, from low to high, are Blob, Layer, Net, and Solver.
Blob
- A four-dimensional contiguous array, usually written as (n, k, h, w) (number, channels, height, width). It is the basic data structure and can hold input/output data as well as parameter data.
- Blobs hold all the data in the network: the training data, the parameters of each layer, and the data passed between layers are all stored in blobs. A blob can keep its data on the CPU and/or the GPU and synchronize between the two.
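As a small pycaffe sketch (assuming the Python bindings are built and the MNIST example LMDBs exist so the bundled lenet_train_test.prototxt can be loaded), each blob exposes a data array and a matching diff array for gradients:

```python
import caffe

caffe.set_mode_cpu()
# Load the bundled LeNet definition (path assumed from the Caffe examples tree).
net = caffe.Net('examples/mnist/lenet_train_test.prototxt', caffe.TEST)

# net.blobs maps blob names to Blob objects; .data holds the values and
# .diff the gradients, laid out as (n, k, h, w) for 4-D blobs.
for name, blob in net.blobs.items():
    print(name, blob.data.shape, blob.diff.shape)
```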
Layer
- Each layer of the network is abstracted as a Layer. Layer is the abstraction of the various layer types in a neural network: convolution layers, pooling (subsampling) layers, fully connected layers, activation layers, and so on.
- Each Layer implements forward and backward propagation and passes data through Blobs.
- Layer is the basic unit of the network; every layer type defines three computations: 1. setup, which initializes the layer and its parameters; 2. forward propagation; 3. backward propagation.
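To see the Layer abstraction from Python, one can list the layers of a loaded network and the parameter blobs each one owns (same assumptions about the example prototxt as above):

```python
import caffe

net = caffe.Net('examples/mnist/lenet_train_test.prototxt', caffe.TEST)

# One Layer object per layer; net._layer_names gives the matching names.
for name, layer in zip(net._layer_names, net.layers):
    print(name, layer.type)

# Layers with learnable parameters (Convolution, InnerProduct, ...) also appear
# in net.params, which maps the layer name to its weight and bias blobs.
for name, blobs in net.params.items():
    print(name, [b.data.shape for b in blobs])
```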
Net
- The whole network is abstracted as a Net. A Net represents the entire model: it is formed by connecting layers front to back, and it is the network model you construct.
- A Net is a directed acyclic graph of layers. Its initialization function has two main jobs:
  - create the blobs and layers;
  - call the setup function of each layer to initialize it.
- Net also provides Forward and Backward functions, which call the forward and backward of each layer in turn.
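A minimal pycaffe sketch of the Net-level Forward and Backward calls (again assuming the MNIST example prototxt and its LMDB sources are available):

```python
import caffe

caffe.set_mode_cpu()
net = caffe.Net('examples/mnist/lenet_train_test.prototxt', caffe.TRAIN)

# Forward: the data layer reads a batch, each layer computes its outputs,
# and the returned dict holds the network's output blobs (here the loss).
out = net.forward()
print('loss =', float(out['loss']))

# Backward: gradients flow from the loss back through every layer and are
# stored in the .diff arrays of the parameter blobs.
net.backward()
print(net.params['conv1'][0].diff.shape)
```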
Solver
- Solving the network model is abstracted as a Solver. The Solver defines how a Net is optimized, records the training process, saves the model parameters, and can interrupt and resume training. Custom solvers implement different optimization methods.
- Its responsibilities are:
  - create the training network for learning and the test network(s) for evaluation;
  - periodically evaluate the test network;
  - perform iterative optimization and parameter updates by calling the forward and backward functions.
- In every solver iteration, the forward pass computes the outputs and the loss, the backward pass computes the gradients, and the solver then updates the parameters according to the learning rate, momentum, and other settings.
- Trained models save and restore the network parameters in files with the suffix .caffemodel; the solver saves and restores its running state in files with the suffix .solverstate.
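As an illustration (paths taken from the MNIST example, values not authoritative), a solver can be created from its prototxt and driven from Python; running it writes the .caffemodel and .solverstate snapshots described above:

```python
import caffe

caffe.set_mode_cpu()
solver = caffe.SGDSolver('examples/mnist/lenet_solver.prototxt')

# One step = forward + backward + parameter update on solver.net;
# solver.test_nets holds the evaluation network(s).
solver.step(1)
print(solver.net.blobs['loss'].data)

# solver.solve() runs the whole schedule from the solver file, periodically
# testing and snapshotting .caffemodel / .solverstate files.
```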
Workflow for setting up a neural network
1. Data format processing
Convert the data into a format Caffe supports, such as a LevelDB/LMDB database, in-memory data, HDF5 data, image files, window data, or dummy data.
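As a sketch of the most common case, the snippet below writes a few arrays into an LMDB that a Data layer can read; the database name my_train_lmdb and the random arrays are made up for illustration (requires the lmdb Python package):

```python
import numpy as np
import lmdb
import caffe

# Fake data: ten 1x28x28 images with random labels, purely for illustration.
X = (np.random.rand(10, 1, 28, 28) * 255).astype(np.uint8)
y = np.random.randint(0, 10, size=10)

env = lmdb.open('my_train_lmdb', map_size=X.nbytes * 10)
with env.begin(write=True) as txn:
    for i in range(len(X)):
        # array_to_datum wraps a (channels, height, width) array in a caffe Datum.
        datum = caffe.io.array_to_datum(X[i], int(y[i]))
        txn.put('{:08d}'.format(i).encode('ascii'), datum.SerializeToString())
env.close()
```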
2. Prepare the network structure file
Define the network structure: which layers the network contains and what each layer does. This is the most laborious step when using Caffe. For the concrete format, refer to Caffe's own handwritten-digit example: caffe/examples/mnist/lenet_train_test.prototxt
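Instead of writing the prototxt entirely by hand, the structure file can also be generated with pycaffe's NetSpec. The sketch below emits a trimmed LeNet-like definition; the output file name and LMDB path are assumptions borrowed from the MNIST example:

```python
import caffe
from caffe import layers as L, params as P

n = caffe.NetSpec()
n.data, n.label = L.Data(batch_size=64, backend=P.Data.LMDB,
                         source='examples/mnist/mnist_train_lmdb',
                         transform_param=dict(scale=0.00390625), ntop=2)
n.conv1 = L.Convolution(n.data, kernel_size=5, num_output=20,
                        weight_filler=dict(type='xavier'))
n.pool1 = L.Pooling(n.conv1, kernel_size=2, stride=2, pool=P.Pooling.MAX)
n.ip1 = L.InnerProduct(n.pool1, num_output=500, weight_filler=dict(type='xavier'))
n.relu1 = L.ReLU(n.ip1, in_place=True)
n.ip2 = L.InnerProduct(n.relu1, num_output=10, weight_filler=dict(type='xavier'))
n.loss = L.SoftmaxWithLoss(n.ip2, n.label)

# Serialize the NetSpec to the prototxt text format.
with open('lenet_auto_train.prototxt', 'w') as f:
    f.write(str(n.to_proto()))
```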
3. Write the network solver file
This file defines the parameters that need to be set during training, such as the learning rate, the weight decay coefficient, the number of iterations, and whether to use the GPU or CPU. It is usually named xx_solver.prototxt; refer to caffe/examples/mnist/lenet_solver.prototxt
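The solver file can likewise be generated programmatically from the SolverParameter message in caffe.proto; the field values below are illustrative, not prescriptive:

```python
from caffe.proto import caffe_pb2

s = caffe_pb2.SolverParameter()
s.net = 'examples/mnist/lenet_train_test.prototxt'
s.test_iter.append(100)        # number of test batches per test run
s.test_interval = 500          # test every 500 training iterations
s.base_lr = 0.01
s.momentum = 0.9
s.weight_decay = 0.0005
s.lr_policy = 'inv'
s.gamma = 0.0001
s.power = 0.75
s.display = 100
s.max_iter = 10000
s.snapshot = 5000
s.snapshot_prefix = 'examples/mnist/lenet'
s.solver_mode = caffe_pb2.SolverParameter.CPU

with open('my_solver.prototxt', 'w') as f:
    f.write(str(s))            # protobuf text format is exactly the solver file format
```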
4. Training
Training is driven from the command line, e.g. caffe train -solver examples/mnist/lenet_solver.prototxt. Here lenet_solver.prototxt is the solver file, which defines the training parameters and the path to the network structure file.

```bash
# Train (argument: the solver file)
caffe train -solver examples/mnist/lenet_solver.prototxt

# Resume training from a snapshot (arguments: solver file and snapshot)
caffe train -solver examples/mnist/lenet_solver.prototxt \
    -snapshot examples/mnist/lenet_iter_5000.solverstate

# Fine-tune from another trained model (arguments: solver file and pretrained weights)
caffe train -solver examples/finetuning_on_flickr_style/solver.prototxt \
    -weights models/bvlc_reference_caffenet/bvlc_reference_caffenet.caffemodel
```
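The same three cases can be driven from Python; restore() and copy_from() below correspond to the -snapshot and -weights flags (paths assumed from the examples):

```python
import caffe

caffe.set_mode_gpu()
solver = caffe.SGDSolver('examples/mnist/lenet_solver.prototxt')

# Resume from a snapshot (equivalent to the -snapshot flag).
solver.restore('examples/mnist/lenet_iter_5000.solverstate')

# Or fine-tune from an existing model (equivalent to the -weights flag):
# solver.net.copy_from('models/bvlc_reference_caffenet/bvlc_reference_caffenet.caffemodel')

solver.solve()
```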
5. Testing
```bash
# Score the learned LeNet model on the validation set as defined in the
# model architecture lenet_train_test.prototxt
# (arguments: model definition and trained model parameters)
caffe test -model examples/mnist/lenet_train_test.prototxt \
    -weights examples/mnist/lenet_iter_10000.caffemodel -gpu 0 -iterations 100
```
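The equivalent scoring loop in pycaffe, assuming the TEST phase of the model defines accuracy and loss outputs as the bundled lenet_train_test.prototxt does:

```python
import caffe

caffe.set_mode_gpu()
net = caffe.Net('examples/mnist/lenet_train_test.prototxt',
                'examples/mnist/lenet_iter_10000.caffemodel',
                caffe.TEST)

iterations = 100
acc = 0.0
for _ in range(iterations):
    out = net.forward()             # one TEST-phase batch per call
    acc += float(out['accuracy'])
print('accuracy: %.4f' % (acc / iterations))
```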
Some important Caffe files
solver.prototxt
The solver file mainly stores the hyperparameters used during model training:
- net := the network structure file of the model to be trained, e.g. train_val.prototxt
- test_interval := test interval, i.e. run a test every this many training iterations
- test_initialization := whether to run an initial test, i.e. test the model before any training
- test_iter := number of iterations (batches) to run for each test
- base_lr := base learning rate
- lr_policy := learning rate decay policy (see the sketch after this list for how the common policies behave)
- gamma := parameter required by the learning rate policy
- power := ditto
- stepsize := step size of the "step" learning rate policy (fixed step)
- stepvalue := step values of the "multistep" learning rate policy (variable steps)
- max_iter := maximum number of training iterations
- momentum := momentum, a parameter used by optimization strategies such as SGD
- momentum2 := parameter used by the Adam optimizer
- weight_decay := weight decay rate
- clip_gradients := clip gradients to a fixed range
- display := print results every this many iterations
- snapshot := save a model snapshot every this many iterations
- snapshot_prefix := prefix (which may include a path) for saved model files
- type := solver optimization strategy, i.e. SGD, Adam, AdaGrad, RMSProp, Nesterov, AdaDelta, etc.
- solver_mode := training mode, i.e. GPU or CPU
- debug_info := whether to print debugging information
- device_id := device number when using GPU mode; default is 0
Set these parameters according to your own needs; the required ones must be specified, and the others are optional.
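As a rough sketch of how the common lr_policy settings decay base_lr over the iterations (formulas paraphrased from the descriptions in caffe.proto; treat them as illustrative):

```python
def learning_rate(policy, base_lr, it, gamma=0.1, power=0.75, stepsize=5000):
    """Return the learning rate at iteration `it` for a few common policies."""
    if policy == 'fixed':
        return base_lr
    if policy == 'step':                      # drop by gamma every stepsize iterations
        return base_lr * gamma ** (it // stepsize)
    if policy == 'exp':                       # exponential decay
        return base_lr * gamma ** it
    if policy == 'inv':                       # smooth inverse decay
        return base_lr * (1.0 + gamma * it) ** (-power)
    raise ValueError('unhandled policy: ' + policy)

print(learning_rate('step', 0.01, 12000))                        # 0.01 * 0.1**2
print(learning_rate('inv', 0.01, 10000, gamma=1e-4, power=0.75))
```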
train_val.prototxt
The train_val file stores the network model structure, which is built up layer by layer. Taking LeNet as an example, the basic composition of a network layer is introduced below:
name: "LeNet" layer { name: "mnist" # Network layer name type: "Data" # Network layer type, data layer top: "data" # Input and data of this layer top: "label" # Output of this layer, label include { phase: TRAIN } # TRAIN: = for training, TEST: = for testing transform_param { scale: 0.00390625 } # scale data data_param { # Data layer configuration source: "examples/mnist/mnist_train_lmdb" # Data storage path batch_size: 64 # Specify batch size backend: LMDB # Specify database format, LMDB/LevelDB } } layer { name: "mnist" type: "Data" top: "data" top: "label" include { phase: TEST } transform_param { scale: 0.00390625 } data_param { source: "examples/mnist/mnist_test_lmdb" batch_size: 100 backend: LMDB } } layer{ name:"conv1" type:"Convolution" #Convolution layer bottom:"data" #Output of the previous layer as input top:"conv1" param{name:"conv1_w" lr_mult:1 decay_mult:1} #Name, learning rate and attenuation rate of convolution layer parameter w (relative to base_lr and weight_ Multiple of decay) param{name:"conv1_b" lr_mult:2 decay_mult:0} #Name, learning rate and decay rate of convolution layer parameter b convolution_param{ num_output:20 #Number of feature map s output by convolution layer kernel_size:5 #Size of convolution layer pad:0 #Filling size of convolution layer stride:1 #Step size for convolution weight_filler{type:"xavier" } #Initial call strategy of parameter w weight_filler{type:"constant" value:0.1} #Initialization strategy of parameter b } } layer { #BatchNorm Layers, right feature map Batch normalization name:"bn1" type:"BatchNorm" bottom:"conv1" top:"conv1" batch_norm_param{ use_global_stats:false} #false during training and true during testing } layer { #Pool layer, i.e. lower sampling layer name: "pool1" type: "Pooling" bottom: "conv1" top: "pool1" pooling_param { pool: MAX # Maximum pooling and AVE mean pooling kernel_size: 2 stride: 2 } } layer { name: "conv2" type: "Convolution" bottom: "pool1" top: "conv2" param { lr_mult: 1 } param { lr_mult: 2 } convolution_param { num_output: 50 kernel_size: 5 stride: 1 weight_filler { type: "xavier" } bias_filler { type: "constant" } } } layer { name:"bn2" type:"BatchNorm" bottom:"conv2" top:"conv2" batch_norm_param{ use_global_stats:false} } layer { name: "pool2" type: "Pooling" bottom: "conv2" top: "pool2" pooling_param { pool: MAX kernel_size: 2 stride: 2 } } layer { #Full connection layer name: "ip1" type: "InnerProduct" bottom: "pool2" top: "ip1" param { lr_mult: 1 } param { lr_mult: 2 } inner_product_param { num_output: 500 weight_filler { type: "xavier" } bias_filler { type: "constant" } } } layer { # Activate function layer to provide nonlinear capability name: "relu1" type: "ReLU" bottom: "ip1" top: "ip1" } layer { name: "ip2" type: "InnerProduct" bottom: "ip1" top: "ip2" param { lr_mult: 1 } param { lr_mult: 2 } inner_product_param { num_output: 10 weight_filler { type: "xavier" } bias_filler { type: "constant" } } } layer { # Loss function layer name: "prob" type: "SoftmaxWithLoss" bottom: "ip2" bottom: "label" top: "prob" }