About Caffe
Caffe was originally designed for images only, without considering text, speech, or other time-series data. It therefore supports convolutional neural networks very well, but offers little support for time-series models such as RNNs and LSTMs. The models folder of the Caffe project contains many common network models, such as LeNet, AlexNet, ZFNet, VGGNet, GoogLeNet, and ResNet.
Module structure
Caffe's abstractions, from low to high, are Blob, Layer, Net, and Solver.
Blob
- A four-dimensional contiguous array, usually written as (n, k, h, w) (number, channels, height, width). It is the basic data structure and can hold input/output data as well as parameter data.
- Blobs hold all the data in the network: the training data, the parameters of each layer, and the data passed between layers are all stored in blobs. A blob can keep its data on the CPU and/or the GPU and synchronize between the two.
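As a small pycaffe sketch (assuming the Python bindings are built and the MNIST example LMDBs exist so the bundled lenet_train_test.prototxt can be loaded), each blob exposes a data array and a matching diff array for gradients:

```python
import caffe

caffe.set_mode_cpu()
# Load the bundled LeNet definition (path assumed from the Caffe examples tree).
net = caffe.Net('examples/mnist/lenet_train_test.prototxt', caffe.TEST)

# net.blobs maps blob names to Blob objects; .data holds the values and
# .diff the gradients, laid out as (n, k, h, w) for 4-D blobs.
for name, blob in net.blobs.items():
    print(name, blob.data.shape, blob.diff.shape)
```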
Layer
- Each layer of the network is abstracted as a Layer. Layer is the abstraction of the various layer types in a neural network: convolution layers, pooling (subsampling) layers, fully connected layers, activation layers, and so on.
- Each Layer implements forward and backward propagation and passes data through Blobs.
- Layer is the basic unit of the network; every layer type defines three computations: 1. setup, which initializes the layer and its parameters; 2. forward propagation; 3. backward propagation.
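To see the Layer abstraction from Python, one can list the layers of a loaded network and the parameter blobs each one owns (same assumptions about the example prototxt as above):

```python
import caffe

net = caffe.Net('examples/mnist/lenet_train_test.prototxt', caffe.TEST)

# One Layer object per layer; net._layer_names gives the matching names.
for name, layer in zip(net._layer_names, net.layers):
    print(name, layer.type)

# Layers with learnable parameters (Convolution, InnerProduct, ...) also appear
# in net.params, which maps the layer name to its weight and bias blobs.
for name, blobs in net.params.items():
    print(name, [b.data.shape for b in blobs])
```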
Net
- The whole network is abstracted as a Net. A Net represents the entire model: it is formed by connecting layers front to back, and it is the network model you construct.
- A Net is a directed acyclic graph of layers. Its initialization function has two main jobs:
  - create the blobs and layers;
  - call the setup function of each layer to initialize it.
- Net also provides Forward and Backward functions, which call the forward and backward of each layer in turn.
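A minimal pycaffe sketch of the Net-level Forward and Backward calls (again assuming the MNIST example prototxt and its LMDB sources are available):

```python
import caffe

caffe.set_mode_cpu()
net = caffe.Net('examples/mnist/lenet_train_test.prototxt', caffe.TRAIN)

# Forward: the data layer reads a batch, each layer computes its outputs,
# and the returned dict holds the network's output blobs (here the loss).
out = net.forward()
print('loss =', float(out['loss']))

# Backward: gradients flow from the loss back through every layer and are
# stored in the .diff arrays of the parameter blobs.
net.backward()
print(net.params['conv1'][0].diff.shape)
```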
Solver
- Solving the network model is abstracted as a Solver. The Solver defines how a Net is optimized, records the training process, saves the model parameters, and can interrupt and resume training. Custom solvers implement different optimization methods.
- Its responsibilities are:
  - create the training network for learning and the test network(s) for evaluation;
  - periodically evaluate the test network;
  - perform iterative optimization and parameter updates by calling the forward and backward functions.
- In every solver iteration, the forward pass computes the outputs and the loss, the backward pass computes the gradients, and the solver then updates the parameters according to the learning rate, momentum, and other settings.
- Trained models save and restore the network parameters in files with the suffix .caffemodel; the solver saves and restores its running state in files with the suffix .solverstate.
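As an illustration (paths taken from the MNIST example, values not authoritative), a solver can be created from its prototxt and driven from Python; running it writes the .caffemodel and .solverstate snapshots described above:

```python
import caffe

caffe.set_mode_cpu()
solver = caffe.SGDSolver('examples/mnist/lenet_solver.prototxt')

# One step = forward + backward + parameter update on solver.net;
# solver.test_nets holds the evaluation network(s).
solver.step(1)
print(solver.net.blobs['loss'].data)

# solver.solve() runs the whole schedule from the solver file, periodically
# testing and snapshotting .caffemodel / .solverstate files.
```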
Workflow for setting up a neural network
1. Data format processing
Convert the data into a format Caffe supports, such as a LevelDB/LMDB database, in-memory data, HDF5 data, image files, window data, or dummy data.
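As a sketch of the most common case, the snippet below writes a few arrays into an LMDB that a Data layer can read; the database name my_train_lmdb and the random arrays are made up for illustration (requires the lmdb Python package):

```python
import numpy as np
import lmdb
import caffe

# Fake data: ten 1x28x28 images with random labels, purely for illustration.
X = (np.random.rand(10, 1, 28, 28) * 255).astype(np.uint8)
y = np.random.randint(0, 10, size=10)

env = lmdb.open('my_train_lmdb', map_size=X.nbytes * 10)
with env.begin(write=True) as txn:
    for i in range(len(X)):
        # array_to_datum wraps a (channels, height, width) array in a caffe Datum.
        datum = caffe.io.array_to_datum(X[i], int(y[i]))
        txn.put('{:08d}'.format(i).encode('ascii'), datum.SerializeToString())
env.close()
```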
2. Prepare the network structure file
Define the network structure: which layers the network contains and what each layer does. This is the most laborious step when using Caffe. For the concrete format, refer to Caffe's own handwritten-digit example: caffe/examples/mnist/lenet_train_test.prototxt
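Instead of writing the prototxt entirely by hand, the structure file can also be generated with pycaffe's NetSpec. The sketch below emits a trimmed LeNet-like definition; the output file name and LMDB path are assumptions borrowed from the MNIST example:

```python
import caffe
from caffe import layers as L, params as P

n = caffe.NetSpec()
n.data, n.label = L.Data(batch_size=64, backend=P.Data.LMDB,
                         source='examples/mnist/mnist_train_lmdb',
                         transform_param=dict(scale=0.00390625), ntop=2)
n.conv1 = L.Convolution(n.data, kernel_size=5, num_output=20,
                        weight_filler=dict(type='xavier'))
n.pool1 = L.Pooling(n.conv1, kernel_size=2, stride=2, pool=P.Pooling.MAX)
n.ip1 = L.InnerProduct(n.pool1, num_output=500, weight_filler=dict(type='xavier'))
n.relu1 = L.ReLU(n.ip1, in_place=True)
n.ip2 = L.InnerProduct(n.relu1, num_output=10, weight_filler=dict(type='xavier'))
n.loss = L.SoftmaxWithLoss(n.ip2, n.label)

# Serialize the NetSpec to the prototxt text format.
with open('lenet_auto_train.prototxt', 'w') as f:
    f.write(str(n.to_proto()))
```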
3. Write the network solver file
This file defines the parameters that need to be set during training, such as the learning rate, the weight decay coefficient, the number of iterations, and whether to use the GPU or CPU. It is usually named xx_solver.prototxt; refer to caffe/examples/mnist/lenet_solver.prototxt
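The solver file can likewise be generated programmatically from the SolverParameter message in caffe.proto; the field values below are illustrative, not prescriptive:

```python
from caffe.proto import caffe_pb2

s = caffe_pb2.SolverParameter()
s.net = 'examples/mnist/lenet_train_test.prototxt'
s.test_iter.append(100)        # number of test batches per test run
s.test_interval = 500          # test every 500 training iterations
s.base_lr = 0.01
s.momentum = 0.9
s.weight_decay = 0.0005
s.lr_policy = 'inv'
s.gamma = 0.0001
s.power = 0.75
s.display = 100
s.max_iter = 10000
s.snapshot = 5000
s.snapshot_prefix = 'examples/mnist/lenet'
s.solver_mode = caffe_pb2.SolverParameter.CPU

with open('my_solver.prototxt', 'w') as f:
    f.write(str(s))            # protobuf text format is exactly the solver file format
```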
4. Training
Training is driven from the command line, e.g. caffe train -solver examples/mnist/lenet_solver.prototxt. Here lenet_solver.prototxt is the solver file, which defines the training parameters and the path to the network structure file.

```bash
# Train (argument: the solver file)
caffe train -solver examples/mnist/lenet_solver.prototxt

# Resume training from a snapshot (arguments: solver file and snapshot)
caffe train -solver examples/mnist/lenet_solver.prototxt \
    -snapshot examples/mnist/lenet_iter_5000.solverstate

# Fine-tune from another trained model (arguments: solver file and pretrained weights)
caffe train -solver examples/finetuning_on_flickr_style/solver.prototxt \
    -weights models/bvlc_reference_caffenet/bvlc_reference_caffenet.caffemodel
```
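The same three cases can be driven from Python; restore() and copy_from() below correspond to the -snapshot and -weights flags (paths assumed from the examples):

```python
import caffe

caffe.set_mode_gpu()
solver = caffe.SGDSolver('examples/mnist/lenet_solver.prototxt')

# Resume from a snapshot (equivalent to the -snapshot flag).
solver.restore('examples/mnist/lenet_iter_5000.solverstate')

# Or fine-tune from an existing model (equivalent to the -weights flag):
# solver.net.copy_from('models/bvlc_reference_caffenet/bvlc_reference_caffenet.caffemodel')

solver.solve()
```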
5. Testing
```bash
# Score the learned LeNet model on the validation set as defined in the
# model architecture lenet_train_test.prototxt
# (arguments: model definition and trained model parameters)
caffe test -model examples/mnist/lenet_train_test.prototxt \
    -weights examples/mnist/lenet_iter_10000.caffemodel -gpu 0 -iterations 100
```
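The equivalent scoring loop in pycaffe, assuming the TEST phase of the model defines accuracy and loss outputs as the bundled lenet_train_test.prototxt does:

```python
import caffe

caffe.set_mode_gpu()
net = caffe.Net('examples/mnist/lenet_train_test.prototxt',
                'examples/mnist/lenet_iter_10000.caffemodel',
                caffe.TEST)

iterations = 100
acc = 0.0
for _ in range(iterations):
    out = net.forward()             # one TEST-phase batch per call
    acc += float(out['accuracy'])
print('accuracy: %.4f' % (acc / iterations))
```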
Some important Caffe files
solver.prototxt
The solver file mainly stores the hyperparameters used during model training:
- net := the network structure file of the model to be trained, e.g. train_val.prototxt
- test_interval := test interval, i.e. run a test every this many training iterations
- test_initialization := whether to run an initial test, i.e. test the model before any training
- test_iter := number of iterations (batches) to run for each test
- base_lr := base learning rate
- lr_policy := learning rate decay policy (see the sketch after this list for how the common policies behave)
- gamma := parameter required by the learning rate policy
- power := ditto
- stepsize := step size of the "step" learning rate policy (fixed step)
- stepvalue := step values of the "multistep" learning rate policy (variable steps)
- max_iter := maximum number of training iterations
- momentum := momentum, a parameter used by optimization strategies such as SGD
- momentum2 := parameter used by the Adam optimizer
- weight_decay := weight decay rate
- clip_gradients := clip gradients to a fixed range
- display := print results every this many iterations
- snapshot := save a model snapshot every this many iterations
- snapshot_prefix := prefix (which may include a path) for saved model files
- type := solver optimization strategy, i.e. SGD, Adam, AdaGrad, RMSProp, Nesterov, AdaDelta, etc.
- solver_mode := training mode, i.e. GPU or CPU
- debug_info := whether to print debugging information
- device_id := device number when using GPU mode; default is 0
Set these parameters according to your own needs; the required ones must be specified, and the others are optional.
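As a rough sketch of how the common lr_policy settings decay base_lr over the iterations (formulas paraphrased from the descriptions in caffe.proto; treat them as illustrative):

```python
def learning_rate(policy, base_lr, it, gamma=0.1, power=0.75, stepsize=5000):
    """Return the learning rate at iteration `it` for a few common policies."""
    if policy == 'fixed':
        return base_lr
    if policy == 'step':                      # drop by gamma every stepsize iterations
        return base_lr * gamma ** (it // stepsize)
    if policy == 'exp':                       # exponential decay
        return base_lr * gamma ** it
    if policy == 'inv':                       # smooth inverse decay
        return base_lr * (1.0 + gamma * it) ** (-power)
    raise ValueError('unhandled policy: ' + policy)

print(learning_rate('step', 0.01, 12000))                        # 0.01 * 0.1**2
print(learning_rate('inv', 0.01, 10000, gamma=1e-4, power=0.75))
```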
train_val.prototxt
The train_val file stores the network model structure, which is built up layer by layer. Taking LeNet as an example, the basic composition of a network layer is introduced below:
name: "LeNet" layer { name: "mnist" # Network layer name type: "Data" # Network layer type, data layer top: "data" # Input and data of this layer top: "label" # Output of this layer, label include { phase: TRAIN } # TRAIN: = for training, TEST: = for testing transform_param { scale: 0.00390625 } # scale data data_param { # Data layer configuration source: "examples/mnist/mnist_train_lmdb" # Data storage path batch_size: 64 # Specify batch size backend: LMDB # Specify database format, LMDB/LevelDB } } layer { name: "mnist" type: "Data" top: "data" top: "label" include { phase: TEST } transform_param { scale: 0.00390625 } data_param { source: "examples/mnist/mnist_test_lmdb" batch_size: 100 backend: LMDB } } layer{ name:"conv1" type:"Convolution" #Convolution layer bottom:"data" #Output of the previous layer as input top:"conv1" param{name:"conv1_w" lr_mult:1 decay_mult:1} #Name, learning rate and attenuation rate of convolution layer parameter w (relative to base_lr and weight_ Multiple of decay) param{name:"conv1_b" lr_mult:2 decay_mult:0} #Name, learning rate and decay rate of convolution layer parameter b convolution_param{ num_output:20 #Number of feature map s output by convolution layer kernel_size:5 #Size of convolution layer pad:0 #Filling size of convolution layer stride:1 #Step size for convolution weight_filler{type:"xavier" } #Initial call strategy of parameter w weight_filler{type:"constant" value:0.1} #Initialization strategy of parameter b } } layer { #BatchNorm Layers, right feature map Batch normalization name:"bn1" type:"BatchNorm" bottom:"conv1" top:"conv1" batch_norm_param{ use_global_stats:false} #false during training and true during testing } layer { #Pool layer, i.e. lower sampling layer name: "pool1" type: "Pooling" bottom: "conv1" top: "pool1" pooling_param { pool: MAX # Maximum pooling and AVE mean pooling kernel_size: 2 stride: 2 } } layer { name: "conv2" type: "Convolution" bottom: "pool1" top: "conv2" param { lr_mult: 1 } param { lr_mult: 2 } convolution_param { num_output: 50 kernel_size: 5 stride: 1 weight_filler { type: "xavier" } bias_filler { type: "constant" } } } layer { name:"bn2" type:"BatchNorm" bottom:"conv2" top:"conv2" batch_norm_param{ use_global_stats:false} } layer { name: "pool2" type: "Pooling" bottom: "conv2" top: "pool2" pooling_param { pool: MAX kernel_size: 2 stride: 2 } } layer { #Full connection layer name: "ip1" type: "InnerProduct" bottom: "pool2" top: "ip1" param { lr_mult: 1 } param { lr_mult: 2 } inner_product_param { num_output: 500 weight_filler { type: "xavier" } bias_filler { type: "constant" } } } layer { # Activate function layer to provide nonlinear capability name: "relu1" type: "ReLU" bottom: "ip1" top: "ip1" } layer { name: "ip2" type: "InnerProduct" bottom: "ip1" top: "ip2" param { lr_mult: 1 } param { lr_mult: 2 } inner_product_param { num_output: 10 weight_filler { type: "xavier" } bias_filler { type: "constant" } } } layer { # Loss function layer name: "prob" type: "SoftmaxWithLoss" bottom: "ip2" bottom: "label" top: "prob" }