Deep learning: VGG16 network based on TensorFlow 2.0

The VGG series is deeper than the networks that preceded it, and VGG16 and VGG19 are its representative models. This article implements the VGG16 network with TensorFlow 2.0.

1. Introduction to the VGG16 network

The VGG16 model stood out in the 2014 ImageNet competition, ranking second in the classification task and first in the localization task. Compared with the earlier AlexNet and LeNet networks, VGG16 reached a number of layers that was unprecedented at the time.

2. Network structure

3. Innovation

① Use 3x3 convolution kernels instead of 7x7 convolution kernels.

A 3x3 kernel is the smallest size that can still capture the notions of up, down, left, right, and center. When two 3x3 convolutions are stacked, their receptive field is equivalent to that of one 5x5 convolution; when three are stacked, it is equivalent to that of one 7x7 convolution.
Although the receptive field is the same, three stacked 3x3 convolutions apply three nonlinear activation functions instead of one, which increases the nonlinear expressive power and makes the decision function more discriminative. The small kernels also reduce the parameter count considerably: with C input and C output channels, three 3x3 layers need 27C² weights, whereas one 7x7 layer needs 49C².
Stacking 3x3 kernels therefore both increases the number of network layers and reduces the number of parameters.
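
The receptive-field and parameter arithmetic above can be checked with a few lines of plain Python. This is only a sketch: the helper names `stacked_receptive_field` and `conv_weights` and the channel count of 256 are assumptions made for illustration, and biases are ignored.

```python
def stacked_receptive_field(num_layers, kernel_size=3):
    """Receptive field of `num_layers` stacked stride-1 convolutions."""
    receptive_field = 1
    for _ in range(num_layers):
        # Each stride-1 layer widens the field by (kernel_size - 1) pixels.
        receptive_field += kernel_size - 1
    return receptive_field


def conv_weights(kernel_size, channels, num_layers=1):
    """Weight count (biases ignored) when every layer maps
    `channels` input channels to `channels` output channels."""
    return num_layers * kernel_size * kernel_size * channels * channels


channels = 256  # hypothetical channel count, chosen only for illustration

print(stacked_receptive_field(2))               # 5, same as one 5x5 kernel
print(stacked_receptive_field(3))               # 7, same as one 7x7 kernel
print(conv_weights(3, channels, num_layers=3))  # 1769472 (27 * C^2)
print(conv_weights(7, channels))                # 3211264 (49 * C^2)
```

Under these assumptions, the three-layer 3x3 stack needs roughly 45% fewer weights than a single 7x7 layer covering the same receptive field.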

② Increase the number of channels as the network deepens, and use max pooling with a 2x2 pooling window.

A small 2x2 pooling window retains more detailed information than a larger one. Average pooling was also in use at the time, but max pooling performed better on image tasks: it responds more readily to changes in the image, which gives larger local differences and a better description of edges and textures.
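
To make the difference between the two pooling methods concrete, the following sketch applies 2x2 max pooling and 2x2 average pooling to a tiny feature map in plain Python. The helper `pool2x2` and the 4x4 input are hypothetical and exist only for this illustration; a real model would use the Keras pooling layers.

```python
def pool2x2(image, reduce_fn):
    """Apply non-overlapping 2x2 pooling (stride 2) to a 2D list."""
    pooled = []
    for r in range(0, len(image) - 1, 2):
        row = []
        for c in range(0, len(image[0]) - 1, 2):
            window = [image[r][c], image[r][c + 1],
                      image[r + 1][c], image[r + 1][c + 1]]
            row.append(reduce_fn(window))
        pooled.append(row)
    return pooled


def mean(values):
    return sum(values) / len(values)


# Hypothetical 4x4 feature map containing a thin bright vertical line.
feature_map = [
    [0, 8, 0, 0],
    [0, 8, 0, 0],
    [0, 8, 0, 0],
    [0, 8, 0, 0],
]

max_pooled = pool2x2(feature_map, max)   # [[8, 0], [8, 0]]
avg_pooled = pool2x2(feature_map, mean)  # [[4.0, 0.0], [4.0, 0.0]]
print(max_pooled)
print(avg_pooled)
```

In this toy case max pooling keeps the full strength of the line (8), while average pooling halves it (4.0), which is one way to see why max pooling tends to preserve edge-like features.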

4. Network implementation

import tensorflow as tf
from tensorflow.keras.layers import Dense, Input
from tensorflow.keras.models import Model

def VGG16(nb_class,input_shape):
    input_ten = Input(shape=input_shape)
    # Block 1: two 3x3 convolutions with 64 filters, then 2x2 max pooling
    x = tf.keras.layers.Conv2D(filters=64,kernel_size=(3,3),activation='relu',padding='same')(input_ten)
    x = tf.keras.layers.Conv2D(filters=64,kernel_size=(3,3),activation='relu',padding='same')(x)
    x = tf.keras.layers.MaxPool2D(pool_size=(2,2),strides=(2,2))(x)
    # Block 2: two 3x3 convolutions with 128 filters
    x = tf.keras.layers.Conv2D(filters=128,kernel_size=(3,3),activation='relu',padding='same')(x)
    x = tf.keras.layers.Conv2D(filters=128,kernel_size=(3,3),activation='relu',padding='same')(x)
    x = tf.keras.layers.MaxPool2D(pool_size=(2,2),strides=(2,2))(x)
    # Block 3: three 3x3 convolutions with 256 filters
    x = tf.keras.layers.Conv2D(filters=256,kernel_size=(3,3),activation='relu',padding='same')(x)
    x = tf.keras.layers.Conv2D(filters=256,kernel_size=(3,3),activation='relu',padding='same')(x)
    x = tf.keras.layers.Conv2D(filters=256,kernel_size=(3,3),activation='relu',padding='same')(x)
    x = tf.keras.layers.MaxPool2D(pool_size=(2,2),strides=(2,2))(x)
    # Block 4: three 3x3 convolutions with 512 filters
    x = tf.keras.layers.Conv2D(filters=512,kernel_size=(3,3),activation='relu',padding='same')(x)
    x = tf.keras.layers.Conv2D(filters=512,kernel_size=(3,3),activation='relu',padding='same')(x)
    x = tf.keras.layers.Conv2D(filters=512,kernel_size=(3,3),activation='relu',padding='same')(x)
    x = tf.keras.layers.MaxPool2D(pool_size=(2,2),strides=(2,2))(x)
    # Block 5: three 3x3 convolutions with 512 filters
    x = tf.keras.layers.Conv2D(filters=512,kernel_size=(3,3),activation='relu',padding='same')(x)
    x = tf.keras.layers.Conv2D(filters=512,kernel_size=(3,3),activation='relu',padding='same')(x)
    x = tf.keras.layers.Conv2D(filters=512,kernel_size=(3,3),activation='relu',padding='same')(x)
    x = tf.keras.layers.MaxPool2D(pool_size=(2,2),strides=(2,2))(x)
    # Classifier: flatten, two 4096-unit dense layers, softmax output
    x = tf.keras.layers.Flatten()(x)
    x = Dense(4096,activation='relu')(x)
    x = Dense(4096,activation='relu')(x)
    output_ten = Dense(nb_class,activation='softmax')(x)
    model = Model(input_ten,output_ten)
    return model

# The input size is 224x224, matching the output shapes in the summary below
img_height, img_width = 224, 224
model_VGG16 = VGG16(24,(img_height,img_width,3))
model_VGG16.summary()
Model: "model"
Layer (type)                 Output Shape              Param #   
input_1 (InputLayer)         [(None, 224, 224, 3)]     0         
conv2d (Conv2D)              (None, 224, 224, 64)      1792      
conv2d_1 (Conv2D)            (None, 224, 224, 64)      36928     
max_pooling2d (MaxPooling2D) (None, 112, 112, 64)      0         
conv2d_2 (Conv2D)            (None, 112, 112, 128)     73856     
conv2d_3 (Conv2D)            (None, 112, 112, 128)     147584    
max_pooling2d_1 (MaxPooling2 (None, 56, 56, 128)       0         
conv2d_4 (Conv2D)            (None, 56, 56, 256)       295168    
conv2d_5 (Conv2D)            (None, 56, 56, 256)       590080    
conv2d_6 (Conv2D)            (None, 56, 56, 256)       590080    
max_pooling2d_2 (MaxPooling2 (None, 28, 28, 256)       0         
conv2d_7 (Conv2D)            (None, 28, 28, 512)       1180160   
conv2d_8 (Conv2D)            (None, 28, 28, 512)       2359808   
conv2d_9 (Conv2D)            (None, 28, 28, 512)       2359808   
max_pooling2d_3 (MaxPooling2 (None, 14, 14, 512)       0         
conv2d_10 (Conv2D)           (None, 14, 14, 512)       2359808   
conv2d_11 (Conv2D)           (None, 14, 14, 512)       2359808   
conv2d_12 (Conv2D)           (None, 14, 14, 512)       2359808   
max_pooling2d_4 (MaxPooling2 (None, 7, 7, 512)         0         
flatten (Flatten)            (None, 25088)             0         
dense (Dense)                (None, 4096)              102764544 
dense_1 (Dense)              (None, 4096)              16781312  
dense_2 (Dense)              (None, 24)                98328     
Total params: 134,358,872
Trainable params: 134,358,872
Non-trainable params: 0

The summary shows that VGG16 has 134,358,872 trainable parameters in this 24-class configuration, roughly twice the approximately 60 million of AlexNet, yet the VGG series still performs relatively well.
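
As a sanity check, the total in the summary can be reproduced by hand from the layer shapes. The sketch below assumes the same configuration as the code above (3x3 kernels, 224x224x3 input, 24 output classes); the helper names `conv_params` and `dense_params` are hypothetical and introduced only for this calculation.

```python
def conv_params(kernel_size, in_channels, out_channels):
    """Weights plus one bias per filter for a 2D convolution."""
    return (kernel_size * kernel_size * in_channels + 1) * out_channels


def dense_params(in_units, out_units):
    """Weights plus one bias per unit for a fully connected layer."""
    return (in_units + 1) * out_units


# (number of 3x3 convolutions, filters) for each of the five blocks
conv_blocks = [(2, 64), (2, 128), (3, 256), (3, 512), (3, 512)]
nb_class = 24

size, channels, total = 224, 3, 0
for num_convs, filters in conv_blocks:
    for _ in range(num_convs):
        total += conv_params(3, channels, filters)
        channels = filters
    size //= 2  # each 2x2 max pooling halves height and width

flat_units = size * size * channels  # 7 * 7 * 512 = 25088
in_units = flat_units
for units in (4096, 4096, nb_class):
    total += dense_params(in_units, units)
    in_units = units

print(flat_units)  # 25088
print(total)       # 134358872
```

The first dense layer alone contributes 102,764,544 parameters, so most of the model's size comes from the fully connected layers rather than from the convolutions.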

