Classic network architecture of deep learning: AlexNet

1, Introduction

Alex net was designed by Hinton, the winner of the 2012 ImageNet competition, and his student Alex Krizhevsky. Also after that year, more and deeper neural networks were proposed, such as the excellent VGg and Google lenet. The accuracy of the official data model is 57.1% and the top 1-5 is 80.2%. This is quite excellent for the traditional machine learning classification algorithm.

2, Network structure

The figure above shows the network structure of alexnet in caffe. The figure above shows two GPU servers. You can see two flow charts. The network structure of alexnet is shown below:

The simplified structure is:

Why did AlexNet achieve better results?

1. The Relu activation function is used.

Relu function: f(x)=max(0,x)

The deep convolution network based on ReLU is several times faster than the network based on tanh and sigmoid. The following figure shows the number of iterations of a four-layer convolution network based on CIFAR-10 with 25% training error in tanh and ReLU;

2. Local Response Normalization

After using ReLU f(x)=max(0,x), you will find that the value after the activation function does not have a value range like tanh and sigmoid functions. Therefore, generally, a normalization will be done after ReLU. LRU is a method that is steadily proposed (uncertain here, it should be proposed?). In neuroscience, there is a concept called "late inhibition", It's about the effect of active neurons on their peripheral neurons.

3. Dropout

Dropout is also a frequently mentioned concept, which can effectively prevent over fitting of neural network. Compared with the general linear model, the regular method is used to prevent the model from over fitting, while in the neural network, dropout is realized by modifying the structure of the neural network itself. For a certain layer of neurons, some neurons are deleted randomly through the defined probability, while keeping the individuals of neurons in the input layer and output layer unchanged, and then the parameters are updated according to the learning method of neural network. In the next iteration, some neurons are deleted randomly again until the end of training.

4. data augmentation

In deep learning, when the amount of data is not large enough, there are generally four solutions:

  • data augmentation - manually increase the size of the training set - create a batch of "new" data from the existing data by translating, flipping, adding noise and other methods
  • Regularization - a small amount of data will lead to over fitting of the model, making the training error very small and the test error particularly large. The generation of over fitting can be suppressed by adding a regular term after the Loss Function. The disadvantage is the introduction of a hyper parameter that needs to be adjusted manually.
  • Dropout is also a regularization method, but different from the above, it is realized by randomly setting the output of some neurons to zero
  • Unsupervised pre training -- unsupervised pre training is performed layer by layer in the convolution form of auto encoder or RBM, and finally supervised fine tuning is performed with the classification layer

3, tensorflow code implementation

# -*- coding=UTF-8 -*-
import sys
import os
import random
import cv2
import math
import time
import numpy as np
import tensorflow as tf
import linecache
import string
import skimage
import imageio
# input data
import input_data
mnist = input_data.read_data_sets("/tmp/data/", one_hot=True)
# Define network superparameters
learning_rate = 0.001
training_iters = 200000
batch_size = 64
display_step = 20
# Define network parameters
n_input = 784  # Dimension entered
n_classes = 10 # Dimension of label
dropout = 0.8  # Dropout probability
# Placeholder input
x = tf.placeholder(tf.types.float32, [None, n_input])
y = tf.placeholder(tf.types.float32, [None, n_classes])
keep_prob = tf.placeholder(tf.types.float32)
# Convolution operation
def conv2d(name, l_input, w, b):
    return tf.nn.relu(tf.nn.bias_add( \
    tf.nn.conv2d(l_input, w, strides=[1, 1, 1, 1], padding='SAME'),b) \
    , name=name)
# Maximum downsampling operation
def max_pool(name, l_input, k):
    return tf.nn.max_pool(l_input, ksize=[1, k, k, 1], \
    strides=[1, k, k, 1], padding='SAME', name=name)
# Normalization operation
def norm(name, l_input, lsize=4):
    return tf.nn.lrn(l_input, lsize, bias=1.0, alpha=0.001 / 9.0, beta=0.75, name=name)
# Define the entire network 
def alex_net(_X, _weights, _biases, _dropout):
    _X = tf.reshape(_X, shape=[-1, 28, 28, 1]) # Vector to matrix
    # Convolution layer
    conv1 = conv2d('conv1', _X, _weights['wc1'], _biases['bc1'])
    # Lower sampling layer
    pool1 = max_pool('pool1', conv1, k=2)
    # Normalization layer
    norm1 = norm('norm1', pool1, lsize=4)
    # Dropout
    norm1 = tf.nn.dropout(norm1, _dropout)
    # convolution
    conv2 = conv2d('conv2', norm1, _weights['wc2'], _biases['bc2'])
    # Down sampling
    pool2 = max_pool('pool2', conv2, k=2)
    # normalization
    norm2 = norm('norm2', pool2, lsize=4)
    # Dropout
    norm2 = tf.nn.dropout(norm2, _dropout)
    # convolution
    conv3 = conv2d('conv3', norm2, _weights['wc3'], _biases['bc3'])
    # Down sampling
    pool3 = max_pool('pool3', conv3, k=2)
    # normalization
    norm3 = norm('norm3', pool3, lsize=4)
    # Dropout
    norm3 = tf.nn.dropout(norm3, _dropout)
    # In the full connection layer, first turn the characteristic graph into a vector
    dense1 = tf.reshape(norm3, [-1, _weights['wd1'].get_shape().as_list()[0]]) 
    dense1 = tf.nn.relu(tf.matmul(dense1, _weights['wd1']) + _biases['bd1'], name='fc1') 
    # Full connection layer
    dense2 = tf.nn.relu(tf.matmul(dense1, _weights['wd2']) + _biases['bd2'], name='fc2') # Relu activation
    # Network output layer
    out = tf.matmul(dense2, _weights['out']) + _biases['out']
    return out
# Store all network parameters
weights = {
    'wc1': tf.Variable(tf.random_normal([3, 3, 1, 64])),
    'wc2': tf.Variable(tf.random_normal([3, 3, 64, 128])),
    'wc3': tf.Variable(tf.random_normal([3, 3, 128, 256])),
    'wd1': tf.Variable(tf.random_normal([4*4*256, 1024])),
    'wd2': tf.Variable(tf.random_normal([1024, 1024])),
    'out': tf.Variable(tf.random_normal([1024, 10]))
biases = {
    'bc1': tf.Variable(tf.random_normal([64])),
    'bc2': tf.Variable(tf.random_normal([128])),
    'bc3': tf.Variable(tf.random_normal([256])),
    'bd1': tf.Variable(tf.random_normal([1024])),
    'bd2': tf.Variable(tf.random_normal([1024])),
    'out': tf.Variable(tf.random_normal([n_classes]))
# Build model
pred = alex_net(x, weights, biases, keep_prob)
# Define loss function and learning steps
cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(pred, y))
optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate).minimize(cost)
# Test network
correct_pred = tf.equal(tf.argmax(pred,1), tf.argmax(y,1))
accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))
# Initialize all shared variables
init = tf.initialize_all_variables()
# Start a workout
with tf.Session() as sess:
    step = 1
    # Keep training until reach max iterations
    while step * batch_size < training_iters:
        batch_xs, batch_ys = mnist.train.next_batch(batch_size)
        # Get batch data, feed_dict={x: batch_xs, y: batch_ys, keep_prob: dropout})
        if step % display_step == 0:
            # Calculation accuracy
            acc =, feed_dict={x: batch_xs, y: batch_ys, keep_prob: 1.})
            # Calculate loss value
            loss =, feed_dict={x: batch_xs, y: batch_ys, keep_prob: 1.})
            print "Iter " + str(step*batch_size) + ", Minibatch Loss= " + "{:.6f}".format(loss) + ", Training Accuracy= " + "{:.5f}".format(acc)
        step += 1
    print "Optimization Finished!"
    # Calculate test accuracy
    print "Testing Accuracy:",, feed_dict={x: mnist.test.images[:256], y: m

Keywords: neural networks Deep Learning caffe

Added by intermediate on Thu, 30 Sep 2021 01:30:52 +0300