Back to CNN convolutional neural network directory
The last chapter: Depth part -- CNN convolutional neural network (3) On ROI pooling and ROI Align and interpolation
In this section, I will elaborate on the demonstration project of mnist handwritten digit code using tf cnn
github code of this project: https://github.com/wandaoyi/tf_cnn_mnist_pro
5, TF CNN MNIST handwritten digit code demonstration
(1) preface.
Prior to In depth - neural network We have learned ANN and DNN. Now, we have learned CNN. For learning to apply, we will use CNN to build convolutional neural network.
(2) . define requirements
The requirement of the project is to recognize the handwritten Arabic numeral pictures from 0 to 9. For example, the number on the invoice (the first one for handwritten number recognition was written by a bank in the United States in 1989, hired by a tycoon, and was written by LeNet-5, a convolutional neural network technology at that time). Front In depth part -- neural network (7) detailed description of DNN Neural Network handwritten digit code demonstration We have used DNN case study; now, let's use LeNet-5 for a case study. At that time, the project was used to identify the signed numbers on the check. Training network, of course, is inseparable from data, so we first download the data, which has been uploaded to Baidu cloud disk for you: link: https://pan.baidu.com/s/13OokGc0h3F5rGrxuSLYj9Q extraction code: qfj6.
(3) . build project
The project structure is as follows:
The above model is the accuracy of 10 epochs I randomly trained: 0.984000. Before we used DNN, the accuracy of 10 epochs was only 0.96 +. In this way, we can see that the accuracy has increased by 2 percentage points. In this way, some people may not be satisfied with it and think it is not good. However, if we look at it in reverse, we can see that the error rate has been reduced by half, so the effect will be very considerable. It's important to be able to do things when working on projects in the company, but it's also important to be able to express yourself.
(4) . environmental dependence
Environmental dependence:
pip install numpy==1.16 pip install easydict conda install tensorflow-gpu==1.13.1 # It is not recommended to use tf version 2.0, which has many pits
The installation of tensorflow is explained in detail in my previous blog: Fragmented part -- Installation of tensorflow gpu version If not, you can see how to install.
README.md file:
# tf_cnn_mnist_pro tf_cnn Handwritten number forecast 2020-02-09 - Project download address: https://github.com/wandaoyi/tf_cnn_mnist_pro - Please go to Baidu cloud disk to download the training data required for the project: - Links: https://pan.baidu.com/s/13OokGc0h3F5rGrxuSLYj9Q extraction code: qfj6 ## Parameter setting - Before training or forecasting, we need to set parameters - open config.py File, where parameters or paths are set. ## Model - Model code model_net.py - Here, we use lenet-5 Network model to extract features ## Training model - Function cnn_mnist_train.py ,Simple operation, right click directly run - The training effect is as follows: - acc_train: 1.0 - epoch: 10, acc_test: 0.984000 - Here is the effect of random training. If you want to get good results, you can train more epoch - You can also add it yourself early-stopping Go in it's not a problem ## Forecast - Function cnn_mnist_test.py ,Simple operation, right click directly run - After running, some forecast results will be printed on the console - The prediction effect is as follows: - predicted value: [7 2 1 0 4] - True value: [7 2 1 0 4] ## Tensorbboard log - Use tensorboard The advantage of this log is that it is real-time, and you can watch the renderings while training. - stay cmd Command window, enter the following command: - tensorboard --logdir=G:\work_space\python_space\pro2018_space\wandao\mnist_pro\logs\mnist_log_train --host=localhost - stay --logdir= Followed by the folder path of the log, - stay --host= Is used to specify ip If you don't write it, you can only use the address of the computer instead of using it localhost - Open on Google browser tensorboard Journal: http://localhost:6006/ - Model acc ![image](./docs/images/acc.png) - model structure ![image](./docs/images/graphs.png)
The following file or code, in which there are comments
(5) . config.py
#!/usr/bin/env python # _*_ coding:utf-8 _*_ # ============================================ # @Time : 2020/02/08 19:23 # @Author : WanDaoYi # @FileName : config.py # ============================================ from easydict import EasyDict as edict import os __C = edict() cfg = __C # common options public profile __C.COMMON = edict() # Windows gets the absolute path of files, which is convenient for windows to run projects in black windows __C.COMMON.BASE_PATH = os.path.abspath(os.path.dirname(__file__)) # # Get the path of the current window. When using Linux, switch to this, or an error will be reported. (windows can also use this) # __C.COMMON.BASE_PATH = os.getcwd() __C.COMMON.DATA_PATH = os.path.join(__C.COMMON.BASE_PATH, "dataset") # Shape of image __C.COMMON.DATA_RESHAPE = [-1, 28, 28, 1] # Shape of image rezie __C.COMMON.DATA_RESIZE = (32, 32) # Training configuration __C.TRAIN = edict() # Learning rate __C.TRAIN.LEARNING_RATE = 0.01 # batch_size __C.TRAIN.BATCH_SIZE = 32 # Iteration times __C.TRAIN.N_EPOCH = 10 # Model save path, use relative path, easy to transplant __C.TRAIN.MODEL_SAVE_PATH = "./checkpoint/model_" # dropout's holdings, 0.7 represents 70% of the nodes. __C.TRAIN.KEEP_PROB_DROPOUT = 0.7 # Test configuration __C.TEST = edict() # Test model save path __C.TEST.CKPT_MODEL_SAVE_PATH = "./checkpoint/model_acc=0.984000.ckpt-10" # Log configuration __C.LOG = edict() # Log saving path, followed by a trail or test: for example, MNIST log trail __C.LOG.LOG_SAVE_PATH = "./logs/mnist_log_"
(6) . common.py
#!/usr/bin/env python # _*_ coding:utf-8 _*_ # ============================================ # @Time : 2020/02/08 19:26 # @Author : WanDaoYi # @FileName : common.py # ============================================ import tensorflow as tf from tensorflow.examples.tutorials.mnist import input_data from config import cfg import numpy as np class Common(object): def __init__(self): # Data path self.data_file_path = cfg.COMMON.DATA_PATH pass # Read data def read_data(self): # Data download address: http://yann.lecun.com/exdb/mnist/ mnist_data = input_data.read_data_sets(self.data_file_path, one_hot=True) train_image = mnist_data.train.images train_label = mnist_data.train.labels _, n_feature = train_image.shape _, n_label = train_label.shape return mnist_data, n_feature, n_label # bn operation def deal_bn(self, input_data, train_flag=True): bn_info = tf.layers.batch_normalization(input_data, beta_initializer=tf.zeros_initializer(), gamma_initializer=tf.ones_initializer(), moving_mean_initializer=tf.zeros_initializer(), moving_variance_initializer=tf.ones_initializer(), training=train_flag) return bn_info pass # Pooling treatment def deal_pool(self, input_data, ksize=(1, 2, 2, 1), strides=(1, 2, 2, 1), padding="VALID", name="avg_pool"): pool_info = tf.nn.avg_pool(value=input_data, ksize=ksize, strides=strides, padding=padding, name=name) tf.summary.histogram('pooling', pool_info) return pool_info pass # dropout processing def deal_dropout(self, hidden_layer, keep_prob): with tf.name_scope("dropout"): tf.summary.scalar('dropout_keep_probability', keep_prob) dropped = tf.nn.dropout(hidden_layer, keep_prob) tf.summary.histogram('dropped', dropped) return dropped pass # Parameter record def variable_summaries(self, param): with tf.name_scope('summaries'): mean = tf.reduce_mean(param) tf.summary.scalar('mean', mean) with tf.name_scope('stddev'): stddev = tf.sqrt(tf.reduce_mean(tf.square(param - mean))) tf.summary.scalar('stddev', stddev) tf.summary.scalar('max', tf.reduce_max(param)) tf.summary.scalar('min', tf.reduce_min(param)) tf.summary.histogram('histogram', param) # Full connection operation def neural_layer(self, x, n_neuron, name="fc"): # Include all computing nodes. For this layer, the name scope can be written or not with tf.name_scope(name=name): n_input = int(x.get_shape()[1]) stddev = 2 / np.sqrt(n_input) # The w in this layer can be regarded as a two-dimensional array. Each neuron has a set of w parameters # truncated normal distribution has a smaller value than regular normal distribution # There will be no big weight value to ensure a slow and steady training # Using this standard deviation will make convergence faster # The w parameter needs to be random, not 0, otherwise the output is 0, and the final adjustment is not significant. with tf.name_scope("weights"): init_w = tf.truncated_normal((n_input, n_neuron), stddev=stddev) w = tf.Variable(init_w, name="weight") self.variable_summaries(w) with tf.name_scope("biases"): b = tf.Variable(tf.zeros([n_neuron]), name="bias") self.variable_summaries(b) with tf.name_scope("wx_plus_b"): z = tf.matmul(x, w) + b tf.summary.histogram('pre_activations', z) return z # Convolution operation def conv2d(self, input_data, filter_shape, strides_shape=(1, 1, 1, 1), padding="VALID", train_flag=True, name="conv2d"): with tf.variable_scope(name): weight = tf.get_variable(name="weight", dtype=tf.float32, trainable=train_flag, shape=filter_shape, initializer=tf.random_normal_initializer(stddev=0.01)) conv = tf.nn.conv2d(input=input_data, filter=weight, strides=strides_shape, padding=padding) conv_2_bn = self.deal_bn(conv, train_flag=train_flag) return conv_2_bn pass pass
(7) . model code
#!/usr/bin/env python # _*_ coding:utf-8 _*_ # ============================================ # @Time : 2020/02/08 22:26 # @Author : WanDaoYi # @FileName : model_net.py # ============================================ import tensorflow as tf from core.common import Common class ModelNet(object): def __init__(self): self.common = Common() pass def lenet_5(self, input_data, n_label=10, keep_prob=1.0, train_flag=True): with tf.variable_scope("lenet-5"): conv_1 = self.common.conv2d(input_data, (5, 5, 1, 6), name="conv_1") tanh_1 = tf.nn.tanh(conv_1, name="tanh_1") avg_pool_1 = self.common.deal_pool(tanh_1, name="avg_pool_1") conv_2 = self.common.conv2d(avg_pool_1, (5, 5, 6, 16), name="conv_2") tanh_2 = tf.nn.tanh(conv_2, name="tanh_2") avg_pool_2 = self.common.deal_pool(tanh_2, name="avg_pool_2") conv_3 = self.common.conv2d(avg_pool_2, (5, 5, 16, 120), name="conv_3") tanh_3 = tf.nn.tanh(conv_3, name="tanh_3") reshape_data = tf.reshape(tanh_3, [-1, 120]) dropout_1 = self.common.deal_dropout(reshape_data, keep_prob) fc_1 = self.common.neural_layer(dropout_1, 84, name="fc_1") tanh_4 = tf.nn.tanh(fc_1, name="tanh_4") dropout_2 = self.common.deal_dropout(tanh_4, keep_prob) fc_2 = self.common.neural_layer(dropout_2, n_label, name="fc_2") scale_2 = self.common.deal_bn(fc_2, train_flag=train_flag) result_info = tf.nn.softmax(scale_2, name="result_info") return result_info pass
The model here, I used lenet-5, of course, I want to change other models later, it is also OK. In lenet-5, the input of the model is a 32 x 32 size image required by the shape. Otherwise, if the scale is not enough, the model will report an error. So, resize the image to 32 x 32.
(8) . training code
#!/usr/bin/env python # _*_ coding:utf-8 _*_ # ============================================ # @Time : 2020/02/08 19:24 # @Author : WanDaoYi # @FileName : cnn_mnist_train.py # ============================================ from datetime import datetime import tensorflow as tf from config import cfg from core.common import Common from core.model_net import ModelNet class CnnMnistTrain(object): def __init__(self): # Model save path self.model_save_path = cfg.TRAIN.MODEL_SAVE_PATH self.log_path = cfg.LOG.LOG_SAVE_PATH self.learning_rate = cfg.TRAIN.LEARNING_RATE self.batch_size = cfg.TRAIN.BATCH_SIZE self.n_epoch = cfg.TRAIN.N_EPOCH self.data_shape = cfg.COMMON.DATA_RESHAPE self.data_resize = cfg.COMMON.DATA_RESIZE self.common = Common() self.model_net = ModelNet() # Read data and dimensions self.mnist_data, self.n_feature, self.n_label = self.common.read_data() # Create a blueprint with tf.name_scope(name="input_data"): self.x = tf.placeholder(dtype=tf.float32, shape=(None, self.n_feature), name="input_data") self.y = tf.placeholder(dtype=tf.float32, shape=(None, self.n_label), name="input_labels") with tf.name_scope(name="input_shape"): # 784 dimensions are transformed into pictures and kept to nodes # -1 represents the number of incoming pictures, 28, 28 is the height and width of the picture, 1 is the color channel of the picture image_shaped_input = tf.reshape(self.x, self.data_shape) # resize the input image to the size required by the network image_resize = tf.image.resize_images(image_shaped_input, self.data_resize) tf.summary.image('input', image_resize, self.n_label) self.keep_prob_dropout = cfg.TRAIN.KEEP_PROB_DROPOUT self.keep_prob = tf.placeholder(tf.float32) # Get the return result of the last level of lenet 5 self.result_info = self.model_net.lenet_5(image_resize, n_label=self.n_label, keep_prob=self.keep_prob_dropout) # Calculated loss with tf.name_scope(name="train_loss"): # Define loss function self.cross_entropy = tf.reduce_mean(-tf.reduce_sum(self.y * tf.log(self.result_info), reduction_indices=[1])) tf.summary.scalar("train_loss", self.cross_entropy) pass with tf.name_scope(name="optimizer"): self.optimizer = tf.train.AdamOptimizer(learning_rate=self.learning_rate) self.train_op = self.optimizer.minimize(self.cross_entropy) pass with tf.name_scope(name="accuracy"): self.correct_pred = tf.equal(tf.argmax(self.result_info, 1), tf.argmax(self.y, 1)) self.acc = tf.reduce_mean(tf.cast(self.correct_pred, tf.float32)) tf.summary.scalar("accuracy", self.acc) pass # Because we have defined too many tf.summary summary operations before, it is too troublesome to perform them one by one, # Get all the summary operations directly using TF. Summary. Merge all() for later execution self.merged = tf.summary.merge_all() self.sess = tf.InteractiveSession() # Save training model self.saver = tf.train.Saver() # Define two tf.summary.FileWriter file recorders and different subdirectories to store the training and test log data respectively # At the same time, the Session calculation graph sess.graph is added to the training process so that it can be displayed in the graphics window of TensorBoard self.train_writer = tf.summary.FileWriter(self.log_path + 'train', self.sess.graph) self.test_writer = tf.summary.FileWriter(self.log_path + 'test') pass # Irrigation data def feed_dict(self, train_flag=True): # training sample if train_flag: # Get next batch of samples x_data, y_data = self.mnist_data.train.next_batch(self.batch_size) keep_prob = self.keep_prob_dropout pass # Validation sample else: x_data, y_data = self.mnist_data.test.images, self.mnist_data.test.labels keep_prob = 1.0 pass return {self.x: x_data, self.y: y_data, self.keep_prob: keep_prob} pass def do_train(self): # Define initialization init = tf.global_variables_initializer() self.sess.run(init) test_acc = None for epoch in range(self.n_epoch): # Obtain the total number of samples batch_number = self.mnist_data.train.num_examples # Obtain the total samples in several batches size_number = int(batch_number / self.batch_size) for number in range(size_number): summary, _ = self.sess.run([self.merged, self.train_op], feed_dict=self.feed_dict()) # Cycle number i = epoch * size_number + number + 1 self.train_writer.add_summary(summary, i) if number == size_number - 1: # Get next batch of samples x_batch, y_batch = self.mnist_data.train.next_batch(self.batch_size) acc_train = self.acc.eval(feed_dict={self.x: x_batch, self.y: y_batch}) print("acc_train: {}".format(acc_train)) # Verification method two or two, any one can be chosen. test_summary, acc_test = self.sess.run([self.merged, self.acc], feed_dict=self.feed_dict(False)) print("epoch: {}, acc_test: {}".format(epoch + 1, acc_test)) self.test_writer.add_summary(test_summary, epoch + 1) test_acc = acc_test pass save_path = self.model_save_path + "acc={:.6f}".format(test_acc) + ".ckpt" # Preservation model self.saver.save(self.sess, save_path, global_step=self.n_epoch) self.train_writer.close() self.test_writer.close() pass if __name__ == "__main__": # Code start time start_time = datetime.now() print("start time: {}".format(start_time)) demo = CnnMnistTrain() demo.do_train() # Code end time end_time = datetime.now() print("End time: {}, Training model time consuming: {}".format(end_time, end_time - start_time))
Training code, just rough to training, not to do grid search, not to do fine tuning. There is no early stopping. If you are interested, you can add it yourself.
(9) . test code
#!/usr/bin/env python # _*_ coding:utf-8 _*_ # ============================================ # @Time : 2020/02/08 19:24 # @Author : WanDaoYi # @FileName : cnn_mnist_test.py # ============================================ from datetime import datetime import tensorflow as tf import numpy as np from config import cfg from core.common import Common from core.model_net import ModelNet class CnnMnistTest(object): def __init__(self): self.common = Common() self.model_net = ModelNet() # Read data and dimensions self.mnist_data, self.n_feature, self.n_label = self.common.read_data() # ckpt model self.test_ckpt_model = cfg.TEST.CKPT_MODEL_SAVE_PATH print("test_ckpt_model: {}".format(self.test_ckpt_model)) # tf.reset_default_graph() # Create a blueprint with tf.name_scope(name="input"): self.x = tf.placeholder(dtype=tf.float32, shape=(None, self.n_feature), name="input_data") self.y = tf.placeholder(dtype=tf.float32, shape=(None, self.n_label), name="input_labels") self.data_shape = cfg.COMMON.DATA_RESHAPE self.data_resize = cfg.COMMON.DATA_RESIZE with tf.name_scope(name="input_shape"): # 784 dimensions are transformed into pictures and kept to nodes # -1 represents the number of incoming pictures, 28 x 28 is the height and width of the picture, 1 is the color channel of the picture self.image_shaped_input = tf.reshape(self.x, self.data_shape) # resize the input image to the size 32 x 32 required by the network self.image_resize = tf.image.resize_images(self.image_shaped_input, self.data_resize) # Get the return result of the last level of lenet 5 self.result_info = self.model_net.lenet_5(self.image_resize, n_label=self.n_label) pass # Forecast def do_ckpt_test(self): saver = tf.train.Saver() with tf.Session() as sess: saver.restore(sess, self.test_ckpt_model) # Forecast output = self.result_info.eval(feed_dict={self.x: self.mnist_data.test.images}) # Convert one hot forecast to number y_perd = np.argmax(output, axis=1) print("predicted value: {}".format(y_perd[: 5])) # True value y_true = np.argmax(self.mnist_data.test.labels, axis=1) print("True value: {}".format(y_true[: 5])) pass pass if __name__ == "__main__": # Code start time start_time = datetime.now() print("start time: {}".format(start_time)) demo = CnnMnistTest() # Test with ckpt model demo.do_ckpt_test() # Code end time end_time = datetime.now() print("End time: {}, Training model time consuming: {}".format(end_time, end_time - start_time))
(10) . view log effect
Image of acc:
graphs image:
In the tensorboard log, you can double-click lenet-5. The model structure is as follows:
After opening the log graphs, you can zoom in to see a clear image
From the training of DNN and CNN, it is not difficult to see that CNN is better than DNN in image information prediction (the accuracy of 10 epochs of DNN is only 96%, while the accuracy of 10 epochs of CNN is 98%. Of course, this is only the training effect in the early stage, which is not good to explain directly. However, if the training times are enough, the effect of CNN will be a little better. That's why, in image processing, CNN is mostly used instead of pure DNN)
Back to CNN convolutional neural network directory
The last chapter: Depth part -- CNN convolutional neural network (3) On ROI pooling and ROI Align and interpolation