I Overview of transfer learning
Transfer learning refers to taking a network that has already learned a task in one domain and applying it to learning tasks in a new domain.
Implementation of transfer learning with convolutional networks
In a trained network model, the first few layers usually learn general features; as the network gets deeper, the layers focus more on task-specific features, so the general features can be transferred to other domains. For a deep convolutional network, a pre-trained model separates the network structure from its parameter values: as long as the structure is kept consistent, the pre-trained weights can be used to initialize a new network, which greatly reduces training time.
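A minimal sketch of this separation of structure and parameters, assuming a Keras environment where the ImageNet weight files can be downloaded (the saved-weights file name is a hypothetical placeholder): the same Inception V3 architecture can be instantiated with random or pre-trained parameters, and the parameters can be exported and reloaded into any model with an identical structure.

from keras.applications.inception_v3 import InceptionV3

# Same structure, randomly initialized parameters
random_model = InceptionV3(weights=None, include_top=False)

# Same structure, parameters initialized from ImageNet pretraining
pretrained_model = InceptionV3(weights="imagenet", include_top=False)

# Weights can be exported and loaded back into any identically structured model
pretrained_model.save_weights("inception_v3_imagenet.h5")  # hypothetical file name
random_model.load_weights("inception_v3_imagenet.h5")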
The specific steps are as follows:
① Select a network model pre-trained on ImageNet.
② Replace the fully connected layers of the selected model with self-built fully connected layers to obtain a new network model.
③ Freeze the parameters of the required number of layers in the new model, and train the remaining parameters on the new small-sample dataset.
④ Obtain the trained new network model.
This method is also called fine-tuning. Training in this way avoids building and training a network from scratch for the new task, which saves time. A model trained on ImageNet has strong generalization ability, which effectively enlarges the training data, so the new model achieves higher training accuracy, better generalization, and better robustness. The four steps are sketched in code below.
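A compact sketch of steps ① to ④, assuming Keras; the input size, head width, and number of frozen layers here are illustrative only (the full MNIST example follows in the next section).

from keras.applications.inception_v3 import InceptionV3
from keras.layers import Dense, GlobalAveragePooling2D, Input
from keras.models import Model

# Step 1: select a network pre-trained on ImageNet, without its top classifier
base_model = InceptionV3(weights="imagenet", include_top=False)

# Step 2: attach a self-built fully connected head for the new task
inputs = Input(shape=(299, 299, 3))
x = base_model(inputs)
x = GlobalAveragePooling2D()(x)
x = Dense(256, activation="relu")(x)
outputs = Dense(10, activation="softmax")(x)  # 10 classes, illustrative
new_model = Model(inputs=inputs, outputs=outputs)

# Step 3: freeze the parameters of the early layers (the cut-off is task-dependent)
for layer in base_model.layers[:100]:
    layer.trainable = False

# Step 4: compile and train on the new, small dataset
new_model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])
# new_model.fit(x_small, y_small, epochs=5, batch_size=32)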
II Implementation of transfer learning with Inception V3
The Inception V3 structure evolved from the Inception structure in GoogLeNet. Compared with the traditional Inception structure, Inception V3 makes the following improvements:
① Large convolution kernels are factorized into smaller ones. For example, two 3×3 convolutions replace one 5×5 convolution, reducing the amount of computation (see the parameter-count sketch after this list).
② A BN (batch normalization) layer is added to the auxiliary classifier, which helps improve accuracy and acts as a regularizer.
③ The stride of the last layer in the Inception block is set to 2 to reduce the feature map size.
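The factorization in ① can be checked directly in code. The following minimal sketch (the 35×35×192 feature-map size and 64 output filters are illustrative assumptions, not values from this article) compares the parameter count of one 5×5 convolution with that of two stacked 3×3 convolutions covering the same receptive field.

from keras.layers import Conv2D, Input
from keras.models import Model

inp = Input(shape=(35, 35, 192))  # an Inception-style feature map, illustrative size

# One 5x5 convolution: 5*5*192*64 weights + 64 biases = 307,264 parameters
five = Conv2D(64, (5, 5), padding="same")(inp)

# Two stacked 3x3 convolutions with the same receptive field:
# (3*3*192*64 + 64) + (3*3*64*64 + 64) = 147,584 parameters
three = Conv2D(64, (3, 3), padding="same")(inp)
three = Conv2D(64, (3, 3), padding="same")(three)

print(Model(inp, five).count_params())   # 307,264
print(Model(inp, three).count_params())  # 147,584, roughly half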
Inception V3 implements transfer learning via fine-tuning. After obtaining the Inception V3 model pre-trained on ImageNet, replace its fully connected layer and output layer with self-built ones (including the output layer) to obtain a new network model, then freeze some of the new model's parameters so that they do not participate in training, and train the remaining unfrozen parameters on the MNIST dataset.
Code implementation (based on the Keras framework):
from keras.datasets import mnist
from keras.applications.inception_v3 import InceptionV3
from keras.utils import np_utils
from keras.models import Model
from keras.layers import Dense, Dropout, GlobalAveragePooling2D, Input, UpSampling2D, Concatenate
from matplotlib import pyplot as plt
import numpy as np

# Load MNIST and keep an unprocessed copy of the test set for later visualization
(X_train, Y_train), (X_test, Y_test) = mnist.load_data()
X_test1 = X_test
Y_test1 = Y_test
X_train = X_train.reshape(-1, 28, 28, 1).astype("float32") / 255.0
X_test = X_test.reshape(-1, 28, 28, 1).astype("float32") / 255.0
Y_test = np_utils.to_categorical(Y_test, 10)
Y_train = np_utils.to_categorical(Y_train, 10)
print(X_train.shape)
print(Y_train.shape)

# Use the Keras functional API to build the new network model.
# weights="imagenet": use the weights obtained from ImageNet pretraining;
# include_top=False: do not include the top fully connected layers.
base_model = InceptionV3(weights="imagenet", include_top=False)
input_inception = Input(shape=(28, 28, 1), dtype="float32", name="inception_input")
# Inception V3's pretrained weights expect RGB inputs of at least 75x75, so upsample
# the 28x28 images to 84x84 and replicate the single grayscale channel three times
x = UpSampling2D(size=(3, 3))(input_inception)
x = Concatenate(axis=-1)([x, x, x])
# Feed the data into the pre-trained base model
x = base_model(x)
# The model has no fully connected layers at this point, so build them ourselves.
# GlobalAveragePooling2D averages each 2D feature map into a single value
x = GlobalAveragePooling2D()(x)
# Build the fully connected layers
x = Dense(1024, activation="relu")(x)
x = Dropout(0.5)(x)
pre = Dense(10, activation="softmax")(x)
# Call Model to define the new model with its input and output layers
inceptionv3_model = Model(inputs=input_inception, outputs=pre)

# Print the index and name of each layer in base_model
for i, layer in enumerate(base_model.layers):
    print(i, layer.name)
# Freeze the parameters of the first 64 layers of base_model so they do not participate in training
for layer in base_model.layers[:64]:
    layer.trainable = False

# View the network model summary
inceptionv3_model.summary()

# Compile
inceptionv3_model.compile(
    loss="categorical_crossentropy",
    optimizer="adam",
    metrics=["accuracy"]
)

# Train
n_epoch = 5
n_batches = 64
training = inceptionv3_model.fit(
    X_train,
    Y_train,
    epochs=n_epoch,
    batch_size=n_batches,
    validation_split=0.2,
    verbose=1
)

# Model evaluation
test = inceptionv3_model.evaluate(X_test, Y_test, verbose=1)
print("Loss:", test[0])
print("Accuracy:", test[1])

# Plot how the model's accuracy and loss change with the epochs
def plot_history(training_history, train, validation):
    plt.plot(training_history.history[train], linestyle="--", color="b")
    plt.plot(training_history.history[validation], linestyle="-", color="r")
    plt.title("training history")
    plt.xlabel("epochs")
    plt.ylabel("accuracy")
    plt.legend(["train", "validation"], loc="lower right")
    plt.show()

plot_history(training, "accuracy", "val_accuracy")

def plot_history1(training_history, train, validation):
    plt.plot(training_history.history[train], linestyle="--", color="b")
    plt.plot(training_history.history[validation], linestyle="-", color="r")
    plt.title("training history")
    plt.xlabel("epochs")
    plt.ylabel("loss")
    plt.legend(["train", "validation"], loc="upper right")
    plt.show()

plot_history1(training, "loss", "val_loss")

# Predict
prediction = inceptionv3_model.predict(X_test)

# Display an original test image
def plot_image(image):
    fig = plt.gcf()
    fig.set_size_inches(2, 2)
    plt.imshow(image, cmap="binary")
    plt.show()

def pre_result(i):
    plot_image(X_test1[i])
    print("True value:", Y_test1[i])
    print("Predicted value:", np.argmax(prediction[i]))

pre_result(0)
pre_result(1)
pre_result(2)
III Overview of Xception
Xception is a network structure that takes the Inception idea to an extreme assumption. When a convolutional layer operates in three-dimensional space (two spatial dimensions and one channel dimension), a single convolution kernel has to map cross-channel correlations and spatial correlations at the same time.
The idea behind the Inception module introduced earlier is to decompose this convolution into a series of independent operations to make it simpler and more efficient. The typical Inception module assumes that mapping channel correlations and spatial correlations can be effectively decoupled, while Xception pushes this idea to the extreme: the mapping of cross-channel correlations and spatial correlations in the feature maps of a convolutional neural network can be completely decoupled, as sketched below.
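This complete decoupling is what the depthwise separable convolution used throughout Xception implements. The following minimal sketch (the 28×28×64 input and 128 output filters are illustrative assumptions) contrasts a standard convolution with its decoupled counterpart: a depthwise 3×3 convolution maps spatial correlations per channel, and a pointwise 1×1 convolution maps cross-channel correlations; Keras also bundles the pair as SeparableConv2D.

from keras.layers import Conv2D, DepthwiseConv2D, SeparableConv2D, Input
from keras.models import Model

inp = Input(shape=(28, 28, 64))

# Standard convolution: spatial and cross-channel correlations mapped jointly
joint = Conv2D(128, (3, 3), padding="same")(inp)

# Fully decoupled version: depthwise 3x3 (spatial only), then pointwise 1x1 (channels only)
spatial = DepthwiseConv2D((3, 3), padding="same")(inp)
decoupled = Conv2D(128, (1, 1), padding="same")(spatial)

# Equivalent single layer, as used throughout Xception
separable = SeparableConv2D(128, (3, 3), padding="same")(inp)

print(Model(inp, joint).count_params())      # 3*3*64*128 + 128 = 73,856
print(Model(inp, decoupled).count_params())  # (3*3*64 + 64) + (64*128 + 128) = 8,960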
Xception's transfer learning is also based on fine-tuning. As with Inception V3, after obtaining the Xception model pre-trained on ImageNet, replace its fully connected layer and output layer with self-built ones (including the output layer) to obtain a new network model, freeze some of the new model's parameters so that they do not participate in training, and train the remaining unfrozen parameters on the MNIST dataset.
IV Implementation of transfer learning with Xception
Code implementation:
from keras.applications.xception import Xception
from keras.datasets import mnist
from keras.utils import np_utils
from keras.layers import Dense, GlobalAveragePooling2D, Dropout, Input, UpSampling2D, Concatenate
from keras.models import Model
from matplotlib import pyplot as plt
import numpy as np

# Load MNIST and keep an unprocessed copy of the test set for later visualization
(X_train, Y_train), (X_test, Y_test) = mnist.load_data()
X_test1 = X_test
Y_test1 = Y_test
X_train = X_train.reshape(-1, 28, 28, 1).astype("float32") / 255.0
X_test = X_test.reshape(-1, 28, 28, 1).astype("float32") / 255.0
Y_test = np_utils.to_categorical(Y_test, 10)
Y_train = np_utils.to_categorical(Y_train, 10)

# Build the Xception base model.
# weights="imagenet": use the weights obtained from ImageNet pretraining;
# include_top=False: do not include the top fully connected layers.
base_model = Xception(weights="imagenet", include_top=False)
input_xception = Input(shape=(28, 28, 1), dtype="float32", name="xception_input")
# Xception's pretrained weights expect RGB inputs of at least 71x71, so upsample
# the 28x28 images to 84x84 and replicate the single grayscale channel three times
x = UpSampling2D(size=(3, 3))(input_xception)
x = Concatenate(axis=-1)([x, x, x])
# Feed the data into the pre-trained base model
x = base_model(x)
# The model has no fully connected layers at this point, so build them ourselves.
# GlobalAveragePooling2D averages each 2D feature map into a single value
x = GlobalAveragePooling2D()(x)
x = Dense(1024, activation="relu")(x)
x = Dropout(0.5)(x)
pre = Dense(10, activation="softmax")(x)
# Call Model to define the new model xception_model
xception_model = Model(inputs=input_xception, outputs=pre)

# View the index and name of each layer in base_model
for i, layer in enumerate(base_model.layers):
    print(i, layer.name)
# Freeze the parameters of the first 36 layers of base_model so they do not participate in training
for layer in base_model.layers[:36]:
    layer.trainable = False

# View a summary of the model
xception_model.summary()

# Compile
xception_model.compile(
    loss="categorical_crossentropy",
    optimizer="adam",
    metrics=["accuracy"]
)

# Train
training = xception_model.fit(
    X_train,
    Y_train,
    epochs=5,
    batch_size=64,
    validation_split=0.2,
    verbose=1
)

# Model evaluation
test = xception_model.evaluate(X_test, Y_test)
print("Loss:", test[0])
print("Accuracy:", test[1])

# Plot how accuracy and loss on the training and validation sets change with the epochs
def plot_history(training_history, train, validation):
    plt.plot(training_history.history[train], linestyle="-", color="b")
    plt.plot(training_history.history[validation], linestyle="--", color="r")
    plt.title("xception_model accuracy")
    plt.xlabel("epochs")
    plt.ylabel("accuracy")
    plt.legend(["train", "validation"], loc="lower right")
    plt.show()

plot_history(training, "accuracy", "val_accuracy")

def plot_history1(training_history, train, validation):
    plt.plot(training_history.history[train], linestyle="-", color="b")
    plt.plot(training_history.history[validation], linestyle="--", color="r")
    plt.title("xception_model loss")
    plt.xlabel("epochs")
    plt.ylabel("loss")
    plt.legend(["train", "validation"], loc="upper right")
    plt.show()

plot_history1(training, "loss", "val_loss")

# Predict
prediction = xception_model.predict(X_test)

# Display an original test image
def plot_image(image):
    fig = plt.gcf()
    fig.set_size_inches(2, 2)
    plt.imshow(image, cmap="binary")
    plt.show()

def result(i):
    plot_image(X_test1[i])
    print("True value:", Y_test1[i])
    print("Predicted value:", np.argmax(prediction[i]))

result(0)
result(1)
--------
Copyright notice: This is the original article of CSDN blogger "endless silence", which follows the CC 4.0 BY-SA copyright agreement. Please attach the original source link and this notice for reprint.
Original link: https://blog.csdn.net/hgnuxc_1993/article/details/116020911