Ptorch cifar10 image classification LeNet5

Ptorch cifar10 image classification LeNet5


Here is a summary: Summary

4. Define network (LeNet5)

Handwritten font recognition model LeNet5 was born in 1994 and is one of the earliest convolutional neural networks. Through ingenious design, LeNet5 uses convolution, parameter sharing, pooling and other operations to extract features, avoiding a large amount of computing cost. Finally, it uses fully connected neural network for classification and recognition. This network is also the starting point of a large number of neural network architectures recently.

Some properties of LeNet-5:

  • If the input layer does not count the layers of the neural network, LeNet-5 is a 7-layer network. (convolution and pooling may also be regarded as a layer in some places) (the "5" in the name of LeNet-5 can also be understood as the number of layers with trainable parameters in the whole network is 5.)

  • LeNet-5 has approximately 60000 parameters.

  • As the network gets deeper and deeper, the height and width of the image are shrinking. At the same time, the number of channel s of the image has been increasing.

  • The current commonly used LeNet-5 structure is different from the structure proposed by Professor Yann LeCun in his 1998 paper, such as the use of activation function. Now, ReLU is generally used as the activation function, and softmax is generally selected for the output layer.

First of all, we still have to judge whether we can use GPU, because the speed of GPU may be about 20-50 times faster than that of CPU, especially for convolutional neural network.

device = 'cuda' if torch.cuda.is_available() else 'cpu'
#Define network
class LeNet5(nn.Module):# nn.Module is the base class of all neural networks. Any neural network defined by ourselves should inherit NN Module
    def __init__(self):
        super(LeNet5,self).__init__()
        self.conv1 = nn.Sequential(
            # Convolution layer 1, 3-channel input, 6 convolution kernels, kernel size 5 * 5
            # After this layer, the image size becomes 32-5 + 1, 28 * 28
            nn.Conv2d(in_channels=3,out_channels=6,kernel_size=5,stride=1, padding=0),
            #Activation function
            nn.ReLU(),
            # After 2 * 2 maximum pooling, the image becomes 14 * 14
            nn.MaxPool2d(kernel_size=2,stride=2,padding=0),
        )
        self.conv2 = nn.Sequential(
            # Convolution layer 2, 6 input channels, 16 convolution kernels, kernel size 5 * 5
            # After this layer, the image becomes 14-5 + 1, 10 * 10
            nn.Conv2d(in_channels=6,out_channels=16,kernel_size=5,stride=1, padding=0),
            nn.ReLU(),
            # After 2 * 2 maximum pooling, the image becomes 5 * 5
            nn.MaxPool2d(kernel_size=2,stride=2,padding=0),
        )
        self.fc = nn.Sequential(
            # Then there are three full connection layers
            nn.Linear(16*5*5,120),
            nn.ReLU(),
            nn.Linear(120,84),
            nn.ReLU(),
            nn.Linear(84,10),
        )
        
        # Define the forward propagation process, and enter
    def forward(self,x):
        x = self.conv1(x)
        x = self.conv2(x)
        # nn. The input and output of linear () are values with one dimension, so the multi-dimensional tensor should be flattened into one dimension
            
        x = x.view(x.size()[0],-1)
        x = self.fc(x)
        return x
            
net = LeNet5().cuda()
summary(net,(3,32,32))
----------------------------------------------------------------
     Layer (type)               Output Shape         Param #
================================================================
         Conv2d-1            [-1, 6, 28, 28]             456
           ReLU-2            [-1, 6, 28, 28]               0
      MaxPool2d-3            [-1, 6, 14, 14]               0
         Conv2d-4           [-1, 16, 10, 10]           2,416
           ReLU-5           [-1, 16, 10, 10]               0
      MaxPool2d-6             [-1, 16, 5, 5]               0
         Linear-7                  [-1, 120]          48,120
           ReLU-8                  [-1, 120]               0
         Linear-9                   [-1, 84]          10,164
          ReLU-10                   [-1, 84]               0
        Linear-11                   [-1, 10]             850
================================================================
Total params: 62,006
Trainable params: 62,006
Non-trainable params: 0
----------------------------------------------------------------
Input size (MB): 0.01
Forward/backward pass size (MB): 0.11
Params size (MB): 0.24
Estimated Total Size (MB): 0.36
----------------------------------------------------------------

Firstly, from our summary, we can see that the parameters of the model we defined are about 62 thousand. We input the tensor of (batch, 3, 32, 32), and here we can also see the change of our image output size after each layer. Finally, we output 10 parameters, and then we can get the probability of each category through the softmax function.

We can also print out our model and observe it

LeNet5(
  (conv1): Sequential(
    (0): Conv2d(3, 6, kernel_size=(5, 5), stride=(1, 1))
    (1): ReLU()
    (2): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  )
  (conv2): Sequential(
    (0): Conv2d(6, 16, kernel_size=(5, 5), stride=(1, 1))
    (1): ReLU()
    (2): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  )
  (fc): Sequential(
    (0): Linear(in_features=400, out_features=120, bias=True)
    (1): ReLU()
    (2): Linear(in_features=120, out_features=84, bias=True)
    (3): ReLU()
    (4): Linear(in_features=84, out_features=10, bias=True)
  )
)

If your computer has multiple GPUs, this code can use GPUs for parallel computing to speed up the computing speed

net =LeNet5().to(device)
if device == 'cuda':
    net = nn.DataParallel(net)
    # When the calculation diagram does not change (the shape of each input is the same, and the model does not change), the performance can be improved, otherwise, the performance will be reduced
    torch.backends.cudnn.benchmark = True

5. Define loss function and optimizer

pytorch encapsulates all the optimization methods commonly used in deep learning in torch In Optim, all optimization methods inherit the base class optim Optimizier
The loss function is encapsulated in the neural network toolbox nn, which contains many loss functions

Here I use the SGD + momentum algorithm, and our loss function is defined as the cross entropy function. In addition, the learning strategy is defined as dynamically updating the learning rate. If the training loss does not decrease after 5 iterations, we will change the learning rate, which will become 0.5 times the original, and the minimum will be reduced to 0.00001

If you want to know more about the optimizer and learning rate strategy, you can refer to the following resources

It is decided to iterate 20 times

import torch.optim as optim
optimizer = optim.SGD(net.parameters(), lr=1e-1, momentum=0.9, weight_decay=5e-4)
criterion = nn.CrossEntropyLoss()
scheduler = optim.lr_scheduler.ReduceLROnPlateau(optimizer, 'min', factor=0.5 ,patience = 5,min_lr = 0.000001) # Dynamic update learning rate
# scheduler = optim.lr_scheduler.MultiStepLR(optimizer, milestones=[75, 150], gamma=0.5)
import time
epoch = 20

6. Training

First, define the location where the model is saved

import os
if not os.path.exists('./model'):
    os.makedirs('./model')
else:
    print('file already exist')
save_path = './model/LeNet5.pth'

I define a train function, perform a training in the train function, and save our trained model

from utils import trainfrom utils import plot_historyAcc, Loss, Lr = train(net, trainloader, testloader, epoch, optimizer, criterion, scheduler, save_path, verbose = True)
Epoch [  1/ 20]  Train Loss:1.918313  Train Acc:28.19% Test Loss:1.647650  Test Acc:39.23%  Learning Rate:0.100000	Time 00:29Epoch [  2/ 20]  Train Loss:1.574629  Train Acc:42.00% Test Loss:1.521759  Test Acc:44.59%  Learning Rate:0.100000	Time 00:21Epoch [  3/ 20]  Train Loss:1.476749  Train Acc:46.23% Test Loss:1.438898  Test Acc:48.56%  Learning Rate:0.100000	Time 00:21Epoch [  4/ 20]  Train Loss:1.415963  Train Acc:49.10% Test Loss:1.378597  Test Acc:50.25%  Learning Rate:0.100000	Time 00:21Epoch [  5/ 20]  Train Loss:1.384430  Train Acc:50.69% Test Loss:1.350369  Test Acc:51.81%  Learning Rate:0.100000	Time 00:20Epoch [  6/ 20]  Train Loss:1.331118  Train Acc:52.49% Test Loss:1.384710  Test Acc:49.62%  Learning Rate:0.100000	Time 00:20Epoch [  7/ 20]  Train Loss:1.323763  Train Acc:53.08% Test Loss:1.348911  Test Acc:52.59%  Learning Rate:0.100000	Time 00:21Epoch [  8/ 20]  Train Loss:1.291410  Train Acc:54.19% Test Loss:1.273263  Test Acc:55.00%  Learning Rate:0.100000	Time 00:20Epoch [  9/ 20]  Train Loss:1.261590  Train Acc:55.36% Test Loss:1.295092  Test Acc:54.50%  Learning Rate:0.100000	Time 00:20Epoch [ 10/ 20]  Train Loss:1.239585  Train Acc:56.45% Test Loss:1.349028  Test Acc:52.57%  Learning Rate:0.100000	Time 00:21Epoch [ 11/ 20]  Train Loss:1.225227  Train Acc:56.81% Test Loss:1.293521  Test Acc:53.87%  Learning Rate:0.100000	Time 00:22Epoch [ 12/ 20]  Train Loss:1.221355  Train Acc:56.86% Test Loss:1.255155  Test Acc:56.13%  Learning Rate:0.100000	Time 00:21Epoch [ 13/ 20]  Train Loss:1.207748  Train Acc:57.59% Test Loss:1.238375  Test Acc:57.10%  Learning Rate:0.100000	Time 00:21Epoch [ 14/ 20]  Train Loss:1.195587  Train Acc:58.01% Test Loss:1.185524  Test Acc:58.56%  Learning Rate:0.100000	Time 00:20Epoch [ 15/ 20]  Train Loss:1.183456  Train Acc:58.50% Test Loss:1.192770  Test Acc:58.04%  Learning Rate:0.100000	Time 00:21Epoch [ 16/ 20]  Train Loss:1.168697  Train Acc:59.13% Test Loss:1.272912  Test Acc:55.85%  Learning Rate:0.100000	Time 00:20Epoch [ 17/ 20]  Train Loss:1.167685  Train Acc:59.23% Test Loss:1.195087  Test Acc:58.33%  Learning Rate:0.100000	Time 00:20Epoch [ 18/ 20]  Train Loss:1.162324  Train Acc:59.37% Test Loss:1.242964  Test Acc:56.62%  Learning Rate:0.100000	Time 00:21Epoch [ 19/ 20]  Train Loss:1.154494  Train Acc:59.72% Test Loss:1.274993  Test Acc:54.90%  Learning Rate:0.100000	Time 00:21Epoch [ 20/ 20]  Train Loss:1.163650  Train Acc:59.45% Test Loss:1.182077  Test Acc:59.00%  Learning Rate:0.100000	Time 00:20

Then you can print the loss function curve, accuracy curve and learning rate curve respectively

plot_history(epoch ,Acc, Loss, Lr)

Loss function curve

Accuracy curve

Learning rate curve

7. Test

View accuracy

correct = 0   # Define the number of pictures predicted correctly and initialize to 0
total = 0     # The total number of pictures participating in the test is also initialized to 0
# testloader = torch.utils.data.DataLoader(testset, batch_size=32,shuffle=True, num_workers=2)
for data in testloader:  # Cycle each batch
    images, labels = data
    images = images.to(device)
    labels = labels.to(device)
    net.eval()  # Change the model to test mode
    if hasattr(torch.cuda, 'empty_cache'):
        torch.cuda.empty_cache()
    outputs = net(images)  # Input network for testing
    
    # outputs.data is a 4x10 tensor. The value and sequence number of the largest column in each row form a one-dimensional tensor. The first is the tensor of value and the second is the tensor of sequence number.
    _, predicted = torch.max(outputs.data, 1)
    total += labels.size(0)          # Update the number of test pictures
    correct += (predicted == labels).sum() # Update the number of correctly classified pictures

print('Accuracy of the network on the 10000 test images: %.2f %%' % (100 * correct / total))
 
Accuracy of the network on the 10000 test images: 59.31 %

It can be seen that the accuracy of the self-defined network model in the test set is 59.31%

Torch in the program Max (outputs. Data, 1), returns a tuple

It is obvious here that the first element of the returned tuple is image data, that is, the maximum value, and the second element is label, that is, the index of the maximum value! We only need label (the index of the maximum value), so there will be an assignment statement such as, predicted, which means to ignore the first return value and assign it to, which means to discard it;

Check the accuracy of each category

 # Define 2 lists that store the correct number of tests in each class, and initialize to 0
class_correct = list(0. for i in range(10))
class_total = list(0. for i in range(10))
# testloader = torch.utils.data.DataLoader(testset, batch_size=64,shuffle=True, num_workers=2)
net.eval()
with torch.no_grad():
    for data in testloader:
        images, labels = data
        images = images.to(device)
        labels = labels.to(device)
        if hasattr(torch.cuda, 'empty_cache'):
            torch.cuda.empty_cache()
        outputs = net(images)

        _, predicted = torch.max(outputs.data, 1)
    #In 4 groups (batch_size) of data, if the output is the same as the label, it is marked as 1, otherwise it is 0
        c = (predicted == labels).squeeze()
        for i in range(len(images)):      # Because each batch has 4 pictures, it also needs a small loop of 4
            label = labels[i]   # The of each class is accumulated separately
            class_correct[label] += c[i]
            class_total[label] += 1
 
 
for i in range(10):
    print('Accuracy of %5s : %.2f %%' % (classes[i], 100 * class_correct[i] / class_total[i]))
Accuracy of airplane : 70.80 %
Accuracy of automobile : 68.30 %
Accuracy of  bird : 36.70 %
Accuracy of   cat : 35.80 %
Accuracy of  deer : 56.90 %
Accuracy of   dog : 40.00 %
Accuracy of  frog : 79.70 %
Accuracy of horse : 66.30 %
Accuracy of  ship : 58.10 %
Accuracy of truck : 80.10 %

Sample test and visualize some results

dataiter = iter(testloader)
images, labels = dataiter.next()
images_ = images
#images_ = images_.view(images.shape[0], -1)
images_ = images_.to(device)
labels = labels.to(device)
val_output = net(images_)
_, val_preds = torch.max(val_output, 1)

fig = plt.figure(figsize=(25,4))

correct = torch.sum(val_preds == labels.data).item()

val_preds = val_preds.cpu()
labels = labels.cpu()

print("Accuracy Rate = {}%".format(correct/len(images) * 100))

fig = plt.figure(figsize=(25,25))
for idx in np.arange(64):    
    ax = fig.add_subplot(8, 8, idx+1, xticks=[], yticks=[])
    #fig.tight_layout()
#     plt.imshow(im_convert(images[idx]))
    imshow(images[idx])
    ax.set_title("{}, ({})".format(classes[val_preds[idx].item()], classes[labels[idx].item()]), 
                 color = ("green" if val_preds[idx].item()==labels[idx].item() else "red"))
Accuracy Rate = 58.59375%
<Figure size 1800x288 with 0 Axes>

8. Save model

torch.save(net,save_path[:-4]+'_'+str(epoch)+'.pth')
# torch.save(net, './model/MyNet.pth')

9. Forecast

Read local pictures for prediction

import torch
from PIL import Image
from torch.autograd import Variable
import torch.nn.functional as F
from torchvision import datasets, transforms
import numpy as np
 
classes = ('plane', 'car', 'bird', 'cat',
           'deer', 'dog', 'frog', 'horse', 'ship', 'truck')

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

model = Mynet()

model = torch.load(save_path)  # Loading model
# model = model.to('cuda')
model.eval()  # Change the model to test mode

# Read pictures to predict
img = Image.open("./airplane.jpg").convert('RGB') # Read image
img

Then we will predict the images, but here is a point. We need to transform our images, because we also transform each image during training, so we need to keep consistent

trans = transforms.Compose([transforms.Scale((32,32)),
                            transforms.ToTensor(),
                            transforms.Normalize(mean=(0.5, 0.5, 0.5), 
                                                 std=(0.5, 0.5, 0.5)),
                           ])
 
img = trans(img)
img = img.to(device)
# The picture is expanded to one dimension, because the input into the saved model is 4-dimensional [batch_size, channel, length, width], while the ordinary picture is only three-dimensional, [channel, length, width]
img = img.unsqueeze(0)  
    # After expansion, it is [1, 3, 32, 32]
output = model(img)
prob = F.softmax(output,dim=1) #prob is the probability of 10 classifications
print("probability",prob)
value, predicted = torch.max(output.data, 1)
print("category",predicted.item())
print(value)
pred_class = classes[predicted.item()]
print("classification",pred_class)
probability tensor([[8.6320e-01, 1.7801e-04, 3.6034e-03, 3.1733e-04, 2.0015e-03, 6.3802e-05,
      6.5461e-05, 2.0457e-04, 1.2603e-01, 4.3343e-03]], device='cuda:0',
    grad_fn=<SoftmaxBackward>)
Category 0
tensor([6.2719], device='cuda:0')
classification plane

Here we can see that our final result is classified as plane, and our confidence rate is about 86.32%

Read picture address for prediction

We can also predict by reading the url address of the picture. Here I found several different pictures to predict

import requests
from PIL import Image
url = 'https://dss2.bdstatic.com/70cFvnSh_Q1YnxGkpoWK1HF6hhy/it/u=947072664,3925280208&fm=26&gp=0.jpg'
url = 'https://ss0.bdstatic.com/70cFuHSh_Q1YnxGkpoWK1HF6hhy/it/u=2952045457,215279295&fm=26&gp=0.jpg'
url = 'https://ss0.bdstatic.com/70cFvHSh_Q1YnxGkpoWK1HF6hhy/it/u=2838383012,1815030248&fm=26&gp=0.jpg'
url = 'https://gimg2.baidu.com/image_search/src=http%3A%2F%2Fwww.goupuzi.com%2Fnewatt%2FMon_1809%2F1_179223_7463b117c8a2c76.jpg&refer=http%3A%2F%2Fwww.goupuzi.com&app=2002&size=f9999,10000&q=a80&n=0&g=0n&fmt=jpeg?sec=1624346733&t=36ba18326a1e010737f530976201326d'
url = 'https://ss3.bdstatic.com/70cFv8Sh_Q1YnxGkpoWK1HF6hhy/it/u=2799543344,3604342295&fm=224&gp=0.jpg'
# url = 'https://ss1.bdstatic.com/70cFuXSh_Q1YnxGkpoWK1HF6hhy/it/u=2032505694,2851387785&fm=26&gp=0.jpg'
response = requests.get(url, stream=True)
print (response)
img = Image.open(response.raw)
img

It's the same here as before

trans = transforms.Compose([transforms.Scale((32,32)),
                            transforms.ToTensor(),
                            transforms.Normalize(mean=(0.5, 0.5, 0.5), 
                                                 std=(0.5, 0.5, 0.5)),
                           ])
 
img = trans(img)
img = img.to(device)
# The picture is expanded to one dimension, because the input into the saved model is 4-dimensional [batch_size, channel, length, width], while the ordinary picture is only three-dimensional, [channel, length, width]
img = img.unsqueeze(0)  
    # After expansion, it is [1, 3, 32, 32]
output = model(img)
prob = F.softmax(output,dim=1) #prob is the probability of 10 classifications
print("probability",prob)
value, predicted = torch.max(output.data, 1)
print("category",predicted.item())
print(value)
pred_class = classes[predicted.item()]
print("classification",pred_class)
probability tensor([[0.7491, 0.0015, 0.0962, 0.0186, 0.0605, 0.0155, 0.0025, 0.0150, 0.0380,
      0.0031]], device='cuda:0', grad_fn=<SoftmaxBackward>)
Category 0
tensor([3.6723], device='cuda:0')
classification plane

It can be seen that the result of our classification is plane, while the real classification is cat, so this model is not particularly perfect and needs to be improved

10. Summary

In fact, for our LeNet5, this model first appeared on the MINST data set, achieved high accuracy, and first used the number recognition on stamps. It can be seen from the experimental results that he does not perform well in the classification of CIFAR10, so if you want to use LeNet5 for image recognition, you should think more about it

Incidentally, our data and codes are described in my summary. If necessary, we can get them ourselves

Here is another summary: Summary

Keywords: Pytorch Computer Vision Convolutional Neural Networks

Added by bongbong on Sun, 02 Jan 2022 23:59:21 +0200