[PyTorch] build and train a neural network model from scratch (image classification, CNN)

0. Preface

Some powerful models have been used before, such as VGG16 in the field of image classification, YoloV5 in the field of target detection, Yolact in the field of instance segmentation, etc. But only after the environment is configured for training, the interface of the source code can be modified slightly to meet their own needs. PyTorch has never been used to build and train a model from scratch.

Just recently, I'm learning PyTorch systematically. I'll summarize how to build and train a neural network model from scratch.

1. Use torchvision to load the dataset and perform preprocessing

The dataset we use is CIFAR10, which has 10 categories and the image size is 3 x 32 x 32, as shown below:

The code is as follows, and there are notes in the code in important places

import torch
import cv2
import numpy as np
from torchvision import datasets, transforms
import torchvision

# 1. Use torchvision to load the dataset and perform preprocessing
transform = transforms.Compose([transforms.ToTensor(),  # Convert image to tensor
                                transforms.Normalize((0.5,0.5,0.5), (0.5,0.5,0.5)),  # Normalization processing
                              ])
# Load training and test sets
trainset = torchvision.datasets.CIFAR10(root='E:\\Machine Learning\\PyTorch\\CIFAR10', train=True, download=True, transform=transform)
trainloader = torch.utils.data.DataLoader(trainset, batch_size=4, shuffle=True, num_workers=2)
testset = torchvision.datasets.CIFAR10(root='E:\\Machine Learning\\PyTorch\\CIFAR10', train=False, download=True, transform=transform)
testloader = torch.utils.data.DataLoader(testset, batch_size=4, shuffle=False, num_workers=2)

2. Define (build) your own neural network

The code and notes are as follows. The whole structure is very simple, that is, two volume layers, two maximum pool layers, and finally connect three full connection layers.

# 2. Define convolutional neural network
import torch.nn as nn
import torch.nn.functional as F

class MyModel(nn.Module):  #  Inherit nn.Module
    # Define network structure
    def __init__(self):
        super(MyModel, self).__init__()
        self.conv1 = nn.Conv2d(3,6,5)
        self.pool = nn.MaxPool2d(2,2)
        self.conv2 = nn.Conv2d(6,16,5)
        self.fc1 = nn.Linear(16*5*5, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)
    # Define forward propagation process
    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        x = x.view(-1, 16*5*5)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)

        return x

Net = MyModel()

3. Define Loss Function and Optimizer

The back propagation of neural network requires a loss function. Because it is a multi classification problem, we use the cross entropy loss function:

# 3. Define loss function and optimizer
import torch.optim as optim

criterion = nn.CrossEntropyLoss()  # Multi classification problem, using cross entropy loss function
optimizer = optim.SGD(Net.parameters(), lr=0.001, momentum=0.9)  # Using SGD optimizer

4. Training neural network

Then we began to train our model, training 2 epoch and batch in total_ Size = 4, first train with CPU and see how the time is.

# 4. Training neural network
epochs = 2  # Train two epoch, batch_ Size = 4 (the size of batch_size is defined in torch.utils.data.DataLoader in the first step)
e1 = cv2.getTickCount()  # Record training time
for epoch in range(epochs):
    total_loss = 0.0
    for i, data in enumerate(trainloader, 0):
        # Get inputs
        inputs, labels = data
        optimizer.zero_grad()
        # forward + backward + optimize
        # Forward propagation + back propagation + update parameters
        outputs = Net(inputs)  # Forward propagation to obtain outputs
        loss = criterion(outputs, labels)  # The loss function is obtained
        loss.backward()  # Backward propagation
        optimizer.step()  # Update parameters

        # Output training process
        total_loss += loss.item()
        if (i+1) % 1000 == 0:  # loss is output every 1000 times (4000 images)
            print('The first{}individual epoch: The first{:5d}Times: current training loss loss Is:{:.3f}'.format(epoch+1, i+1, total_loss/1000))
            total_loss = 0.0

e2 = cv2.getTickCount()
print('use CPU Total training time:{} s'.format((e2-e1)/cv2.getTickFrequency()))
# Output results:
1st epoch: 1000th time: current training loss loss Is: 2.294
 1st epoch: 2000: current training loss loss Is: 2.087
 1st epoch: 3000: current training loss loss Is: 1.902
 1st epoch: 4000: current training loss loss Is: 1.799
 1st epoch: Session 5000: current training loss loss Is: 1.709
 1st epoch: 6000: current training loss loss Is: 1.660
 1st epoch: 7000: current training loss loss Is: 1.623
 1st epoch: 8000: current training loss loss Is: 1.586
 1st epoch: 9000: current training loss loss Is: 1.550
 1st epoch: 10000 times: current training loss loss Is: 1.497
 1st epoch: 11000: current training loss loss Is: 1.468
 1st epoch: 12000 times: current training loss loss Is: 1.469
 2nd epoch: 1000th time: current training loss loss Is: 1.392
 2nd epoch: 2000: current training loss loss Is: 1.382
 2nd epoch: 3000: current training loss loss Is: 1.364
 2nd epoch: 4000: current training loss loss Is: 1.361
 2nd epoch: Session 5000: current training loss loss Is: 1.344
 2nd epoch: 6000: current training loss loss Is: 1.349
 2nd epoch: 7000: current training loss loss Is: 1.313
 2nd epoch: 8000: current training loss loss Is: 1.327
 2nd epoch: 9000: current training loss loss Is: 1.306
 2nd epoch: 10000 times: current training loss loss Is: 1.302
 2nd epoch: 11000: current training loss loss Is: 1.275
 2nd epoch: 12000 times: current training loss loss Is: 1.288
 use CPU Total training time: 83.7945317 s

It took 83.79 seconds.

5. Test model results

# 5. How accurate is the test model

correct = 0
total = 0

e1 = cv2.getTickCount()  # Record the test time
with torch.no_grad():  # Do not track gradient
    for data in testloader:
        images, labels = data
        outputs = Net(images)
        _, predicted = torch.max(outputs.data, 1)
        total += labels.size(0)
        correct += (predicted == labels).sum().item()
        
e2 = cv2.getTickCount()
print('use CPU Total test time:{} s'.format((e2-e1)/cv2.getTickFrequency()))
print('The accuracy on the test set is:{:.3f}%'.format(correct*100/total))

Let's see what happens

# result
 use CPU Total test time: 5.3146039 s
 The accuracy on the test set is: 55.460%

The accuracy rate is about 55%. Although it is not high, it can be seen that our model has indeed learned something (because the accuracy rate of random prediction for 10 classification problems is about 10%). And the purpose of this blog is to sort out the general process of training neural networks, rather than to build an excellent model.

6. Too slow CPU? Change GPU training and try!

First, check whether the corresponding versions of cuda and cudnn are installed. The specific installation steps will not be mentioned. There are many excellent tutorials on csdn.

# Check whether the corresponding version of cuda is installed on the computer
device = torch.device('cuda:0' if torch.cuda.is_available else 'cpu')
print('Equipment name: ', device)
print('see cuda edition: ', torch.version.cuda)
# result
 Equipment name:  cuda:0
 see cuda Version: 10.1

PyTorch is very convenient to use GPU for training. Compared with CPU training in step 4, it only needs to add two lines of code:
(1) Load the neural network model into cuda
(2) Load data into cuda

The codes and notes are as follows:

# 6. Training neural network with GPU
######  The first is different   ########
Net.to(device)  # Load the neural network model into cuda
epochs = 2  # Train two epoch, batch_ Size = 4 (the size of batch_size is defined in torch.utils.data.DataLoader in the first step)
e1 = cv2.getTickCount()  # Record training time
for epoch in range(epochs):
    total_loss = 0.0
    for i, data in enumerate(trainloader, 0):
        # Get inputs
        inputs, labels = data
        ######  The second is different   ########
        inputs, labels = inputs.to(device), labels.to(device)  # Load data into cuda
        optimizer.zero_grad()
        # forward + backward + optimize
        # Forward propagation + back propagation + update parameters
        outputs = Net(inputs)  # Forward propagation to obtain outputs
        loss = criterion(outputs, labels)  # The loss function is obtained
        loss.backward()  # Backward propagation
        optimizer.step()  # Update parameters

        # Output training process
        total_loss += loss.item()
        if (i+1) % 1000 == 0:  # loss is output every 1000 times (4000 images)
            print('The first{}individual epoch: The first{:5d}Times: current training loss loss Is:{:.3f}'.format(epoch+1, i+1, total_loss/1000))
            total_loss = 0.0

e2 = cv2.getTickCount()
print('use CPU Total training time:{} s'.format((e2-e1)/cv2.getTickFrequency()))

See what happens:

# result
 1st epoch: 1000th time: current training loss loss Is: 2.231
 1st epoch: 2000: current training loss loss Is: 2.032
 1st epoch: 3000: current training loss loss Is: 1.872
 1st epoch: 4000: current training loss loss Is: 1.745
 1st epoch: Session 5000: current training loss loss Is: 1.702
 1st epoch: 6000: current training loss loss Is: 1.634
 1st epoch: 7000: current training loss loss Is: 1.603
 1st epoch: 8000: current training loss loss Is: 1.548
 1st epoch: 9000: current training loss loss Is: 1.513
 1st epoch: 10000 times: current training loss loss Is: 1.494
 1st epoch: 11000: current training loss loss Is: 1.459
 1st epoch: 12000 times: current training loss loss Is: 1.452
 2nd epoch: 1000th time: current training loss loss Is: 1.396
 2nd epoch: 2000: current training loss loss Is: 1.372
 2nd epoch: 3000: current training loss loss Is: 1.350
 2nd epoch: 4000: current training loss loss Is: 1.361
 2nd epoch: Session 5000: current training loss loss Is: 1.338
 2nd epoch: 6000: current training loss loss Is: 1.312
 2nd epoch: 7000: current training loss loss Is: 1.322
 2nd epoch: 8000: current training loss loss Is: 1.289
 2nd epoch: 9000: current training loss loss Is: 1.277
 2nd epoch: 10000 times: current training loss loss Is: 1.269
 2nd epoch: 11000: current training loss loss Is: 1.293
 2nd epoch: 12000 times: current training loss loss Is: 1.283
 use CPU Total training time: 74.0288839 s

Compared with 83 seconds of CPU, the GPU takes 74 seconds, which is faster, but the improvement is not obvious enough. This is because our network is very small and has few parameters. In addition, the GPU of my notebook is also very old, 1050 core display.

We are trying to infer how much acceleration will be achieved by using GPU:

# 6. Use GPU to infer

correct = 0
total = 0
Net.to(device)  # Load the neural network model into cuda

e1 = cv2.getTickCount()  # Record the test time
with torch.no_grad():  # Do not track gradient
    for data in testloader:
        images, labels = data
        images, labels = images.to(device), labels.to(device) # Load data into cuda
        outputs = Net(images)
        _, predicted = torch.max(outputs.data, 1)
        total += labels.size(0)
        correct += (predicted == labels).sum().item()
        
e2 = cv2.getTickCount()
print('use CPU Total test time:{} s'.format((e2-e1)/cv2.getTickFrequency()))
print('The accuracy on the test set is:{:.3f}%'.format(correct*100/total))

See what happens

# result
 use CPU Total test time: 4.3125164 s
 The accuracy on the test set is 54.090%

Compared with 5.3 seconds for CPU inference and 4.3 seconds for GPU inference, the speed is also improved.

Keywords: Python neural networks Pytorch CNN

Added by Rianna on Sun, 05 Dec 2021 00:33:36 +0200