60 minute entry depth learning tool (code comments plus learning pytorch)

preface

Original translation from: Deep Learning with PyTorch: A 60 Minute Blitz

Translation: Lin Yun

catalogue

60 minute introduction to PyTorch (I) - Tensors

60 minute introduction to PyTorch (II) -- Autograd automatic derivation

60 minute introduction to pytoch (III) -- neural network

60 minute introduction to PyTorch (IV) -- training a classifier

%matplotlib inline

Train a classifier

You have learned how to define a neural network, calculate the loss value and update the weight of the network.

You may be thinking now: where does the data come from?

jupyter nbconvert --to md notebook.ipynb

About data

Usually, when you process image, text, audio and video data, you can use the standard Python package to load the data into a numpy array Then convert this array to torch* Tensor.

For images, there are very practical packages such as pilot and OpenCV
For audio, there are packages such as scipy and librosa
For text, you can load it with raw Python and python, or use NLTK and SpaCy
For vision, we created a torchvision package, which contains data loading of common data sets, such as Imagenet,CIFAR10,MNIST, etc., and image converter, that is, torchvision Datasets and torch utils. data. DataLoader.

This provides great convenience and avoids code duplication.

In this tutorial, we use CIFAR10 dataset, which has the following 10 categories: 'airplane', 'automobile', 'bird', 'cat', 'der', 'dog', 'frog', 'horse', 'ship', 'truck'. The image size in this dataset is 3 * 32 * 32, that is, 3 channels, 32 * 32 pixels.

Training an image classifier

We will proceed in the following order:

Use torchvision to load and normalize CIFAR10 training set and test set
A convolutional neural network is defined
Define loss function
Training network on training set
Test network on test set

1. Loading and normalizing CIFAR10

Loading CIFAR10 using torchvision is very easy.

import torch

import torchvision

import torchvision.transforms as transforms

The output of torchvision is the PILImage of [0,1], which we convert into a tensor with a normalized range of [- 1,1].

be careful

If a BrokenPipeError occurs while running on Windows, try to set torch utils. data. Num of dataloader()_ Worker is set to 0.

transform = transforms.Compose(

    [transforms.ToTensor(),

     transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))])



trainset = torchvision.datasets.CIFAR10(root='./data', train=True,

                                        download=True, transform=transform)

trainloader = torch.utils.data.DataLoader(trainset, batch_size=4,

                                          shuffle=True, num_workers=2)



testset = torchvision.datasets.CIFAR10(root='./data', train=False,

                                       download=True, transform=transform)

testloader = torch.utils.data.DataLoader(testset, batch_size=4,

                                         shuffle=False, num_workers=2)



classes = ('plane', 'car', 'bird', 'cat',

           'deer', 'dog', 'frog', 'horse', 'ship', 'truck')

#This process is a little slow. About 340mb of image data will be downloaded.

Files already downloaded and verified
Files already downloaded and verified

We show some interesting training images.

import matplotlib.pyplot as plt

import numpy as np

import cv2



# functions to show an image





def imshow(img):

    img = img / 2 + 0.5     # unnormalize

    npimg = img.numpy()

    

    suofanghou=cv2.resize(np.transpose(npimg, (1, 2, 0)),(640,440),interpolation=cv2.INTER_CUBIC)

    cv2.imshow("img",suofanghou)

    cv2.waitKey(0)

    # plt.imshow(np.transpose(npimg, (1, 2, 0)))

    # plt.show()





# get some random training images

dataiter = iter(trainloader)

images, labels = dataiter.next()

# print(images)

# print(labels)



# show images

# images = torchvision.utils.make_grid(images)

# array1=images.numpy()#Convert tensor data to numpy data

# maxValue=array1.max()

# array1=array1*255/maxValue#normalize to expand the image data to [0255]

# mat=np.uint8(array1)#float32-->uint8

# print('mat_shape:',mat.shape)#mat_shape: (3, 982, 814)

# mat=mat.transpose(1,2,0)#mat_shape: (982, 814，3)

# cv2.imshow("img",mat)

# cv2.waitKey(0)

imshow(torchvision.utils.make_grid(images))

# print labels

print(' '.join('%5s' % classes[labels[j]] for j in range(4)))

 bird   car  bird   dog

2. Define a convolutional neural network

Copy the neural network code from the previous neural network section and modify it to accept the 3-channel image instead of the previous single channel image.

import torch.nn as nn

import torch.nn.functional as F



# Function of convolution layer: feature extraction of input pictures.

# The input image is convoluted with the kernel to obtain the output.

# Note: the number of cores should be consistent with the number of input channel s. A kernel is convoluted with only one channel.

# The role of the pool layer is to reduce the size of the input picture.

# In convolutional neural networks, each convolution layer is always followed by a pool layer.

# The function of adding pooling layer is to speed up the operation and make some detected features more robust.

# The full connection layer appears at the end of the convolutional neural network.

# Before the fully connected layer, the feature map generated by many previous layers is compressed into a vector. This vector is fed into the full connection layer.

# The output of the full connection layer is a one-dimensional eigenvector.

class Net(nn.Module):

    def __init__(self):

        super(Net, self).__init__()

        # [ channels, output, height_2, width_2 ]

        # Channels, the number of channels, is consistent with the above, that is, the depth of the current layer

        # Depth of output

        # height_2. filter height

        # width_2. Width of filter

        # Convolution layer 1: two-dimensional convolution layer, 3 input channels and 6 output channels, and the size of convolution kernel is 5x5

        self.conv1 = nn.Conv2d(3, 6, 3)

        # pool of square window of size=2, stride=2

        # Maximum pool layer

        self.pool = nn.MaxPool2d(2, 2)

        # Convolution layer 2: two-dimensional convolution layer, 6 input channels and 16 output channels, with convolution kernel size of 5x5

        self.conv2 = nn.Conv2d(6, 16, 3)

        # self.pool2 = nn.MaxPool2d(2, 2)

        # self.conv3 = nn.Conv2d(16, 36, 3)

        # For the full connection layer, it should be noted that the input and output of the full connection layer are two-dimensional tensors, and the general shape is [batch_size, size]

        # in_features refers to the size of the input two-dimensional tensor, that is, the size in the input [batch_size, size].

        # out_features refers to the size of the output two-dimensional tensor, that is, the shape of the output two-dimensional tensor is [batch_size, output_size]

        # in_features are determined by the shape of the input tensor, out_features determine the shape of the output tensor 

        # nn.Linear(in_features = 16 * 5 * 5, out_features = 120)

        # /*

        # The number of input features for a linear layer is defined by the active dimension from the previous layer.

        # The active shape of the upper layer is [batch_size, channels= 16, height= 5, width= 5].

        # To pass this activation to NN Linear, you need to flatten this tensor to [batch_size, 16 * 5 * 5].

        # 16 By out_ The number of channels (i.e. the number of filter cores in the previous conv layer) is defined,

        # 5x5 is the size of the space defined by the conv and the pooling operation performed on the input data.

        # */

        # An affine operation: y = Wx + b 

        # # Full connection layer 1: linear layer, input dimension 16 * 5 * 5, output dimension 120

        self.fc1 = nn.Linear(16 * 6 * 6, 120)

        # Full connection layer 2: linear layer, input dimension 120, output dimension 84

        self.fc2 = nn.Linear(120, 84)

        # Full connection layer 3: linear layer, input dimension 84, output dimension 10

        self.fc3 = nn.Linear(84, 10)



    def forward(self, x):

        #  # Convolution first, then pooling

        x = self.pool(F.relu(self.conv1(x)))

        # Re convolution

        x = self.pool(F.relu(self.conv2(x)))

        # When changing the convolution and size and reporting an error, check the size of the feature map. The size is torch Size([4, 16, 6, 6])，

        # In fact, the size of your feature map is 16 * 6 * 6, and the input dimension is 16 * 6 * 6

        # Expected input batch_size (16) to match target batch_size (4).

        # print(x.shape)

        x = x.view(-1, 16 * 6 * 6)

        # Full connection 1

        x = F.relu(self.fc1(x))

        # Full connection 2

        x = F.relu(self.fc2(x))

        x = self.fc3(x)

        return x





net = Net()

print(net)

Net(
  (conv1): Conv2d(3, 6, kernel_size=(3, 3), stride=(1, 1))
  (pool): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  (conv2): Conv2d(6, 16, kernel_size=(3, 3), stride=(1, 1))
  (fc1): Linear(in_features=576, out_features=120, bias=True)
  (fc2): Linear(in_features=120, out_features=84, bias=True)
  (fc3): Linear(in_features=84, out_features=10, bias=True)
)

3. Define loss function and optimizer

We use the cross entropy as the loss function and the random gradient descent of the driving quantity.

import torch.optim as optim



criterion = nn.CrossEntropyLoss()

#list(net.parameters()) will give a parameter list, recording the data of all training parameters (W and b)

optimizer = optim.SGD(net.parameters(), lr=0.001, momentum=0.9)

4. Training network

This is the first interesting moment. We just need to loop on the data iterator, input the data into the network and optimize it.

for epoch in range(2):  # loop over the dataset multiple times



    running_loss = 0.0

    for i, data in enumerate(trainloader, 0):

        # get the inputs; data is a list of [inputs, labels]

        inputs, labels = data



        # 4 initialization: clear the cache, calculate the gradient, and set the optimizer

        # zero the parameter gradients

        optimizer.zero_grad()



        # forward + backward + optimize

        outputs = net(inputs)

        loss = criterion(outputs, labels)

        loss.backward()

        optimizer.step()



        # print statistics

        running_loss += loss.item()

        if i % 2000 == 1999:    # print every 2000 mini-batches

            print('[%d, %5d] loss: %.3f' %

                  (epoch + 1, i + 1, running_loss / 2000))

            running_loss = 0.0



print('Finished Training')

[1,  2000] loss: 2.218
[1,  4000] loss: 1.839
[1,  6000] loss: 1.661
[1,  8000] loss: 1.541
[1, 10000] loss: 1.501
[1, 12000] loss: 1.453
[2,  2000] loss: 1.368
[2,  4000] loss: 1.340
[2,  6000] loss: 1.305
[2,  8000] loss: 1.263
[2, 10000] loss: 1.263
[2, 12000] loss: 1.228
Finished Training

Save our training model

PATH = './cifar_net.pth'

torch.save(net.state_dict(), PATH)

click here View a detailed description of saving a model

5. Test the network on the test set

We trained the network twice on the whole training set, but we also need to check whether the network learned from the data set.

We predict the category label output by the neural network and detect it according to the actual situation. If the prediction is correct, we add the sample to the correct prediction list.

The first step is to display the pictures in the test set and familiarize yourself with the picture content.

# print(testloader)

dataiter = iter(testloader)

images, labels = dataiter.next()



# print images

imshow(torchvision.utils.make_grid(images))

print('GroundTruth: ', ' '.join('%5s' % classes[labels[j]] for j in range(4)))

GroundTruth:    cat  ship  ship plane

Next, let's reload the model we saved (Note: saving and reloading the model is not necessary here, we just want to explain how to do this):

net = Net()

net.load_state_dict(torch.load(PATH))

<All keys matched successfully>

Now let's look at what the neural network thinks the above picture is?

outputs = net(images)

The output is the probability of 10 Tags. The greater the probability of a category, the more the neural network considers it to be this category. So let's get the label with the highest probability.

_, predicted = torch.max(outputs, 1)



print('Predicted: ', ' '.join('%5s' % classes[predicted[j]]

                              for j in range(4)))

Predicted:   bird  ship  ship plane

The result looks very good.

Next, let's look at the results of the network on the whole test set.

correct = 0

total = 0

with torch.no_grad():

    for data in testloader:

        images, labels = data

        outputs = net(images)

        _, predicted = torch.max(outputs.data, 1)

        total += labels.size(0)

        correct += (predicted == labels).sum().item()



print('Accuracy of the network on the 10000 test images: %d %%' % (

    100 * correct / total))

Accuracy of the network on the 10000 test images: 56 %

The result looks better than chance. The accuracy rate of chance is 10%. It seems that the network has learned something.

What kind of prediction is good and what kind of prediction result is bad?

class_correct = list(0. for i in range(10))

class_total = list(0. for i in range(10))

with torch.no_grad():

    for data in testloader:

        images, labels = data

        outputs = net(images)

        _, predicted = torch.max(outputs, 1)

        c = (predicted == labels).squeeze()

        for i in range(4):

            label = labels[i]

            class_correct[label] += c[i].item()

            class_total[label] += 1





for i in range(10):

    print('Accuracy of %5s : %2d %%' % (

        classes[i], 100 * class_correct[i] / class_total[i]))

Accuracy of plane : 77 %
Accuracy of   car : 63 %
Accuracy of  bird : 42 %
Accuracy of   cat : 40 %
Accuracy of  deer : 50 %
Accuracy of   dog : 51 %
Accuracy of  frog : 64 %
Accuracy of horse : 58 %
Accuracy of  ship : 75 %
Accuracy of truck : 59 %

What's next?

How do we run neural networks on GPU?

Train on GPU

How do you convert a Tensor to a GPU? How do you move a neural network to a GPU for training. This operation recursively traverses the module and converts its parameters and buffers into CUDA tensors.

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

# Assume that we are on a CUDA machine, then this should print a CUDA device:

#Suppose we have a CUDA machine, this operation will display the CUDA device.

print(device)

cpu

Next, suppose we have a CUDA machine, and then these methods will recursively traverse all modules and convert their parameters and buffers into CUDA tensors:

net.to(device)

Net(
  (conv1): Conv2d(3, 6, kernel_size=(5, 5), stride=(1, 1))
  (pool): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  (conv2): Conv2d(6, 16, kernel_size=(5, 5), stride=(1, 1))
  (fc1): Linear(in_features=400, out_features=120, bias=True)
  (fc2): Linear(in_features=120, out_features=84, bias=True)
  (fc3): Linear(in_features=84, out_features=10, bias=True)
)

Remember, you must also convert your input and target values to the GPU at each step:

inputs, labels = inputs.to(device), labels.to(device)

Why didn't we notice that the speed of GPU increased a lot? That's because the network is very small.

Practice:

Try to increase the width of your network (the second parameter of the first nn.Conv2d, the first parameter of the second nn.Conv2d, they need to be the same number) and see what kind of acceleration you get.

Objectives achieved:

We have a deep understanding of PyTorch's tensor library and neural network
Trained a small network to classify pictures

Training on multiple GPU s

If you want to use all GPU s to speed up more, please check the optional reading: Data parallel

What's next?

Training neural networks to play video games
Train the best ResNet on ImageNet
Training a human face generator using a confrontation generation network
Training a character level language model using LSTM network
More examples
More tutorials
Discuss PyTorch in the Forum
Chat with other users on Slack

Keywords: Python Pytorch Deep Learning

Added by harley1387 on Thu, 20 Jan 2022 16:50:13 +0200

Programming VIP

60 minute entry depth learning tool (code comments plus learning pytorch)

preface

catalogue

Train a classifier

About data

Training an image classifier

1. Loading and normalizing CIFAR10

be careful

2. Define a convolutional neural network

3. Define loss function and optimizer

4. Training network

5. Test the network on the test set

Train on GPU

Practice:

Objectives achieved:

Training on multiple GPU s

What's next?

Popular Keywords