Computer Vision PyTorch Implementation

Implementation of Computer Vision PyTorch (1)

PyTorch Base Module

Computer vision is widely used in real-world applications, for example basic image processing, image recognition, image segmentation, object tracking, image classification, and pose estimation. For deep learning, many frameworks have been developed, such as Caffe, MXNet, PyTorch and TensorFlow. These frameworks greatly simplify the process of building deep neural networks.
In computer vision applications, PyTorch modules are combined to build neural networks whose layers extract different types of features, depending on the application.
Here, I'll start with a few of PyTorch's basic modules.

First, import the base packages used throughout:

import torch.nn as nn
import torch

1. Linear Layer

The linear layer, also known as the fully connected layer, generally appears as the last layer of the network.
The definition code is as follows:

nn.Linear(in_features,out_features,bias=True)
  • in_features: the feature dimension of the input
  • out_features: the feature dimension of the output
  • bias: whether to add a learnable bias parameter
    This layer implements the simplest linear regression model: a matrix multiplication followed by an addition, y = x*W^T + b, where the weight W is stored with shape (out_features, in_features).
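
To make the formula concrete, here is a minimal sketch that compares the output of nn.Linear with the same computation written by hand. The sizes (4 input features, 3 output features, a batch of 2) are arbitrary values chosen for illustration.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)  # make the random weights reproducible

# A linear layer mapping 4 input features to 3 output features
layer = nn.Linear(in_features=4, out_features=3, bias=True)

x = torch.randn(2, 4)  # batch of 2 samples, 4 features each
y = layer(x)

# Reproduce the layer by hand: weight has shape (out_features, in_features)
y_manual = x @ layer.weight.T + layer.bias

print(layer.weight.shape)                       # torch.Size([3, 4])
print(y.shape)                                  # torch.Size([2, 3])
print(torch.allclose(y, y_manual, atol=1e-6))   # True
```

Only the last dimension of the input has to match in_features; any leading dimensions are treated as batch dimensions.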

2. Convolution Layer

In deep learning models for vision, the core component is the convolution layer. For images, convolution is a linear transformation of the image. The operation involves two tensors: the input tensor and the weight tensor of the linear transformation, also known as the convolution kernel.

The definition code is as follows. _ConvNd is the internal base class; in practice you use its subclasses nn.Conv1d, nn.Conv2d and nn.Conv3d:

class _ConvNd(in_channels,out_channels,kernel_size,stride,padding,dilation,transposed,output_padding,groups,bias,padding_mode)

  • in_channels: the number of input channels. For example, a color image is made up of three channels: R, G and B.
  • out_channels: the number of output channels, which equals the number of convolution kernels (filters)
  • kernel_size: the size of the convolution kernel. For two-dimensional convolution it can be a tuple; for example, (3, 4) means the kernel is 3x4 in size.
  • stride: the step size with which the kernel moves over the input
  • padding: the amount of padding added around the spatial dimensions of the input tensor
  • dilation: the spacing between kernel elements, used for dilated (atrous) convolution
  • transposed: False for a normal convolution, True for a transposed convolution
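
As a rough illustration of how these parameters determine the output shape, the sketch below applies nn.Conv2d to a random tensor. The channel counts and the 32x32 input size are arbitrary assumptions made for the example.

```python
import torch
import torch.nn as nn

x = torch.randn(1, 3, 32, 32)  # (batch, channels, height, width), e.g. one RGB image

# 3 input channels, 16 output channels, 3x3 kernel, stride 2, padding 1
conv = nn.Conv2d(in_channels=3, out_channels=16, kernel_size=3, stride=2, padding=1)
y = conv(x)

# Output size per spatial dimension: floor((32 + 2*1 - 3) / 2) + 1 = 16
print(y.shape)            # torch.Size([1, 16, 16, 16])
# Weight layout: (out_channels, in_channels, kernel_height, kernel_width)
print(conv.weight.shape)  # torch.Size([16, 3, 3, 3])

# A non-square kernel given as a tuple: 3 rows by 4 columns
conv_rect = nn.Conv2d(in_channels=3, out_channels=8, kernel_size=(3, 4))
y_rect = conv_rect(x)
print(y_rect.shape)       # torch.Size([1, 8, 30, 29])
```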

3. Normalization Layer

There are many kinds of normalization layers, including:

  • Batch Normalization
nn.BatchNorm2d(num_features,eps=1e-05,momentum=0.1,affine=True,track_running_stats=True)
  • Group Normalization
  • Instance Normalization
  • Layer Normalization
  • Local Response Normalization
    Almost all normalizations have the following form, where x is the value of the input tensor:

    $$ y = \frac{x - E(x)}{\sqrt{Var(x) + \epsilon}} \cdot \gamma + \beta $$

    Here γ and β are trainable vector parameters, typically with the same number of elements as the number of channels of the input tensor, and ε (the eps argument) is a small constant for numerical stability. The normalization layers differ in how the mean E(x) and the variance Var(x) are calculated.
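
For batch normalization, the mean and variance are taken per channel over the batch and spatial dimensions. The following sketch checks this by hand against nn.BatchNorm2d in training mode; the tensor shape is an arbitrary assumption.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
x = torch.randn(8, 3, 5, 5)  # (batch, channels, height, width)

bn = nn.BatchNorm2d(num_features=3)  # gamma is initialised to 1, beta to 0
bn.train()                           # training mode normalises with batch statistics
y = bn(x)

# Manual version: per-channel mean and biased variance over batch and spatial dims
mean = x.mean(dim=(0, 2, 3), keepdim=True)
var = ((x - mean) ** 2).mean(dim=(0, 2, 3), keepdim=True)
gamma = bn.weight.view(1, -1, 1, 1)
beta = bn.bias.view(1, -1, 1, 1)
y_manual = (x - mean) / torch.sqrt(var + bn.eps) * gamma + beta

print(torch.allclose(y, y_manual, atol=1e-5))  # True
```

In evaluation mode (bn.eval()) the layer uses the running statistics accumulated during training instead of the statistics of the current batch.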

4. Pooling Layer

Maximum pooling layer: slide a window (the pooling kernel) over the input tensor and take the maximum value inside each window. Depending on the shape of the input tensor, max pooling comes in one-, two- and three-dimensional variants.
The code is as follows:

nn.MaxPool2d(kernel_size,stride=None,padding=0,dilation=1,return_indices=False,ceil_mode=False)
  • kernel_size: the size of the pooling window. Unlike a convolution kernel it has no weights; the maximum value inside each window of the input tensor is simply selected.
  • return_indices: whether to also return the locations of the maximum elements.
  • ceil_mode: whether the output size is rounded up (ceil) instead of down (floor).
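
A small sketch of these options on a 5x5 input whose values are 0 to 24, so that each value equals its own flat position. The input is an illustrative assumption; the calls are the standard nn.MaxPool2d arguments listed above.

```python
import torch
import torch.nn as nn

# One 5x5 single-channel image with values 0..24 in row-major order
x = torch.arange(25, dtype=torch.float32).reshape(1, 1, 5, 5)

# 2x2 windows with stride 2; also return where each maximum was found
pool = nn.MaxPool2d(kernel_size=2, stride=2, return_indices=True)
y, idx = pool(x)

print(y.shape)               # torch.Size([1, 1, 2, 2])
print(y.flatten().tolist())  # [6.0, 8.0, 16.0, 18.0]
# Indices are flat positions inside each 5x5 plane
print(idx.flatten().tolist())  # [6, 8, 16, 18]

# With ceil_mode=True the partial windows at the border are kept as well
pool_ceil = nn.MaxPool2d(kernel_size=2, stride=2, ceil_mode=True)
y_ceil = pool_ceil(x)
print(y_ceil.shape)          # torch.Size([1, 1, 3, 3])
```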

5. Dropout Layer

We all know that the complexity of a neural network is related to how its neurons are connected: the more connections, the more complex the model, and complex models are prone to overfitting. To reduce overfitting and improve generalization, the number of neuron connections can be reduced.
Directly reducing the number of connections is relatively complicated. One of the easiest ways to achieve an equivalent effect is to randomly set elements of the activation tensor to zero during training. nn.Dropout does this element by element, while nn.Dropout2d zeroes entire channels, each with probability p.

nn.Dropout2d(p=0.5,inplace=False)
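
The sketch below shows the difference between training and evaluation mode on a tensor of ones (an assumed toy input). In training mode the surviving channels are scaled by 1/(1-p) so that the expected activation stays the same; in evaluation mode the layer does nothing.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
x = torch.ones(1, 8, 2, 2)  # 8 channels, every value is 1

drop = nn.Dropout2d(p=0.5, inplace=False)

drop.train()                # dropout is only active in training mode
y_train = drop(x)
# Each channel is now either all 0.0 (dropped) or all 2.0 (kept and scaled by 1/(1-p))
print(y_train[0, :, 0, 0])

drop.eval()                 # in evaluation mode the layer is the identity
y_eval = drop(x)
print(torch.equal(y_eval, x))  # True
```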

6. Module combination

Here's how to use nn.Sequential to construct a sequential module.

from collections import OrderedDict

# Method 1. Pass the modules as positional arguments; they run in the given order
model=nn.Sequential(
      nn.Conv2d(1,20,5),
      nn.ReLU(),
      nn.Conv2d(20,64,5),
      nn.ReLU())
# Method 2. Pass an OrderedDict so that every module gets a name
model=nn.Sequential(OrderedDict([
      ('conv1',nn.Conv2d(1,20,5)),
      ('relu1',nn.ReLU()),
      ('conv2',nn.Conv2d(20,64,5)),
      ('relu2',nn.ReLU())]))
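
As a usage sketch, a sequential model is called like any other module, and its layers can be reached by position or, when built from an OrderedDict, by name. The 28x28 single-channel input is an assumption made for the example.

```python
from collections import OrderedDict

import torch
import torch.nn as nn

model = nn.Sequential(OrderedDict([
    ('conv1', nn.Conv2d(1, 20, 5)),
    ('relu1', nn.ReLU()),
    ('conv2', nn.Conv2d(20, 64, 5)),
    ('relu2', nn.ReLU()),
]))

x = torch.randn(1, 1, 28, 28)  # one single-channel 28x28 image
y = model(x)                   # runs conv1 -> relu1 -> conv2 -> relu2

# Each 5x5 convolution without padding shrinks the image by 4: 28 -> 24 -> 20
print(y.shape)                 # torch.Size([1, 64, 20, 20])

# Layers are reachable by name or by position
print(model.conv1)
print(model[0] is model.conv1)  # True
```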

7. Feature Extraction

In computer vision applications, a deep neural network model is usually divided into two parts: the first part extracts features from the input data, and the second part recombines the extracted features to produce predicted probability values.
To serve applications such as image classification, researchers have developed many deep learning models by combining modules such as convolution and pooling layers in different ways.

Here, building AlexNet as an example gives a quicker understanding of how PyTorch builds neural network models and of the corresponding template.

class AlexNet(nn.Module):
    # num_classes: number of output categories (10 by default)
    def __init__(self, num_classes=10):
        super(AlexNet, self).__init__()
        #feature extraction
        self.features = nn.Sequential(
            nn.Conv2d(3, 48, kernel_size=11, stride=4, padding=2),  # input[3, 224, 224]  output[48, 55, 55]
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),                  # output[48, 27, 27]
            nn.Conv2d(48, 128, kernel_size=5, padding=2),           # output[128, 27, 27]
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),                  # output[128, 13, 13]
            nn.Conv2d(128, 192, kernel_size=3, padding=1),          # output[192, 13, 13]
            nn.ReLU(inplace=True),
            nn.Conv2d(192, 192, kernel_size=3, padding=1),          # output[192, 13, 13]
            nn.ReLU(inplace=True),
            nn.Conv2d(192, 128, kernel_size=3, padding=1),          # output[128, 13, 13]
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),                  # output[128, 6, 6]
        )
        # Classifier: combines the extracted features into class scores
        self.classifier = nn.Sequential(
            nn.Dropout(p=0.5),
            nn.Linear(128 * 6 * 6, 2048),
            nn.ReLU(inplace=True),
            nn.Dropout(p=0.5),
            nn.Linear(2048, 2048),
            nn.ReLU(inplace=True),
            nn.Linear(2048, num_classes),
        )

    def forward(self, x):
        x = self.features(x)
        x = torch.flatten(x, start_dim=1)
        x = self.classifier(x)
        return x
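
The shape comments in the listing above can be checked with the usual output-size formula for convolution and pooling layers, floor((n + 2p - k) / s) + 1. The helper conv_out below is a hypothetical name introduced only for this sketch; it assumes dilation 1 and floor rounding, which matches the layers used here.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Spatial output size of a conv or pooling layer (dilation 1, floor mode)."""
    return (size + 2 * padding - kernel) // stride + 1

# (kernel, stride, padding) of every conv and pooling layer in self.features
layers = [
    (11, 4, 2),  # conv1
    (3, 2, 0),   # pool1
    (5, 1, 2),   # conv2
    (3, 2, 0),   # pool2
    (3, 1, 1),   # conv3
    (3, 1, 1),   # conv4
    (3, 1, 1),   # conv5
    (3, 2, 0),   # pool3
]

size = 224       # input images are 224x224
sizes = []
for kernel, stride, padding in layers:
    size = conv_out(size, kernel, stride, padding)
    sizes.append(size)

print(sizes)              # [55, 27, 27, 13, 13, 13, 13, 6]
# Flattened feature length fed to the first nn.Linear: 128 channels * 6 * 6
print(128 * size * size)  # 4608
```

This is why the first fully connected layer takes 128 * 6 * 6 inputs; feeding images of a different size would change that number and require adjusting the classifier.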

Keywords: Computer Vision Deep Learning

Added by Rother2005 on Sun, 16 Jan 2022 13:37:44 +0200