Understanding the residual network (ResNet) structure and its PyTorch implementation: notes to share

Among deep learning networks, I think the residual network is one of the most fundamental. What I am sharing today is not the theory behind residual networks; just remember that the idea of residual connections runs through many of the network structures that came after it. Once you understand the residual block structure, the more advanced architectures that build on it become much easier to follow.

Overall structure of the residual network

I. Residual block structure

The residual block used by the networks with fewer than 50 layers (that is, ResNet-18 and ResNet-34) is implemented as follows:

import torch.nn as nn


class BasicBlock(nn.Module):
    expansion = 1

    def __init__(self, in_channel, out_channel, stride=1, downsample=None, **kwargs):
        # downsample is None for the solid-line (identity) shortcut and a
        # projection module for the dashed-line shortcut
        super(BasicBlock, self).__init__()
        self.conv1 = nn.Conv2d(in_channels=in_channel, out_channels=out_channel,
                               kernel_size=3, stride=stride, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(out_channel)
        self.relu = nn.ReLU()
        self.conv2 = nn.Conv2d(in_channels=out_channel, out_channels=out_channel,
                               kernel_size=3, stride=1, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(out_channel)
        self.downsample = downsample

    def forward(self, x):
        identity = x
        if self.downsample is not None:
            identity = self.downsample(x)

        out = self.conv1(x)
        out = self.bn1(out)
        out = self.relu(out)

        out = self.conv2(out)
        out = self.bn2(out)

        out += identity
        out = self.relu(out)

        return out
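To see what BasicBlock does to tensor shapes, here is a minimal sanity check (assuming torch is imported and the class above has been defined). For the solid-line block, both the spatial size and the channel count stay unchanged:

import torch

# solid-line block: stride 1 and no projection on the shortcut,
# so the output shape equals the input shape
block = BasicBlock(in_channel=64, out_channel=64, stride=1, downsample=None)
x = torch.randn(1, 64, 56, 56)  # (batch, channels, height, width)
print(block(x).shape)  # expected: torch.Size([1, 64, 56, 56])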

The residual block used by the networks with 50 or more layers (ResNet-50, ResNet-101 and ResNet-152) is implemented as follows:

class Bottleneck(nn.Module):
    """
    Note: in the original paper, on the main branch of the dotted line residual structure, the first 1 x1 The step of convolution layer is 2 and the second is 3 x3 The convolution layer step is 1.
    But in pytorch The official implementation process is the first 1 x1 The step of convolution layer is 1 and the second is 3 x3 The convolution layer step is 2,
    The advantage of doing so is to be able to top1 Up about 0.5%Accuracy.
    Can refer to Resnet v1.5 https://ngc.nvidia.com/catalog/model-scripts/nvidia:resnet_50_v1_5_for_pytorch
    """
    expansion = 4

    def __init__(self, in_channel, out_channel, stride=1, downsample=None,
                 groups=1, width_per_group=64):
        super(Bottleneck, self).__init__()

        width = int(out_channel * (width_per_group / 64.)) * groups

        self.conv1 = nn.Conv2d(in_channels=in_channel, out_channels=width,
                               kernel_size=1, stride=1, bias=False)  # squeeze channels
        self.bn1 = nn.BatchNorm2d(width)
        # -----------------------------------------
        self.conv2 = nn.Conv2d(in_channels=width, out_channels=width, groups=groups,
                               kernel_size=3, stride=stride, bias=False, padding=1)
        self.bn2 = nn.BatchNorm2d(width)
        # -----------------------------------------
        self.conv3 = nn.Conv2d(in_channels=width, out_channels=out_channel*self.expansion,
                               kernel_size=1, stride=1, bias=False)  # unsqueeze channels
        self.bn3 = nn.BatchNorm2d(out_channel*self.expansion)
        self.relu = nn.ReLU(inplace=True)
        self.downsample = downsample

    def forward(self, x):
        identity = x
        if self.downsample is not None:
            identity = self.downsample(x)

        out = self.conv1(x)
        out = self.bn1(out)
        out = self.relu(out)

        out = self.conv2(out)
        out = self.bn2(out)
        out = self.relu(out)

        out = self.conv3(out)
        out = self.bn3(out)

        out += identity
        out = self.relu(out)

        return out
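A minimal usage sketch (assuming the Bottleneck class above has been defined): because expansion = 4, the shortcut needs a 1x1 projection whenever the input channel count differs from out_channel * 4, even at stride 1. The first bottleneck of ResNet-50's first stage, for example, maps 64 input channels to 256 output channels:

import torch
import torch.nn as nn

# shortcut projection: 64 -> 64 * 4 = 256 channels, stride 1 (spatial size unchanged)
downsample = nn.Sequential(
    nn.Conv2d(64, 64 * Bottleneck.expansion, kernel_size=1, stride=1, bias=False),
    nn.BatchNorm2d(64 * Bottleneck.expansion))

block = Bottleneck(in_channel=64, out_channel=64, stride=1, downsample=downsample)
x = torch.randn(1, 64, 56, 56)
print(block(x).shape)  # expected: torch.Size([1, 256, 56, 56])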

Why do beginners feel puzzled when they see two pieces of code? ① The two residual blocks serve networks of different depths: the first (BasicBlock) is used by the shallow residual networks, ResNet-18 and ResNet-34, while the second (Bottleneck) is used by the deep ones, ResNet-50, ResNet-101 and ResNet-152.

② Both residual blocks are written as separate classes so that the number of layers in the network can be changed easily; such a building block is conventionally named Block in network code. Read the code side by side with the structure diagram.
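A quick check of the layer counts (counting only convolution and fully connected layers) also makes the naming less mysterious: ResNet-34 stacks BasicBlocks in groups of [3, 4, 6, 3], and each BasicBlock contains 2 convolutions, so (3 + 4 + 6 + 3) x 2 = 32, plus the 7x7 stem convolution and the final fully connected layer gives 34 layers. ResNet-50 uses the same [3, 4, 6, 3] grouping with Bottleneck blocks of 3 convolutions each: (3 + 4 + 6 + 3) x 3 = 48, plus the stem and the fully connected layer gives 50.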

Secondly, note that in these blocks the 3x3 convolutions (with stride 2) are what reduce the spatial size of the feature map, while the 1x1 convolutions are used to reduce or increase the number of channels.
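A tiny sketch (with made-up shapes) makes both effects visible:

import torch
import torch.nn as nn

x = torch.randn(1, 64, 56, 56)

# a 3x3 convolution with stride 2 halves the spatial size (56 -> 28)
conv3x3_s2 = nn.Conv2d(64, 64, kernel_size=3, stride=2, padding=1, bias=False)
print(conv3x3_s2(x).shape)  # torch.Size([1, 64, 28, 28])

# a 1x1 convolution changes only the number of channels (64 -> 256)
conv1x1 = nn.Conv2d(64, 256, kernel_size=1, stride=1, bias=False)
print(conv1x1(x).shape)  # torch.Size([1, 256, 56, 56])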

II. Difference between concat and add

Beginners are often confused by, or unable to distinguish, these two operations, so pay attention here.

concat operation: the feature maps to be spliced must first have the same spatial size; they are then concatenated along the channel dimension, for example as shown in the following figure:

add operation: the feature maps must have both the same spatial size and the same number of channels. For example, the two maps on the left of the figure below each have spatial size 2x2 and 1 channel, so they can be added element-wise at corresponding positions.
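The difference is easy to verify in a few lines (a minimal sketch with arbitrary shapes):

import torch

a = torch.randn(1, 64, 28, 28)
b = torch.randn(1, 64, 28, 28)

# concat: the spatial sizes must match; the channels are stacked (64 + 64 = 128)
print(torch.cat([a, b], dim=1).shape)  # torch.Size([1, 128, 28, 28])

# add: both the spatial size and the channel count must match; the shape is unchanged
print((a + b).shape)  # torch.Size([1, 64, 28, 28])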

III. Why does the residual (shortcut) branch need downsampling?

Look at the figure below. You will notice that the shortcut branch in the two residual blocks above does not explicitly include the 1x1, 128 convolution shown in the figure; the author simply assumes you have already gotten started with deep learning and leaves it implicit. Let's analyze the figure carefully. The input [56, 56, 64] first passes through a 3x3, 128 convolution with stride 2 and becomes [28, 28, 128]; it then passes through a 3x3, 128 convolution with stride 1 and remains [28, 28, 128]. This no longer matches the spatial size or channel count of the original [56, 56, 64], so on the shortcut branch the [56, 56, 64] input is also passed through a 1x1, 128 convolution with stride 2 to obtain [28, 28, 128]; the two [28, 28, 128] tensors can then be added.

The code is as follows:

import torch.nn as nn
import torch
class BasicBlock(nn.Module):
    expansion = 1

    def __init__(self, in_channel, out_channel, stride=1, downsample=None, **kwargs):
        # in this demonstration the dashed-line (projection) shortcut is hard-coded
        # below, so the downsample argument is effectively ignored
        super(BasicBlock, self).__init__()
        self.conv1 = nn.Conv2d(in_channels=in_channel, out_channels=out_channel,
                               kernel_size=3, stride=stride, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(out_channel)
        self.relu = nn.ReLU()
        self.conv2 = nn.Conv2d(in_channels=out_channel, out_channels=out_channel,
                               kernel_size=3, stride=1, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(out_channel)
        # hard-coded dashed-line shortcut: a 1x1 convolution with stride 2 that matches
        # the spatial size and channel count produced by the main branch
        self.downsample = nn.Conv2d(in_channels=in_channel, out_channels=out_channel,
                                    kernel_size=1, stride=2)
        self.bn3 = nn.BatchNorm2d(out_channel)

    def forward(self, x):
        identity = x
        if self.downsample is not None:
            # note: the complete ResNet implementation below applies only Conv + BN
            # (no ReLU) on the shortcut branch
            identity = self.relu(self.bn3(self.downsample(x)))

        out = self.conv1(x)
        out = self.bn1(out)
        out = self.relu(out)

        out = self.conv2(out)
        out = self.bn2(out)

        out += identity
        out = self.relu(out)

        return out

if __name__ == '__main__':
    a = torch.randn((1, 64, 56, 56))
    model = BasicBlock(in_channel=64, out_channel=128, stride=2, downsample=True)
    out = model(a)
    print(out.shape)
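If everything is wired correctly, this prints torch.Size([1, 128, 28, 28]): the stride-2 convolutions halve the 56x56 input to 28x28, and both branches raise the channel count from 64 to 128, so the element-wise addition is valid.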

The complete ResNet code is as follows:

import torch.nn as nn
import torch

# The following class is the residual block used by the 18/34-layer networks (two 3x3 convolutions)
class BasicBlock(nn.Module):
    expansion = 1

    def __init__(self, in_channel, out_channel, stride=1, downsample=None, **kwargs):
        # downsample is None for the solid-line (identity) shortcut and a
        # projection module for the dashed-line shortcut
        super(BasicBlock, self).__init__()
        self.conv1 = nn.Conv2d(in_channels=in_channel, out_channels=out_channel,
                               kernel_size=3, stride=stride, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(out_channel)
        self.relu = nn.ReLU()
        self.conv2 = nn.Conv2d(in_channels=out_channel, out_channels=out_channel,
                               kernel_size=3, stride=1, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(out_channel)
        self.downsample = downsample

    def forward(self, x):
        identity = x
        if self.downsample is not None:
            identity = self.downsample(x)

        out = self.conv1(x)
        out = self.bn1(out)
        out = self.relu(out)

        out = self.conv2(out)
        out = self.bn2(out)

        out += identity
        out = self.relu(out)

        return out

# The following class is the bottleneck residual block used by the networks with 50 or more layers
class Bottleneck(nn.Module):
    """
    Note: in the original paper, on the main branch of the dotted line residual structure, the first 1 x1 The step of convolution layer is 2 and the second is 3 x3 The convolution layer step is 1.
    But in pytorch The official implementation process is the first 1 x1 The step of convolution layer is 1 and the second is 3 x3 The convolution layer step is 2,
    The advantage of doing so is to be able to top1 Up about 0.5%Accuracy.
    Can refer to Resnet v1.5 https://ngc.nvidia.com/catalog/model-scripts/nvidia:resnet_50_v1_5_for_pytorch
    """
    expansion = 4

    def __init__(self, in_channel, out_channel, stride=1, downsample=None,
                 groups=1, width_per_group=64):
        super(Bottleneck, self).__init__()

        width = int(out_channel * (width_per_group / 64.)) * groups

        self.conv1 = nn.Conv2d(in_channels=in_channel, out_channels=width,
                               kernel_size=1, stride=1, bias=False)  # squeeze channels
        self.bn1 = nn.BatchNorm2d(width)
        # -----------------------------------------
        self.conv2 = nn.Conv2d(in_channels=width, out_channels=width, groups=groups,
                               kernel_size=3, stride=stride, bias=False, padding=1)
        self.bn2 = nn.BatchNorm2d(width)
        # -----------------------------------------
        self.conv3 = nn.Conv2d(in_channels=width, out_channels=out_channel*self.expansion,
                               kernel_size=1, stride=1, bias=False)  # unsqueeze channels
        self.bn3 = nn.BatchNorm2d(out_channel*self.expansion)
        self.relu = nn.ReLU(inplace=True)
        self.downsample = downsample

    def forward(self, x):
        identity = x
        if self.downsample is not None:
            identity = self.downsample(x)

        out = self.conv1(x)
        out = self.bn1(out)
        out = self.relu(out)

        out = self.conv2(out)
        out = self.bn2(out)
        out = self.relu(out)

        out = self.conv3(out)
        out = self.bn3(out)

        out += identity
        out = self.relu(out)

        return out


class ResNet(nn.Module):

    def __init__(self,
                 block,
                 blocks_num,  # number of blocks per stage, e.g. [3, 4, 6, 3] for ResNet-34
                 num_classes=1000,
                 include_top=True,  # set False to drop the classification head when reusing the backbone in more complex networks
                 groups=1,
                 width_per_group=64):
        super(ResNet, self).__init__()
        self.include_top = include_top
        self.in_channel = 64

        self.groups = groups
        self.width_per_group = width_per_group

        self.conv1 = nn.Conv2d(3, self.in_channel, kernel_size=7, stride=2,
                               padding=3, bias=False)
        self.bn1 = nn.BatchNorm2d(self.in_channel)
        self.relu = nn.ReLU(inplace=True)
        self.maxpool = nn.MaxPool2d(kernel_size=3, stride=2, padding=1)
        self.layer1 = self._make_layer(block, 64, blocks_num[0])
        self.layer2 = self._make_layer(block, 128, blocks_num[1], stride=2)
        self.layer3 = self._make_layer(block, 256, blocks_num[2], stride=2)
        self.layer4 = self._make_layer(block, 512, blocks_num[3], stride=2)
        if self.include_top:
            self.avgpool = nn.AdaptiveAvgPool2d((1, 1))  # output size = (1, 1)
            self.fc = nn.Linear(512 * block.expansion, num_classes)

        for m in self.modules():
            if isinstance(m, nn.Conv2d):
                nn.init.kaiming_normal_(m.weight, mode='fan_out', nonlinearity='relu')

    def _make_layer(self, block, channel, block_num, stride=1):
        downsample = None
        if stride != 1 or self.in_channel != channel * block.expansion:
            downsample = nn.Sequential(
                nn.Conv2d(self.in_channel, channel * block.expansion, kernel_size=1, stride=stride, bias=False),
                nn.BatchNorm2d(channel * block.expansion))

        layers = []
        layers.append(block(self.in_channel,
                            channel,
                            downsample=downsample,
                            stride=stride,
                            groups=self.groups,
                            width_per_group=self.width_per_group))
        self.in_channel = channel * block.expansion

        for _ in range(1, block_num):
            layers.append(block(self.in_channel,
                                channel,
                                groups=self.groups,
                                width_per_group=self.width_per_group))

        return nn.Sequential(*layers)

    def forward(self, x):
        x = self.conv1(x)
        x = self.bn1(x)
        x = self.relu(x)
        x = self.maxpool(x)

        x = self.layer1(x)
        x = self.layer2(x)
        x = self.layer3(x)
        x = self.layer4(x)

        if self.include_top:
            x = self.avgpool(x)
            x = torch.flatten(x, 1)
            x = self.fc(x)

        return x


def resnet34(num_classes=1000, include_top=True):
    # https://download.pytorch.org/models/resnet34-333f7ec4.pth
    return ResNet(BasicBlock, [3, 4, 6, 3], num_classes=num_classes, include_top=include_top)


def resnet50(num_classes=1000, include_top=True):
    # https://download.pytorch.org/models/resnet50-19c8e357.pth
    return ResNet(Bottleneck, [3, 4, 6, 3], num_classes=num_classes, include_top=include_top)


def resnet101(num_classes=1000, include_top=True):
    # https://download.pytorch.org/models/resnet101-5d3b4d8f.pth
    return ResNet(Bottleneck, [3, 4, 23, 3], num_classes=num_classes, include_top=include_top)


def resnext50_32x4d(num_classes=1000, include_top=True):
    # https://download.pytorch.org/models/resnext50_32x4d-7cdf4587.pth
    groups = 32
    width_per_group = 4
    return ResNet(Bottleneck, [3, 4, 6, 3],
                  num_classes=num_classes,
                  include_top=include_top,
                  groups=groups,
                  width_per_group=width_per_group)


def resnext101_32x8d(num_classes=1000, include_top=True):
    # https://download.pytorch.org/models/resnext101_32x8d-8ba56ff5.pth
    groups = 32
    width_per_group = 8
    return ResNet(Bottleneck, [3, 4, 23, 3],
                  num_classes=num_classes,
                  include_top=include_top,
                  groups=groups,
                  width_per_group=width_per_group)

if __name__ == '__main__':
    net=resnet34()
    print(net)
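To verify the full network end to end, the __main__ block can also run a dummy forward pass (a minimal sketch; the 224x224 input size and the 5 classes are arbitrary choices for illustration):

if __name__ == '__main__':
    net = resnet34(num_classes=5)    # 5 classes instead of the default 1000
    x = torch.randn(1, 3, 224, 224)  # dummy ImageNet-sized input, batch size 1
    print(net(x).shape)              # expected: torch.Size([1, 5])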

That completes the description of the network structure! I hope you have gained something from it. If you have any questions, please leave them in the comments!

Keywords: AI, PyTorch, Deep Learning
