PyTorch: Object Detection with Feature Pyramid Networks (FPN)

Copyright: Jingmin Wei, Pattern Recognition and Intelligent System, School of Artificial and Intelligence, Huazhong University of Science and Technology

This tutorial is non-commercial and intended only for learning, reference, and exchange.



Pytorch Object Detection in Deep Learning

import torch.nn as nn
import torch.nn.functional as F
import torch

FPN Network Structure

To obtain strong semantics, traditional object detection models usually operate only on the last feature map of a deep convolutional network. That feature map typically has a large downsampling rate (the factor by which the image is reduced), such as 16 or 32. As a result, small objects retain little valid information on the feature map and their detection performance drops sharply. This is known as the multi-scale problem.
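To make the effect of the downsampling rate concrete, the short sketch below computes roughly how many feature-map cells a square object covers at several strides. The 32-pixel object size and the helper name `footprint` are illustrative assumptions, not values from the original text.

```python
# Approximate footprint of a square object on feature maps with different
# downsampling rates. The object size is an assumed example value.
object_size = 32  # side length of the object in input pixels (assumption)

def footprint(size, stride):
    """Approximate side length, in feature-map cells, of an object `size` pixels wide."""
    return size / stride

for stride in (4, 8, 16, 32):
    side = footprint(object_size, stride)
    print(f"stride {stride:>2}: about {side:g} x {side:g} cells")
```

At a downsampling rate of 32, such an object shrinks to about a single cell, which is why detecting it from the last feature map alone is difficult.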

The key to solving the multi-scale problem is extracting multi-scale features. The traditional method is the image pyramid: the input image is resized to multiple scales, and each scale produces features at a different scale. This method is simple and effective and has been used in many competitions such as COCO. Its drawback is that it is time-consuming and computationally expensive.
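A minimal sketch of the image pyramid idea is shown below, assuming bilinear resizing with `F.interpolate`. The function name `image_pyramid` and the chosen scales are hypothetical and only serve the illustration.

```python
import torch
import torch.nn.functional as F

def image_pyramid(image, scales=(1.0, 0.5, 0.25)):
    """Return copies of `image` (N, C, H, W), one per scale factor."""
    levels = []
    for s in scales:
        if s == 1.0:
            levels.append(image)  # original resolution, no resizing needed
        else:
            levels.append(
                F.interpolate(image, scale_factor=s, mode="bilinear", align_corners=False)
            )
    return levels

image = torch.randn(1, 3, 224, 224)
pyramid = image_pyramid(image)
for level in pyramid:
    print(tuple(level.shape))
```

Every level would need its own forward pass through the detector, which is where the extra time and computation come from.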

As seen in the torch.nn chapter, the layers of a convolutional neural network produce feature maps of different sizes carrying different semantic information, which resembles a pyramid structure. The FPN (Feature Pyramid Network) method, proposed in 2017, fuses the features of different layers and handles the multi-scale detection problem better.

The overall architecture of FPN is shown in the figure. It consists of four parts: the bottom-up network, the top-down network, lateral connections, and convolutional fusion.

  • Bottom-up: The leftmost part is an ordinary convolutional network, by default a ResNet, used to extract semantic information. $C_1$ denotes the first convolution and pooling layers of ResNet, while $C_2$ to $C_5$ are the ResNet convolution groups. Each group contains multiple Bottleneck structures; feature maps within a group have the same size, and the size decreases from one group to the next.

  • Top-down: First apply a $1\times1$ convolution to $C_5$ to reduce the number of channels and obtain $M_5$. Then upsample successively to obtain $M_4$, $M_3$, and $M_2$. The purpose of upsampling is to obtain features with the same height and width as $C_4$, $C_3$, and $C_2$, so that they can be added element by element in the next step. The upsampling used here is 2x nearest-neighbor upsampling, which directly replicates neighboring elements rather than interpolating linearly.

  • Lateral Connection: The purpose is to fuse the upsampled high-semantic features with the shallow features that carry localization detail. After upsampling, the high-semantic features have the same height and width as the corresponding shallow features, and their channel count is fixed at 256. The bottom-up features $C_2$ to $C_4$ therefore pass through a $1\times1$ convolution that changes their channel count to 256, and the two are then added element by element to obtain $M_4$, $M_3$, and $M_2$. Because the feature map of $C_1$ is large and lacks semantic information, $C_1$ is not included in the lateral connections.

  • Convolution Fusion: After the added features are obtained, a $3\times3$ convolution fuses $M_2$ to $M_4$ again; $M_5$ is left unprocessed. The goal is to eliminate the aliasing effect introduced by upsampling and produce the final feature maps $P_2$ to $P_5$.
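The four parts above can be sketched for a single level with dummy tensors. This is only an illustration under stated assumptions: the tensor shapes mimic $C_5$ and $C_4$ for a 224x224 input, and the layer names `reduce_c5`, `lateral_c4`, and `smooth` are made up for this example.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Nearest-neighbor upsampling copies each element into a 2x2 block.
x = torch.tensor([[[[1., 2.], [3., 4.]]]])
up = F.interpolate(x, scale_factor=2, mode="nearest")
print(up[0, 0])

# One top-down merge step with dummy bottom-up features (assumed shapes).
c5 = torch.randn(1, 2048, 7, 7)
c4 = torch.randn(1, 1024, 14, 14)
reduce_c5 = nn.Conv2d(2048, 256, kernel_size=1)         # C5 -> M5
lateral_c4 = nn.Conv2d(1024, 256, kernel_size=1)        # lateral connection for C4
smooth = nn.Conv2d(256, 256, kernel_size=3, padding=1)  # 3x3 fusion

m5 = reduce_c5(c5)
# Upsample M5 to the spatial size of C4, then add element by element.
m4 = F.interpolate(m5, size=c4.shape[-2:], mode="nearest") + lateral_c4(c4)
p4 = smooth(m4)
print(tuple(m5.shape), tuple(m4.shape), tuple(p4.shape))
```

Repeating the merge step for $C_3$ and $C_2$ would give $M_3$ and $M_2$ in the same way.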

In an actual object detection algorithm, RoIs (Regions of Interest) need to be extracted from the feature map. Since FPN has four output feature maps, choosing which one to extract from is itself a problem. The solution given by FPN is to use different feature maps for RoIs of different sizes: large-scale RoIs are extracted from deep feature maps such as $P_5$, and small-scale RoIs are extracted from shallow feature maps such as $P_2$. For the exact assignment rule, see the FPN paper.
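For reference, the assignment rule in the FPN paper maps an RoI of width $w$ and height $h$ to level $k = \lfloor k_0 + \log_2(\sqrt{wh}/224) \rfloor$ with $k_0 = 4$. The sketch below implements that rule; the function name `roi_to_level` and the clamping to levels 2 through 5 are assumptions made to match the four outputs described here.

```python
import math

def roi_to_level(w, h, k0=4, canonical=224, k_min=2, k_max=5):
    """Map an RoI of size w x h (in input pixels) to a pyramid level index k (P_k)."""
    k = math.floor(k0 + math.log2(math.sqrt(w * h) / canonical))
    # Clamp to the available levels P2..P5 (assumption for this sketch).
    return max(k_min, min(k_max, k))

for w, h in [(32, 32), (112, 112), (224, 224), (448, 448)]:
    print(f"RoI {w}x{h} -> P{roi_to_level(w, h)}")
```

A 224x224 RoI lands on $P_4$, and each halving of the RoI side moves it one level shallower until $P_2$ is reached.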

FPN passes deep semantic information down to the lower levels to supplement the semantics of the shallow layers, yielding features that are both high-resolution and semantically strong. It performs very well in small object detection, instance segmentation, and other fields.

Code Implementation

First, implement the residual block:

# Define the Bottleneck class for ResNet
class Bottleneck(nn.Module):
    expansion = 4 # Define a class property, not an instance property
    def __init__(self, in_channels, channels, stride=1, downsample=None):
        super(Bottleneck, self).__init__()
        # The stacked layers consist of three convolutions, each followed by BN
        # (with ReLU after the first two), matching the printed structure below
        self.bottleneck = nn.Sequential(
            nn.Conv2d(in_channels, channels, 1, stride=1, bias=False),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, stride=stride, padding=1, bias=False),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels*self.expansion, 1, stride=1, bias=False),
            nn.BatchNorm2d(channels * self.expansion)
        )
        self.relu = nn.ReLU(inplace=True)

        # Down sample consists of a 1*1 convolution containing BN
        self.downsample = downsample

    def forward(self, x):
        identity = x 
        output = self.bottleneck(x)
        if self.downsample is not None:
            identity = self.downsample(x)
        # Add identity (identity mapping) to stack layer output
        output += identity
        output = self.relu(output)
        return output
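The role of `downsample` may be easier to see in isolation. The sketch below assumes the first block of the first stage, where the input has 64 channels and the stacked layers output 64 * 4 = 256 channels; the tensor `stacked` is a random stand-in for that output, not the result of a real block.

```python
import torch
import torch.nn as nn

x = torch.randn(1, 64, 56, 56)         # input to the block
stacked = torch.randn(1, 256, 56, 56)  # stand-in for the stacked-layer output

# x cannot be added to `stacked` directly because the channel counts differ,
# so a 1x1 convolution followed by BN projects the identity to the same shape.
downsample = nn.Sequential(
    nn.Conv2d(64, 256, kernel_size=1, stride=1, bias=False),
    nn.BatchNorm2d(256),
)
identity = downsample(x)
out = torch.relu(stacked + identity)
print(tuple(identity.shape), tuple(out.shape))
```

When the input and output shapes already match, `downsample` stays `None` and the input is added unchanged.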

Next, use the residual blocks to build the FPN. If this part is unfamiliar, refer to the ResNet tutorial.

# Define the FPN class; initialization takes a list giving the number of Bottlenecks in each ResNet stage
class FPN(nn.Module):
    def __init__(self, layers):
        super(FPN, self).__init__()
        self.in_channels = 64
        # C1 module handling input
        self.conv1 = nn.Conv2d(3, 64, 7, stride=2, padding=3, bias=False)
        self.bn1 = nn.BatchNorm2d(64)
        self.relu = nn.ReLU(inplace=True)
        self.maxpool = nn.MaxPool2d(3, stride=2, padding=1)
        # Build bottom-up C2,C3,C4,C5
        self.layer1 = self._make_layer(64, layers[0]) # stride=1
        self.layer2 = self._make_layer(128, layers[1], 2) # stride=2
        self.layer3 = self._make_layer(256, layers[2], 2) # stride=2
        self.layer4 = self._make_layer(512, layers[3], 2) # stride=2
        # Reduce the number of channels for C5 to get M5
        self.toplayer = nn.Conv2d(2048, 256, 1, stride=1, padding=0)
        # 3*3 Convolution Fusion Features
        self.smooth1 = nn.Conv2d(256, 256, 3, 1, 1)
        self.smooth2 = nn.Conv2d(256, 256, 3, 1, 1)
        self.smooth3 = nn.Conv2d(256, 256, 3, 1, 1)
        # Horizontal connection to ensure the same number of channels
        self.latlayer1 = nn.Conv2d(1024, 256, 1, 1, 0)
        self.latlayer2 = nn.Conv2d(512, 256, 1, 1, 0)
        self.latlayer3 = nn.Conv2d(256, 256, 1, 1, 0)

    # Define a protected method to build C2-C5
    # Same idea as in ResNet; note the difference between stride=1 and stride=2
    def _make_layer(self, channels, blocks, stride=1):
        downsample = None
        # The first block of a stage needs a projection shortcut (1*1 conv + BN)
        # when the stride is not 1 or the number of channels changes
        if stride != 1 or self.in_channels != Bottleneck.expansion * channels:
            downsample = nn.Sequential(
                nn.Conv2d(self.in_channels, Bottleneck.expansion*channels, 1, stride, bias=False),
                nn.BatchNorm2d(Bottleneck.expansion*channels)
            )
        layers = []
        layers.append(Bottleneck(self.in_channels, channels, stride, downsample))
        self.in_channels = channels*Bottleneck.expansion
        for i in range(1, blocks):
            layers.append(Bottleneck(self.in_channels, channels))
        return nn.Sequential(*layers)

    # Top-down up sampling module
    def _upsample_add(self, x, y):
        _, _, H, W = y.shape
        return F.interpolate(x, size=(H, W)) + y

    def forward(self, x):
        # Bottom-up
        c1 = self.maxpool(self.relu(self.bn1(self.conv1(x))))
        c2 = self.layer1(c1)
        c3 = self.layer2(c2)
        c4 = self.layer3(c3)
        c5 = self.layer4(c4)
        # Top-down
        m5 = self.toplayer(c5)
        m4 = self._upsample_add(m5, self.latlayer1(c4))
        m3 = self._upsample_add(m4, self.latlayer2(c3))
        m2 = self._upsample_add(m3, self.latlayer3(c2)) 
        # Convolution Fusion, Smoothing
        p5 = m5
        p4 = self.smooth1(m4)
        p3 = self.smooth2(m3)
        p2 = self.smooth3(m2)
        return p2, p3, p4, p5
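Note that `_upsample_add` passes `size=(H, W)` to `F.interpolate` (whose default mode is nearest-neighbor) instead of a fixed scale factor of 2. A likely reason is that the two differ when a feature map has an odd size. The sketch below uses dummy tensors with the spatial sizes that the convolution size formula gives for a 225x225 input under the strides above (8 for C5, 15 for C4); these sizes are computed for illustration rather than taken from the original text.

```python
import torch
import torch.nn.functional as F

m5 = torch.randn(1, 256, 8, 8)            # dummy M5
lateral_c4 = torch.randn(1, 256, 15, 15)  # dummy lateral feature from C4

# A fixed 2x scale factor overshoots the target size.
doubled = F.interpolate(m5, scale_factor=2, mode="nearest")
# Passing the target size always matches the lateral feature.
matched = F.interpolate(m5, size=lateral_c4.shape[-2:], mode="nearest")
print(tuple(doubled.shape[-2:]), tuple(matched.shape[-2:]))

m4 = matched + lateral_c4  # element-wise addition now works
```

With `scale_factor=2` the 16x16 result could not be added to the 15x15 lateral feature.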
def FPN50():
    return FPN([3, 4, 6, 3]) # FPN50

def FPN101():
    return FPN([3, 4, 23, 3]) # FPN101

def FPN152():
    return FPN([3, 8, 36, 3]) # FPN152
# Define a FPN network
net_fpn = FPN50() # FPN50
input = torch.randn(1, 3, 224, 224)
output = net_fpn(input)
# Check the sizes of the feature maps: they have the same number of channels and decreasing spatial size
print(output[0].shape) # p2
print(output[1].shape) # p3
print(output[2].shape) # p4
print(output[3].shape) # p5
torch.Size([1, 256, 56, 56])
torch.Size([1, 256, 28, 28])
torch.Size([1, 256, 14, 14])
torch.Size([1, 256, 7, 7])
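These sizes follow from the cumulative stride of each stage. The sketch below multiplies the per-stage strides used in the network above; the list `stage_strides` is simply a restatement of those strides for illustration.

```python
from itertools import accumulate
from operator import mul

input_size = 224
# (stage name, stride of that stage) as configured in the FPN class above
stage_strides = [
    ("conv1", 2), ("maxpool", 2),
    ("layer1 / C2", 1), ("layer2 / C3", 2),
    ("layer3 / C4", 2), ("layer4 / C5", 2),
]

# Cumulative downsampling rate after each stage
cumulative = list(accumulate((s for _, s in stage_strides), mul))
sizes = [input_size // c for c in cumulative]
for (name, _), c, size in zip(stage_strides, cumulative, sizes):
    print(f"{name:<12} downsampling {c:>2} -> {size} x {size}")
```

P2 to P5 inherit the spatial sizes of C2 to C5, that is, downsampling rates of 4, 8, 16, and 32.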
from torchsummary import summary
# input_size is C*H*W (channels, height, width)
summary(net_fpn, input_size=(3, 224, 224), device='cpu') 
        Layer (type)               Output Shape         Param #
            Conv2d-1         [-1, 64, 112, 112]           9,408
       BatchNorm2d-2         [-1, 64, 112, 112]             128
              ReLU-3         [-1, 64, 112, 112]               0
         MaxPool2d-4           [-1, 64, 56, 56]               0
            Conv2d-5           [-1, 64, 56, 56]           4,096
       BatchNorm2d-6           [-1, 64, 56, 56]             128
              ReLU-7           [-1, 64, 56, 56]               0
            Conv2d-8           [-1, 64, 56, 56]          36,864
       BatchNorm2d-9           [-1, 64, 56, 56]             128
             ReLU-10           [-1, 64, 56, 56]               0
           Conv2d-11          [-1, 256, 56, 56]          16,384
      BatchNorm2d-12          [-1, 256, 56, 56]             512
           Conv2d-13          [-1, 256, 56, 56]          16,384
      BatchNorm2d-14          [-1, 256, 56, 56]             512
             ReLU-15          [-1, 256, 56, 56]               0
       Bottleneck-16          [-1, 256, 56, 56]               0
           Conv2d-17           [-1, 64, 56, 56]          16,384
      BatchNorm2d-18           [-1, 64, 56, 56]             128
             ReLU-19           [-1, 64, 56, 56]               0
           Conv2d-20           [-1, 64, 56, 56]          36,864
      BatchNorm2d-21           [-1, 64, 56, 56]             128
             ReLU-22           [-1, 64, 56, 56]               0
           Conv2d-23          [-1, 256, 56, 56]          16,384
      BatchNorm2d-24          [-1, 256, 56, 56]             512
             ReLU-25          [-1, 256, 56, 56]               0
       Bottleneck-26          [-1, 256, 56, 56]               0
           Conv2d-27           [-1, 64, 56, 56]          16,384
      BatchNorm2d-28           [-1, 64, 56, 56]             128
             ReLU-29           [-1, 64, 56, 56]               0
           Conv2d-30           [-1, 64, 56, 56]          36,864
      BatchNorm2d-31           [-1, 64, 56, 56]             128
             ReLU-32           [-1, 64, 56, 56]               0
           Conv2d-33          [-1, 256, 56, 56]          16,384
      BatchNorm2d-34          [-1, 256, 56, 56]             512
             ReLU-35          [-1, 256, 56, 56]               0
       Bottleneck-36          [-1, 256, 56, 56]               0
           Conv2d-37          [-1, 128, 56, 56]          32,768
      BatchNorm2d-38          [-1, 128, 56, 56]             256
             ReLU-39          [-1, 128, 56, 56]               0
           Conv2d-40          [-1, 128, 28, 28]         147,456
      BatchNorm2d-41          [-1, 128, 28, 28]             256
             ReLU-42          [-1, 128, 28, 28]               0
           Conv2d-43          [-1, 512, 28, 28]          65,536
      BatchNorm2d-44          [-1, 512, 28, 28]           1,024
           Conv2d-45          [-1, 512, 28, 28]         131,072
      BatchNorm2d-46          [-1, 512, 28, 28]           1,024
             ReLU-47          [-1, 512, 28, 28]               0
       Bottleneck-48          [-1, 512, 28, 28]               0
           Conv2d-49          [-1, 128, 28, 28]          65,536
      BatchNorm2d-50          [-1, 128, 28, 28]             256
             ReLU-51          [-1, 128, 28, 28]               0
           Conv2d-52          [-1, 128, 28, 28]         147,456
      BatchNorm2d-53          [-1, 128, 28, 28]             256
             ReLU-54          [-1, 128, 28, 28]               0
           Conv2d-55          [-1, 512, 28, 28]          65,536
      BatchNorm2d-56          [-1, 512, 28, 28]           1,024
             ReLU-57          [-1, 512, 28, 28]               0
       Bottleneck-58          [-1, 512, 28, 28]               0
           Conv2d-59          [-1, 128, 28, 28]          65,536
      BatchNorm2d-60          [-1, 128, 28, 28]             256
             ReLU-61          [-1, 128, 28, 28]               0
           Conv2d-62          [-1, 128, 28, 28]         147,456
      BatchNorm2d-63          [-1, 128, 28, 28]             256
             ReLU-64          [-1, 128, 28, 28]               0
           Conv2d-65          [-1, 512, 28, 28]          65,536
      BatchNorm2d-66          [-1, 512, 28, 28]           1,024
             ReLU-67          [-1, 512, 28, 28]               0
       Bottleneck-68          [-1, 512, 28, 28]               0
           Conv2d-69          [-1, 128, 28, 28]          65,536
      BatchNorm2d-70          [-1, 128, 28, 28]             256
             ReLU-71          [-1, 128, 28, 28]               0
           Conv2d-72          [-1, 128, 28, 28]         147,456
      BatchNorm2d-73          [-1, 128, 28, 28]             256
             ReLU-74          [-1, 128, 28, 28]               0
           Conv2d-75          [-1, 512, 28, 28]          65,536
      BatchNorm2d-76          [-1, 512, 28, 28]           1,024
             ReLU-77          [-1, 512, 28, 28]               0
       Bottleneck-78          [-1, 512, 28, 28]               0
           Conv2d-79          [-1, 256, 28, 28]         131,072
      BatchNorm2d-80          [-1, 256, 28, 28]             512
             ReLU-81          [-1, 256, 28, 28]               0
           Conv2d-82          [-1, 256, 14, 14]         589,824
      BatchNorm2d-83          [-1, 256, 14, 14]             512
             ReLU-84          [-1, 256, 14, 14]               0
           Conv2d-85         [-1, 1024, 14, 14]         262,144
      BatchNorm2d-86         [-1, 1024, 14, 14]           2,048
           Conv2d-87         [-1, 1024, 14, 14]         524,288
      BatchNorm2d-88         [-1, 1024, 14, 14]           2,048
             ReLU-89         [-1, 1024, 14, 14]               0
       Bottleneck-90         [-1, 1024, 14, 14]               0
           Conv2d-91          [-1, 256, 14, 14]         262,144
      BatchNorm2d-92          [-1, 256, 14, 14]             512
             ReLU-93          [-1, 256, 14, 14]               0
           Conv2d-94          [-1, 256, 14, 14]         589,824
      BatchNorm2d-95          [-1, 256, 14, 14]             512
             ReLU-96          [-1, 256, 14, 14]               0
           Conv2d-97         [-1, 1024, 14, 14]         262,144
      BatchNorm2d-98         [-1, 1024, 14, 14]           2,048
             ReLU-99         [-1, 1024, 14, 14]               0
      Bottleneck-100         [-1, 1024, 14, 14]               0
          Conv2d-101          [-1, 256, 14, 14]         262,144
     BatchNorm2d-102          [-1, 256, 14, 14]             512
            ReLU-103          [-1, 256, 14, 14]               0
          Conv2d-104          [-1, 256, 14, 14]         589,824
     BatchNorm2d-105          [-1, 256, 14, 14]             512
            ReLU-106          [-1, 256, 14, 14]               0
          Conv2d-107         [-1, 1024, 14, 14]         262,144
     BatchNorm2d-108         [-1, 1024, 14, 14]           2,048
            ReLU-109         [-1, 1024, 14, 14]               0
      Bottleneck-110         [-1, 1024, 14, 14]               0
          Conv2d-111          [-1, 256, 14, 14]         262,144
     BatchNorm2d-112          [-1, 256, 14, 14]             512
            ReLU-113          [-1, 256, 14, 14]               0
          Conv2d-114          [-1, 256, 14, 14]         589,824
     BatchNorm2d-115          [-1, 256, 14, 14]             512
            ReLU-116          [-1, 256, 14, 14]               0
          Conv2d-117         [-1, 1024, 14, 14]         262,144
     BatchNorm2d-118         [-1, 1024, 14, 14]           2,048
            ReLU-119         [-1, 1024, 14, 14]               0
      Bottleneck-120         [-1, 1024, 14, 14]               0
          Conv2d-121          [-1, 256, 14, 14]         262,144
     BatchNorm2d-122          [-1, 256, 14, 14]             512
            ReLU-123          [-1, 256, 14, 14]               0
          Conv2d-124          [-1, 256, 14, 14]         589,824
     BatchNorm2d-125          [-1, 256, 14, 14]             512
            ReLU-126          [-1, 256, 14, 14]               0
          Conv2d-127         [-1, 1024, 14, 14]         262,144
     BatchNorm2d-128         [-1, 1024, 14, 14]           2,048
            ReLU-129         [-1, 1024, 14, 14]               0
      Bottleneck-130         [-1, 1024, 14, 14]               0
          Conv2d-131          [-1, 256, 14, 14]         262,144
     BatchNorm2d-132          [-1, 256, 14, 14]             512
            ReLU-133          [-1, 256, 14, 14]               0
          Conv2d-134          [-1, 256, 14, 14]         589,824
     BatchNorm2d-135          [-1, 256, 14, 14]             512
            ReLU-136          [-1, 256, 14, 14]               0
          Conv2d-137         [-1, 1024, 14, 14]         262,144
     BatchNorm2d-138         [-1, 1024, 14, 14]           2,048
            ReLU-139         [-1, 1024, 14, 14]               0
      Bottleneck-140         [-1, 1024, 14, 14]               0
          Conv2d-141          [-1, 512, 14, 14]         524,288
     BatchNorm2d-142          [-1, 512, 14, 14]           1,024
            ReLU-143          [-1, 512, 14, 14]               0
          Conv2d-144            [-1, 512, 7, 7]       2,359,296
     BatchNorm2d-145            [-1, 512, 7, 7]           1,024
            ReLU-146            [-1, 512, 7, 7]               0
          Conv2d-147           [-1, 2048, 7, 7]       1,048,576
     BatchNorm2d-148           [-1, 2048, 7, 7]           4,096
          Conv2d-149           [-1, 2048, 7, 7]       2,097,152
     BatchNorm2d-150           [-1, 2048, 7, 7]           4,096
            ReLU-151           [-1, 2048, 7, 7]               0
      Bottleneck-152           [-1, 2048, 7, 7]               0
          Conv2d-153            [-1, 512, 7, 7]       1,048,576
     BatchNorm2d-154            [-1, 512, 7, 7]           1,024
            ReLU-155            [-1, 512, 7, 7]               0
          Conv2d-156            [-1, 512, 7, 7]       2,359,296
     BatchNorm2d-157            [-1, 512, 7, 7]           1,024
            ReLU-158            [-1, 512, 7, 7]               0
          Conv2d-159           [-1, 2048, 7, 7]       1,048,576
     BatchNorm2d-160           [-1, 2048, 7, 7]           4,096
            ReLU-161           [-1, 2048, 7, 7]               0
      Bottleneck-162           [-1, 2048, 7, 7]               0
          Conv2d-163            [-1, 512, 7, 7]       1,048,576
     BatchNorm2d-164            [-1, 512, 7, 7]           1,024
            ReLU-165            [-1, 512, 7, 7]               0
          Conv2d-166            [-1, 512, 7, 7]       2,359,296
     BatchNorm2d-167            [-1, 512, 7, 7]           1,024
            ReLU-168            [-1, 512, 7, 7]               0
          Conv2d-169           [-1, 2048, 7, 7]       1,048,576
     BatchNorm2d-170           [-1, 2048, 7, 7]           4,096
            ReLU-171           [-1, 2048, 7, 7]               0
      Bottleneck-172           [-1, 2048, 7, 7]               0
          Conv2d-173            [-1, 256, 7, 7]         524,544
          Conv2d-174          [-1, 256, 14, 14]         262,400
          Conv2d-175          [-1, 256, 28, 28]         131,328
          Conv2d-176          [-1, 256, 56, 56]          65,792
          Conv2d-177          [-1, 256, 14, 14]         590,080
          Conv2d-178          [-1, 256, 28, 28]         590,080
          Conv2d-179          [-1, 256, 56, 56]         590,080
Total params: 26,262,336
Trainable params: 26,262,336
Non-trainable params: 0
Input size (MB): 0.57
Forward/backward pass size (MB): 302.71
Params size (MB): 100.18
Estimated Total Size (MB): 403.47
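Individual rows of the summary can be checked by hand. The sketch below recreates a few layers with the same arguments as in the FPN class and counts their parameters; the helper `count_params` is a made-up name for this example, and the size estimate assumes 4 bytes per float32 parameter.

```python
import torch.nn as nn

def count_params(module):
    """Total number of learnable parameters in a module."""
    return sum(p.numel() for p in module.parameters())

conv1 = nn.Conv2d(3, 64, 7, stride=2, padding=3, bias=False)
bn1 = nn.BatchNorm2d(64)
toplayer = nn.Conv2d(2048, 256, 1)
smooth = nn.Conv2d(256, 256, 3, 1, 1)

print(count_params(conv1))     # 3 * 64 * 7 * 7 = 9408
print(count_params(bn1))       # 64 weights + 64 biases = 128
print(count_params(toplayer))  # 2048 * 256 + 256 = 524544
print(count_params(smooth))    # 256 * 256 * 3 * 3 + 256 = 590080

# Parameter memory, assuming 4 bytes per parameter
total_params = 26_262_336
params_mb = total_params * 4 / 1024 ** 2
print(f"{params_mb:.2f} MB")   # 100.18 MB
```

The lateral and smoothing convolutions keep their bias terms, which accounts for the extra 256 parameters in the last seven rows.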
# View the first stage of the FPN, C2 (layer1)
print(net_fpn.layer1)
  (0): Bottleneck(
    (bottleneck): Sequential(
      (0): Conv2d(64, 64, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (2): ReLU(inplace=True)
      (3): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (4): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (5): ReLU(inplace=True)
      (6): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (7): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (relu): ReLU(inplace=True)
    (downsample): Sequential(
      (0): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (1): Bottleneck(
    (bottleneck): Sequential(
      (0): Conv2d(256, 64, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (2): ReLU(inplace=True)
      (3): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (4): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (5): ReLU(inplace=True)
      (6): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (7): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (relu): ReLU(inplace=True)
  (2): Bottleneck(
    (bottleneck): Sequential(
      (0): Conv2d(256, 64, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (2): ReLU(inplace=True)
      (3): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (4): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (5): ReLU(inplace=True)
      (6): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (7): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (relu): ReLU(inplace=True)
# View the second stage of the FPN, C3 (layer2), which contains four Bottlenecks
print(net_fpn.layer2)
  (0): Bottleneck(
    (bottleneck): Sequential(
      (0): Conv2d(256, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (2): ReLU(inplace=True)
      (3): Conv2d(128, 128, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
      (4): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (5): ReLU(inplace=True)
      (6): Conv2d(128, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (7): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (relu): ReLU(inplace=True)
    (downsample): Sequential(
      (0): Conv2d(256, 512, kernel_size=(1, 1), stride=(2, 2), bias=False)
      (1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (1): Bottleneck(
    (bottleneck): Sequential(
      (0): Conv2d(512, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (2): ReLU(inplace=True)
      (3): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (4): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (5): ReLU(inplace=True)
      (6): Conv2d(128, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (7): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (relu): ReLU(inplace=True)
  (2): Bottleneck(
    (bottleneck): Sequential(
      (0): Conv2d(512, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (2): ReLU(inplace=True)
      (3): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (4): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (5): ReLU(inplace=True)
      (6): Conv2d(128, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (7): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (relu): ReLU(inplace=True)
  (3): Bottleneck(
    (bottleneck): Sequential(
      (0): Conv2d(512, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (2): ReLU(inplace=True)
      (3): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (4): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (5): ReLU(inplace=True)
      (6): Conv2d(128, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (7): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (relu): ReLU(inplace=True)

Keywords: PyTorch, Computer Vision, Deep Learning, Convolutional Neural Networks, Object Detection

