Network model construction

Building network models for deep learning

1, Constructing a model by inheriting the Module class

The Module class is the model construction class provided by the nn module. It is the base class of all neural network modules, and we can inherit from it to define the model we want. The following code inherits from Module to construct the multi-layer perceptron mentioned at the beginning of this section. The MLP class defined here overrides the __init__ and forward functions of the Module class, which create the model parameters and define the forward computation, respectively. Forward computation is also called forward propagation.

import torch
from torch import nn
class MLP(nn.Module):
    # Declare the layers that hold model parameters; here, two fully connected layers are declared
    def __init__(self, **kwargs):
        # Call the constructor of the parent class Module to perform the necessary initialization;
        # any keyword arguments given when the instance is constructed are passed through to it
        super(MLP, self).__init__(**kwargs)
        self.hidden = nn.Linear(784, 256) # Hidden layer
        self.act = nn.ReLU()
        self.output = nn.Linear(256, 10)  # Output layer
    # Define the forward calculation of the model, that is, how to calculate and return the required model output according to the input x
    def forward(self, x):
        a = self.act(self.hidden(x))
        return self.output(a)

You can instantiate the MLP class to get the model variable net. The following code initializes net and passes in the input data X for one forward computation. Here, net(X) calls the __call__ function that MLP inherits from the Module class, which in turn calls the forward function defined by MLP to complete the forward computation.

X = torch.rand(2, 784)
net = MLP()
print(net)
output = net(X)  # forward computation; output has shape (2, 10)
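To make the __call__ / forward relationship above concrete, the sketch below redefines the same MLP so that it runs on its own and compares net(X) with net.forward(X). The hook function log_shape and the list calls are illustrative names; register_forward_hook is a standard nn.Module method, and such hooks only fire when the module is invoked through __call__, which is one reason to write net(X) rather than net.forward(X).

```python
import torch
from torch import nn


class MLP(nn.Module):
    def __init__(self):
        super().__init__()
        self.hidden = nn.Linear(784, 256)
        self.act = nn.ReLU()
        self.output = nn.Linear(256, 10)

    def forward(self, x):
        return self.output(self.act(self.hidden(x)))


net = MLP()
X = torch.rand(2, 784)

calls = []


def log_shape(module, inputs, output):
    # Forward hook: records the output shape each time it fires
    calls.append(tuple(output.shape))


handle = net.register_forward_hook(log_shape)
y_call = net(X)             # goes through __call__: runs forward() and the hook
y_forward = net.forward(X)  # calls forward() directly: the hook is skipped
handle.remove()

n_params = sum(p.numel() for p in net.parameters())
print(y_call.shape)                    # torch.Size([2, 10])
print(torch.equal(y_call, y_forward))  # True
print(calls)                           # [(2, 10)]
print(n_params)                        # 203530 = 784*256 + 256 + 256*10 + 10
```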

2, Subclasses of the Module class

1. Sequential class

When the model's forward computation simply chains its layers one after another, the Sequential class offers a simpler way to define the model. That is its purpose: it accepts either an ordered dictionary (OrderedDict) of sub-modules or a sequence of sub-modules as arguments and adds the Module instances one by one, and the model's forward computation runs these instances one by one in the order they were added. For example, the following AlexNet-style network is assembled from Sequential blocks:

class AlexNet(nn.Module):  # A small AlexNet-style network for 32 * 32 * 3 inputs
    # 5 convolution layers, 3 fully connected layers
    def __init__(self):
        super(AlexNet, self).__init__()
        # Five convolution blocks; input 32 * 32 * 3
        self.conv1 = nn.Sequential(
            nn.Conv2d(in_channels=3, out_channels=6, kernel_size=3, stride=1, padding=1),  # (32-3+2)/1+1 = 32
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2, padding=0)  # (32-2)/2+1 = 16
        )
        self.conv2 = nn.Sequential(  # Input 16 * 16 * 6
            nn.Conv2d(in_channels=6, out_channels=16, kernel_size=3, stride=1, padding=1),  # (16-3+2)/1+1 = 16
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2, padding=0)  # (16-2)/2+1 = 8
        )
        self.conv3 = nn.Sequential(  # Input 8 * 8 * 16
            nn.Conv2d(in_channels=16, out_channels=32, kernel_size=3, stride=1, padding=1),  # (8-3+2)/1+1 = 8
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2, padding=0)  # (8-2)/2+1 = 4
        )
        self.conv4 = nn.Sequential(  # Input 4 * 4 * 32
            nn.Conv2d(in_channels=32, out_channels=64, kernel_size=3, stride=1, padding=1),  # (4-3+2)/1+1 = 4
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2, padding=0)  # (4-2)/2+1 = 2
        )
        self.conv5 = nn.Sequential(  # Input 2 * 2 * 64
            nn.Conv2d(in_channels=64, out_channels=128, kernel_size=3, stride=1, padding=1),  # (2-3+2)/1+1 = 2
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2, padding=0),  # (2-2)/2+1 = 1
            # nn.Flatten()
        )  # Last convolution layer, output 1 * 1 * 128
        # Fully connected layers
        self.dense = nn.Sequential(
            nn.Linear(128, 120),
            nn.ReLU(),
            # nn.Dropout(),
            nn.Linear(120, 84),
            nn.ReLU(),
            # nn.Dropout(),
            nn.Linear(84, 10),
            # nn.ReLU(),
            # nn.Softmax()
        )
        self._initialize_weights()  # apply the custom initialization defined below

    def forward(self, x):
        x = self.conv1(x)
        x = self.conv2(x)
        x = self.conv3(x)
        x = self.conv4(x)
        x = self.conv5(x)
        x = x.view(x.size()[0], -1)  # flatten to (batch, 128)
        x = self.dense(x)
        return x

    def _initialize_weights(self):
        for m in self.modules():
            if isinstance(m, nn.Conv2d):
                nn.init.kaiming_normal_(m.weight, mode='fan_out', nonlinearity='relu')
                if m.bias is not None:
                    nn.init.constant_(m.bias, 0)
            elif isinstance(m, nn.Linear):
                nn.init.normal_(m.weight, 0, 0.01)
                nn.init.constant_(m.bias, 0)
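Sequential can also be built from an ordered dictionary, which gives each sub-module a readable name. The sketch below is an illustration, not part of the original model: it rebuilds five convolution stages with the same channel counts as above (with a ReLU after each convolution) from an OrderedDict, and uses the output-size formula from the comments, (size - kernel + 2 * padding) / stride + 1, to check that a 32 * 32 input shrinks to 1 * 1. The helper conv_out and the stage names are hypothetical.

```python
from collections import OrderedDict

import torch
from torch import nn


def conv_out(size, kernel, stride=1, padding=0):
    # Spatial output size of a conv/pool layer: (size - kernel + 2*padding) // stride + 1
    return (size - kernel + 2 * padding) // stride + 1


channels = [3, 6, 16, 32, 64, 128]
stages = OrderedDict()
size = 32
for i in range(5):
    stages[f"conv{i + 1}"] = nn.Sequential(
        nn.Conv2d(channels[i], channels[i + 1], kernel_size=3, stride=1, padding=1),
        nn.ReLU(),
        nn.MaxPool2d(kernel_size=2, stride=2, padding=0),
    )
    # 3x3 conv with padding 1 keeps the size; 2x2 pool with stride 2 halves it
    size = conv_out(conv_out(size, 3, 1, 1), 2, 2, 0)

features = nn.Sequential(stages)  # forward runs the stages in insertion order

x = torch.rand(2, 3, 32, 32)
y = features(x)
print(size)             # 1
print(y.shape)          # torch.Size([2, 128, 1, 1])
print(features.conv1)   # stages can be accessed by name
```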

2. ModuleList class

ModuleList takes a list of sub-modules as input and then supports append and extend operations just like a Python list:

net = nn.ModuleList([nn.Linear(784, 256), nn.ReLU()])
net.append(nn.Linear(256, 10))  # append, just like a list
print(net[-1])  # index access, just like a list
print(net)

Output is:

Linear(in_features=256, out_features=10, bias=True)
ModuleList(
  (0): Linear(in_features=784, out_features=256, bias=True)
  (1): ReLU()
  (2): Linear(in_features=256, out_features=10, bias=True)
)
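The text above mentions extend as well as append. A minimal sketch of both, plus iterating over the ModuleList by hand (the extra layer sizes are arbitrary choices for illustration):

```python
import torch
from torch import nn

net = nn.ModuleList([nn.Linear(784, 256), nn.ReLU()])
net.append(nn.Linear(256, 10))             # add one module at the end
net.extend([nn.ReLU(), nn.Linear(10, 2)])  # add several modules at once

print(len(net))  # 5
print(net[-1])   # Linear(in_features=10, out_features=2, bias=True)

# Iterating in order works like a list; this loop plays the role of a forward function
x = torch.rand(3, 784)
for layer in net:
    x = layer(x)
print(x.shape)   # torch.Size([3, 2])
```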

Differences between Sequential and ModuleList:

① ModuleList is just a list that stores modules. The modules in it have no connection or order between them (so the input and output dimensions of adjacent layers do not need to match), and ModuleList does not implement a forward function; you must implement it yourself. Calling net(torch.zeros(1, 784)) on the ModuleList above therefore raises NotImplementedError.

② The modules in Sequential are arranged in order, so the input and output sizes of adjacent layers must match, and Sequential already implements the forward function internally.
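The sketch below illustrates both points with the same three layers (variable names are illustrative). The layer objects are shared between the two containers, which is fine for a demonstration: Sequential runs them directly, while the ModuleList raises NotImplementedError until the loop is written by hand.

```python
import torch
from torch import nn

layers = [nn.Linear(784, 256), nn.ReLU(), nn.Linear(256, 10)]
seq = nn.Sequential(*layers)   # forward is already implemented
mlist = nn.ModuleList(layers)  # just a container, no forward

x = torch.zeros(1, 784)
print(seq(x).shape)  # torch.Size([1, 10])

raised = False
try:
    mlist(x)
except NotImplementedError:
    raised = True
print(raised)  # True

# With a ModuleList, the loop that Sequential runs internally is written by hand
out = x
for layer in mlist:
    out = layer(out)
same = torch.equal(out, seq(x))
print(same)  # True: same module objects applied in the same order
```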

a. Flexible use of ModuleList, example 1

ModuleList simply makes it more flexible to define the forward propagation of a network:

class MyModule(nn.Module):
    def __init__(self):
        super(MyModule, self).__init__()
        self.linears = nn.ModuleList([nn.Linear(10, 10) for i in range(10)])

    def forward(self, x):
        # ModuleList can act as an iterable, or be indexed using ints
        for i, l in enumerate(self.linears):
            x = self.linears[i // 2](x) + l(x)
        return x
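Because a ModuleList can hold any number of layers, the depth of a network can also be chosen when it is constructed. The class DepthMLP below is a hypothetical example, not from the original text: it creates one Linear layer for each consecutive pair in sizes and decides in forward where to apply ReLU.

```python
import torch
from torch import nn


class DepthMLP(nn.Module):  # hypothetical example class
    def __init__(self, sizes):
        super().__init__()
        # One Linear layer per consecutive pair of sizes, e.g. [784, 256, 128, 10] -> 3 layers
        self.layers = nn.ModuleList(
            [nn.Linear(n_in, n_out) for n_in, n_out in zip(sizes[:-1], sizes[1:])]
        )

    def forward(self, x):
        for i, layer in enumerate(self.layers):
            x = layer(x)
            if i < len(self.layers) - 1:  # no activation after the last layer
                x = torch.relu(x)
        return x


net = DepthMLP([784, 256, 128, 10])
out = net(torch.rand(2, 784))
n_params = sum(p.numel() for p in net.parameters())
print(len(net.layers))  # 3
print(out.shape)        # torch.Size([2, 10])
print(n_params)         # 235146
```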

b. Flexible use of ModuleList, example 2

ModuleList is different from an ordinary Python list: the parameters of all modules added to a ModuleList are automatically added to the parameters of the whole network.

class Module_ModuleList(nn.Module):
    def __init__(self):
        super(Module_ModuleList, self).__init__()
        self.linears = nn.ModuleList([nn.Linear(10, 10)])

class Module_List(nn.Module):
    def __init__(self):
        super(Module_List, self).__init__()
        self.linears = [nn.Linear(10, 10)]

net1 = Module_ModuleList()
net2 = Module_List()

for p in net1.parameters():
    print(p.size())

for p in net2.parameters():
    print(p.size())

Output is:

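torch.Size([10, 10])
torch.Size([10])

Only net1 prints anything (the weight and bias of its Linear layer). The Linear layer stored in the plain Python list of net2 is not registered, so net2 has no parameters.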
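The missing registration has practical consequences: the unregistered layer is absent from state_dict (so it is not saved), and an optimizer built from net2.parameters() has nothing to train. A minimal sketch that checks this, redefining the two classes so it runs on its own (the variable names are illustrative):

```python
import torch
from torch import nn


class Module_ModuleList(nn.Module):
    def __init__(self):
        super().__init__()
        self.linears = nn.ModuleList([nn.Linear(10, 10)])


class Module_List(nn.Module):
    def __init__(self):
        super().__init__()
        self.linears = [nn.Linear(10, 10)]  # plain list: NOT registered


net1 = Module_ModuleList()
net2 = Module_List()

n1 = sum(p.numel() for p in net1.parameters())
n2 = sum(p.numel() for p in net2.parameters())
keys1 = list(net1.state_dict().keys())
keys2 = list(net2.state_dict().keys())
print(n1, n2)  # 110 0
print(keys1)   # ['linears.0.weight', 'linears.0.bias']
print(keys2)   # []

# PyTorch optimizers reject an empty parameter list with a ValueError
empty_error = False
try:
    torch.optim.SGD(net2.parameters(), lr=0.1)
except ValueError:
    empty_error = True
print(empty_error)  # True
```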

3. ModuleDict class

ModuleDict takes a dictionary of sub-modules as input and then supports adding and accessing entries like a dictionary:

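net = nn.ModuleDict({
    'linear': nn.Linear(784, 256),
    'act': nn.ReLU(),
})
net['output'] = nn.Linear(256, 10)  # add an entry
print(net['linear'])  # access an entry by key
print(net.output)     # entries can also be accessed as attributes
print(net)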

Output is:

Linear(in_features=784, out_features=256, bias=True)
Linear(in_features=256, out_features=10, bias=True)
ModuleDict(
  (act): ReLU()
  (linear): Linear(in_features=784, out_features=256, bias=True)
  (output): Linear(in_features=256, out_features=10, bias=True)
)
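Beyond indexing, ModuleDict supports other familiar dictionary operations. A short sketch (the extra "dropout" entry is only for illustration); sorted() is used for the key listing so the result does not depend on the order in which a given PyTorch version stores the entries:

```python
from torch import nn

net = nn.ModuleDict({"linear": nn.Linear(784, 256), "act": nn.ReLU()})
net["output"] = nn.Linear(256, 10)        # add by key
net.update({"dropout": nn.Dropout(0.5)})  # add entries from another dict
has_act = "act" in net                    # membership test
removed = net.pop("dropout")              # remove and return an entry
keys = sorted(net.keys())

print(has_act)   # True
print(keys)      # ['act', 'linear', 'output']
print(len(net))  # 3
```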

Similarities between ModuleDict and ModuleList:

① Like ModuleList, a ModuleDict instance only stores a dictionary of modules and does not define a forward function; you must define it yourself.

② Also like ModuleList, ModuleDict is different from a Python dict: the parameters of all modules in a ModuleDict are automatically added to the parameters of the whole network.
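Both points can be seen in one small model. SwitchableMLP below is a hypothetical example: it stores two candidate activations in a ModuleDict and selects one by key inside its own forward function, since ModuleDict provides none. The parameter count shows that the Linear layers are registered; ReLU and Tanh have no parameters of their own.

```python
import torch
from torch import nn


class SwitchableMLP(nn.Module):  # hypothetical example class
    def __init__(self, act="relu"):
        super().__init__()
        self.hidden = nn.Linear(784, 256)
        self.output = nn.Linear(256, 10)
        # Candidate activations stored in a ModuleDict; forward picks one by key
        self.acts = nn.ModuleDict({"relu": nn.ReLU(), "tanh": nn.Tanh()})
        self.act = act

    def forward(self, x):
        # ModuleDict has no forward of its own, so the routing is written here
        return self.output(self.acts[self.act](self.hidden(x)))


X = torch.rand(2, 784)
net = SwitchableMLP()
out_relu = net(X)
net.act = "tanh"
out_tanh = net(X)
n_params = sum(p.numel() for p in net.parameters())

print(out_relu.shape, out_tanh.shape)  # torch.Size([2, 10]) torch.Size([2, 10])
print(n_params)                        # 203530
```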

3, Building your own complex model

1. Construct your own model FancyMLP

A slightly more complex network, FancyMLP, is constructed below. In this network, we create a parameter that is not updated during training, i.e. a constant parameter, by creating a tensor with requires_grad=False. In the forward computation, besides using this constant parameter, we also use Tensor functions and Python control flow, and call the same layer more than once.

class FancyMLP(nn.Module):
    def __init__(self, **kwargs):
        super(FancyMLP, self).__init__(**kwargs)

        self.rand_weight = torch.rand((20, 20), requires_grad=False) # Untrainable parameter (constant parameter)
        self.linear = nn.Linear(20, 20)

    def forward(self, x):
        x = self.linear(x)
        # Use the constant parameter created above, together with torch.mm and the relu function in nn.functional
        x = nn.functional.relu(torch.mm(x, self.rand_weight.data) + 1)

        # Reuse the fully connected layer; this is equivalent to two fully connected layers sharing parameters
        x = self.linear(x)
        # Control flow. Here we call item() to get a Python scalar for the comparison
        while x.norm().item() > 1:  # norm() computes the L2 norm; item() returns the value of a single-element tensor as a Python number
            x /= 2
        if x.norm().item() < 0.8:
            x *= 10
        return x.sum()

X = torch.rand(2, 20)
net = FancyMLP()
print(net)
net(X)

Output is:

FancyMLP(
  (linear): Linear(in_features=20, out_features=20, bias=True)
)
tensor(0.8432, grad_fn=<SumBackward0>)
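In FancyMLP, rand_weight is a plain tensor attribute, so nn.Module does not register it: it does not appear in state_dict and is not moved by net.to(device). A common alternative is register_buffer, sketched below in a hypothetical variant called FancyMLPBuffer (the class name is an assumption, not from the original). The sketch also confirms that reusing self.linear shares its parameters: the layer is applied twice, but its 420 parameters are counted once.

```python
import torch
from torch import nn


class FancyMLPBuffer(nn.Module):  # hypothetical variant of FancyMLP
    def __init__(self):
        super().__init__()
        # Constant tensor registered as a buffer: not trained, but saved in
        # state_dict and moved together with the module by .to(device)
        self.register_buffer("rand_weight", torch.rand(20, 20))
        self.linear = nn.Linear(20, 20)

    def forward(self, x):
        x = self.linear(x)
        x = torch.relu(torch.mm(x, self.rand_weight) + 1)
        x = self.linear(x)  # same layer again: its parameters are shared
        while x.norm().item() > 1:
            x = x / 2
        if x.norm().item() < 0.8:
            x = x * 10
        return x.sum()


net = FancyMLPBuffer()
param_names = [name for name, _ in net.named_parameters()]
state_keys = sorted(net.state_dict().keys())
n_params = sum(p.numel() for p in net.parameters())
out = net(torch.rand(2, 20))

print(param_names)  # ['linear.weight', 'linear.bias']
print(state_keys)   # ['linear.bias', 'linear.weight', 'rand_weight']
print(n_params)     # 420: the reused layer is counted once
print(out.dim())    # 0 (a scalar tensor)
```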

2. Nesting FancyMLP and a new network, NestMLP

Because FancyMLP and Sequential are both subclasses of Module, they can be nested inside each other.

class NestMLP(nn.Module):
    def __init__(self, **kwargs):
        super(NestMLP, self).__init__(**kwargs)
        self.net = nn.Sequential(nn.Linear(40, 30), nn.ReLU())

    def forward(self, x):
        return self.net(x)

net = nn.Sequential(NestMLP(), nn.Linear(30, 20), FancyMLP())

X = torch.rand(2, 40)
print(net)
net(X)


Output is:

Sequential(
  (0): NestMLP(
    (net): Sequential(
      (0): Linear(in_features=40, out_features=30, bias=True)
      (1): ReLU()
    )
  )
  (1): Linear(in_features=30, out_features=20, bias=True)
  (2): FancyMLP(
    (linear): Linear(in_features=20, out_features=20, bias=True)
  )
)
tensor(14.4908, grad_fn=<SumBackward0>)
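Nested modules can be inspected and reached just like the printout suggests. The sketch below keeps only NestMLP and a Linear layer so it stays short and runs on its own (FancyMLP is omitted for brevity); named_modules walks the tree depth-first, and indexing plus attribute access reach an inner layer.

```python
import torch
from torch import nn


class NestMLP(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(40, 30), nn.ReLU())

    def forward(self, x):
        return self.net(x)


net = nn.Sequential(NestMLP(), nn.Linear(30, 20))

names = [name for name, _ in net.named_modules()]
inner = net[0].net[0]  # first Linear inside NestMLP's Sequential
n_params = sum(p.numel() for p in net.parameters())
out = net(torch.rand(2, 40))

print(names)      # ['', '0', '0.net', '0.net.0', '0.net.1', '1']
print(inner)      # Linear(in_features=40, out_features=30, bias=True)
print(n_params)   # 1850 = 40*30 + 30 + 30*20 + 20
print(out.shape)  # torch.Size([2, 20])
```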


Summary

  • You can construct a model by inheriting from the Module class.
  • The Sequential, ModuleList and ModuleDict classes all inherit from the Module class.
  • Unlike Sequential, ModuleList and ModuleDict do not define a complete network. They only store modules together, and you must define the forward function yourself.
  • Although Sequential and the other container classes make model construction simpler, inheriting directly from the Module class gives much more flexibility in how a model is built.


Added by e7gaskell on Sat, 19 Feb 2022 06:56:16 +0200