PyTorch provides a few basic classes that matter when building networks, such as nn.Module, nn.ModuleList, and nn.Sequential. These classes are called containers because we can add modules to them, and they are easy to confuse with one another. In this article we focus on nn.ModuleList and nn.Sequential, and work out when each one is the more appropriate choice. The examples in this article use PyTorch 1.0.
nn.ModuleList
First, let's talk about nn.ModuleList. You can put any subclass of nn.Module (such as nn.Conv2d, nn.Linear, and so on) into this list. It supports the same methods as Python's own list, such as extend and append. But unlike an ordinary list, every module added to an nn.ModuleList is registered in the whole network, and its parameters are automatically added to the network's parameters. That description sounds dry, so let's look at a few examples.
For the first network, let's use nn.ModuleList to build a small network consisting of two fully connected layers:
import torch
import torch.nn as nn

class net1(nn.Module):
    def __init__(self):
        super(net1, self).__init__()
        self.linears = nn.ModuleList([nn.Linear(10, 10) for i in range(2)])
    def forward(self, x):
        for m in self.linears:
            x = m(x)
        return x

net = net1()
print(net)
# net1(
#   (linears): ModuleList(
#     (0): Linear(in_features=10, out_features=10, bias=True)
#     (1): Linear(in_features=10, out_features=10, bias=True)
#   )
# )

for param in net.parameters():
    print(type(param.data), param.size())
# <class 'torch.Tensor'> torch.Size([10, 10])
# <class 'torch.Tensor'> torch.Size([10])
# <class 'torch.Tensor'> torch.Size([10, 10])
# <class 'torch.Tensor'> torch.Size([10])
We can see that this network consists of two fully connected layers, and their weights and biases are registered inside the network. Next, let's look at the second network, which uses Python's own list instead:
class net2(nn.Module):
    def __init__(self):
        super(net2, self).__init__()
        self.linears = [nn.Linear(10, 10) for i in range(2)]
    def forward(self, x):
        for m in self.linears:
            x = m(x)
        return x

net = net2()
print(net)
# net2()
print(list(net.parameters()))
# []
Obviously, the fully connected layers added with a plain Python list are not registered in our network, and neither are their parameters. We can still use forward to compute an output, but if a network instantiated from net2 is trained, those layers' parameters will never be updated, because they are not part of the network's parameters.
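A quick way to see the consequence: if you try to build an optimizer from net2's (empty) parameter list, PyTorch refuses. A minimal sketch (the exact error text may vary slightly between versions):

net = net2()
optimizer = torch.optim.SGD(net.parameters(), lr=0.1)
# ValueError: optimizer got an empty parameter list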
OK, at this point we have a general idea of what nn.ModuleList does: it is a container that stores different modules and automatically registers each module's parameters in the network. However, note that nn.ModuleList does not define a network by itself. It just stores different modules together, and there is no implied order between them. For example:
class net3(nn.Module):
    def __init__(self):
        super(net3, self).__init__()
        self.linears = nn.ModuleList([nn.Linear(10, 20), nn.Linear(20, 30), nn.Linear(5, 10)])
    def forward(self, x):
        x = self.linears[2](x)
        x = self.linears[0](x)
        x = self.linears[1](x)
        return x

net = net3()
print(net)
# net3(
#   (linears): ModuleList(
#     (0): Linear(in_features=10, out_features=20, bias=True)
#     (1): Linear(in_features=20, out_features=30, bias=True)
#     (2): Linear(in_features=5, out_features=10, bias=True)
#   )
# )

input = torch.randn(32, 5)
print(net(input).shape)
# torch.Size([32, 30])
From net3's results we can see that the order of modules inside the ModuleList does not determine anything; the execution order of the network is determined by the forward function. If you insist on making the ModuleList order different from the forward order, PyTorch won't complain, but whoever reviews your code later probably will.
Let's consider another case. Since modules in a ModuleList can be indexed by position, can one module be called multiple times in the forward function? The answer is yes, but a module called multiple times uses one and the same set of parameters, no matter how they are updated later. In the example below, although forward applies nn.Linear(10, 10) twice, there is only one set of parameters for it. What's the use of doing this? Weight sharing of this kind does appear in practice (recurrent networks, for instance, reuse the same cell at every time step), but I can't think of a compelling use in this toy setting.
class net4(nn.Module):
    def __init__(self):
        super(net4, self).__init__()
        self.linears = nn.ModuleList([nn.Linear(5, 10), nn.Linear(10, 10)])
    def forward(self, x):
        x = self.linears[0](x)
        x = self.linears[1](x)
        x = self.linears[1](x)
        return x

net = net4()
print(net)
# net4(
#   (linears): ModuleList(
#     (0): Linear(in_features=5, out_features=10, bias=True)
#     (1): Linear(in_features=10, out_features=10, bias=True)
#   )
# )

for name, param in net.named_parameters():
    print(name, param.size())
# linears.0.weight torch.Size([10, 5])
# linears.0.bias torch.Size([10])
# linears.1.weight torch.Size([10, 10])
# linears.1.bias torch.Size([10])
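To confirm that the two calls really share one module, we can backpropagate through net4 and check linears.1: there is still only a single weight tensor, and the gradients from both calls simply accumulate into it. A small sketch:

out = net(torch.randn(4, 5))
out.sum().backward()
# one gradient tensor for the layer that was called twice
print(net.linears[1].weight.grad.shape)
# torch.Size([10, 10])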
nn.Sequential
Now let's look at nn.Sequential. Unlike nn.ModuleList, it already implements the forward function for us, and the modules inside it are arranged in order, so we must make sure that the output size of each module matches the input size of the next one, as in the following example:
class net5(nn.Module):
    def __init__(self):
        super(net5, self).__init__()
        self.block = nn.Sequential(nn.Conv2d(1, 20, 5),
                                   nn.ReLU(),
                                   nn.Conv2d(20, 64, 5),
                                   nn.ReLU())
    def forward(self, x):
        x = self.block(x)
        return x

net = net5()
print(net)
# net5(
#   (block): Sequential(
#     (0): Conv2d(1, 20, kernel_size=(5, 5), stride=(1, 1))
#     (1): ReLU()
#     (2): Conv2d(20, 64, kernel_size=(5, 5), stride=(1, 1))
#     (3): ReLU()
#   )
# )
Here are two initialization examples from the official tutorial. In the second one, an OrderedDict is used to give each module a custom name instead of the default naming scheme (sequence numbers 0, 1, 2, 3...).
# Example of using Sequential
model1 = nn.Sequential(
    nn.Conv2d(1, 20, 5),
    nn.ReLU(),
    nn.Conv2d(20, 64, 5),
    nn.ReLU()
)
print(model1)
# Sequential(
#   (0): Conv2d(1, 20, kernel_size=(5, 5), stride=(1, 1))
#   (1): ReLU()
#   (2): Conv2d(20, 64, kernel_size=(5, 5), stride=(1, 1))
#   (3): ReLU()
# )

# Example of using Sequential with OrderedDict
import collections
model2 = nn.Sequential(collections.OrderedDict([
    ('conv1', nn.Conv2d(1, 20, 5)),
    ('relu1', nn.ReLU()),
    ('conv2', nn.Conv2d(20, 64, 5)),
    ('relu2', nn.ReLU())
]))
print(model2)
# Sequential(
#   (conv1): Conv2d(1, 20, kernel_size=(5, 5), stride=(1, 1))
#   (relu1): ReLU()
#   (conv2): Conv2d(20, 64, kernel_size=(5, 5), stride=(1, 1))
#   (relu2): ReLU()
# )
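A small bonus of the OrderedDict style is that the names become attributes, so you can grab an individual submodule directly:

print(model2.conv1)
# Conv2d(1, 20, kernel_size=(5, 5), stride=(1, 1))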
Some readers may have noticed: is there any difference between model1 and the net instantiated from class net5? No, the two networks are the same, because nn.Sequential is a subclass of nn.Module and therefore has all of nn.Module's methods. Moreover, when you use nn.Sequential directly you don't need to write a forward function, because one has already been written for you internally.
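In other words, model1 is itself a complete module that can be called on an input. A quick check, assuming a 1×28×28 input (any size large enough for the two 5×5 convolutions would do):

x = torch.randn(1, 1, 28, 28)
print(model1(x).shape)
# torch.Size([1, 64, 20, 20])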
At this point some readers will say: since nn.Sequential is so convenient, I'll just use it everywhere from now on. If you are sure that the order in the nn.Sequential is exactly what you want, and you don't need to insert any other processing in between (such as functions from nn.functional — what's the difference between nn and nn.functional, by the way?), then you can indeed use nn.Sequential directly. The cost is some flexibility: you can no longer customize the contents of the forward function yourself.
In general, nn.Sequential is used to form convolution blocks, and then the different blocks are assembled into the whole network like building blocks, which makes the code more concise and better structured.
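As a sketch of this building-block style (the helper conv_block below is my own illustrative name, not a PyTorch API): each block is an nn.Sequential, and a custom forward is still free to put functional operations between the blocks.

import torch.nn.functional as F

def conv_block(in_ch, out_ch):
    # a reusable building block, assembled from nn.Sequential
    return nn.Sequential(nn.Conv2d(in_ch, out_ch, 3, padding=1),
                         nn.ReLU())

class BlockNet(nn.Module):
    def __init__(self):
        super(BlockNet, self).__init__()
        self.block1 = conv_block(1, 20)
        self.block2 = conv_block(20, 64)
    def forward(self, x):
        x = self.block1(x)
        x = F.max_pool2d(x, 2)  # something a single Sequential could not express here
        x = self.block2(x)
        return x

print(BlockNet()(torch.randn(1, 1, 28, 28)).shape)
# torch.Size([1, 64, 14, 14])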
nn.ModuleList and nn.Sequential: which one should I use?
We have briefly introduced these two classes. Now let's discuss which one is more appropriate in two different scenarios.
Scenario 1: sometimes the network contains many similar or repeated layers, which we would generally create with a for loop, such as:
class net6(nn.Module):
    def __init__(self):
        super(net6, self).__init__()
        self.linears = nn.ModuleList([nn.Linear(10, 10) for i in range(3)])
    def forward(self, x):
        for layer in self.linears:
            x = layer(x)
        return x

net = net6()
print(net)
# net6(
#   (linears): ModuleList(
#     (0): Linear(in_features=10, out_features=10, bias=True)
#     (1): Linear(in_features=10, out_features=10, bias=True)
#     (2): Linear(in_features=10, out_features=10, bias=True)
#   )
# )
This is the usual approach, but if we want to save ourselves the loop in forward, we can also use Sequential, as in net7. Note the * operator, which unpacks a list into separate positional arguments. So in scenario 1, I personally find net7 more convenient and tidier.
class net7(nn.Module):
    def __init__(self):
        super(net7, self).__init__()
        self.linear_list = [nn.Linear(10, 10) for i in range(3)]
        self.linears = nn.Sequential(*self.linear_list)
    def forward(self, x):
        x = self.linears(x)
        return x

net = net7()
print(net)
# net7(
#   (linears): Sequential(
#     (0): Linear(in_features=10, out_features=10, bias=True)
#     (1): Linear(in_features=10, out_features=10, bias=True)
#     (2): Linear(in_features=10, out_features=10, bias=True)
#   )
# )
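Incidentally, the intermediate list isn't required; you can unpack a generator expression straight into the constructor. A one-line equivalent (assuming the same three layers):

linears = nn.Sequential(*(nn.Linear(10, 10) for i in range(3)))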
Now consider scenario 2: when we need information from earlier layers, such as the shortcut connections in ResNet or the skip architecture used in FCN, where the result of the current layer must be combined with the result of a previous layer, it is generally more convenient to use ModuleList. A very simple example:
class net8(nn.Module):
    def __init__(self):
        super(net8, self).__init__()
        self.linears = nn.ModuleList([nn.Linear(10, 20), nn.Linear(20, 30), nn.Linear(30, 50)])
        self.trace = []
    def forward(self, x):
        for layer in self.linears:
            x = layer(x)
            self.trace.append(x)  # note: trace keeps growing across forward calls
        return x

net = net8()
input = torch.randn(32, 10)
output = net(input)
for each in net.trace:
    print(each.shape)
# torch.Size([32, 20])
# torch.Size([32, 30])
# torch.Size([32, 50])
We use a trace list to store the output of every layer of the network, so that a later layer can easily use it if needed.
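To connect this back to the shortcut idea, here is a minimal sketch (my own toy example, not an actual ResNet block) of a residual-style forward built on a ModuleList, where each layer's input is added back to its output:

class TinySkipNet(nn.Module):
    def __init__(self):
        super(TinySkipNet, self).__init__()
        self.linears = nn.ModuleList([nn.Linear(10, 10) for i in range(3)])
    def forward(self, x):
        for layer in self.linears:
            x = x + layer(x)  # shortcut: add the layer's input to its output
        return x

print(TinySkipNet()(torch.randn(32, 10)).shape)
# torch.Size([32, 10])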
summary
In this article we learned about the two containers nn.ModuleList and nn.Sequential through a series of examples. A ModuleList is simply a list that stores modules; the modules in it have no connection to each other, and it does not implement a forward function. Compared with an ordinary Python list, however, a ModuleList automatically registers the modules added to it, along with their parameters, in the network. The modules in a Sequential must be arranged in order, and we have to make sure that the input and output sizes of adjacent layers match; its forward function is implemented internally, which can make the code cleaner. When both are applicable, the choice comes down to personal preference. I highly recommend reading the model implementations in PyTorch's official TorchVision; you can pick up many network-building techniques from them.