Introduction
In this section, we will use the init module to initialize model parameters. We will learn how to access and initialize model parameters, and how to share the same parameters among multiple layers.
We first define a multilayer perceptron with a single hidden layer, initialize its parameters with the default method, and perform a forward computation. Unlike before, here we import the init module from torch.nn, which contains a variety of parameter initialization methods.
import torch
from torch import nn
from torch.nn import init

net = nn.Sequential(nn.Linear(4, 3), nn.ReLU(), nn.Linear(3, 1))  # PyTorch initializes the parameters by default
print(net)
X = torch.rand(2, 4)
Y = net(X).sum()
Sequential(
  (0): Linear(in_features=4, out_features=3, bias=True)
  (1): ReLU()
  (2): Linear(in_features=3, out_features=1, bias=True)
)
Access parameters
For layers with model parameters in a Sequential instance, we can use the parameters() or named_parameters() methods of the Module class to access all parameters (returned as an iterator); the latter returns each parameter's name in addition to the parameter Tensor. Next, we access all parameters of the multilayer perceptron net:
print(type(net.named_parameters()))
for name, param in net.named_parameters():
    print(name, param.size())
<class 'generator'>
0.weight torch.Size([3, 4])
0.bias torch.Size([3])
2.weight torch.Size([1, 3])
2.bias torch.Size([1])
As you can see, the returned names are automatically prefixed with the layer index. Let's access the parameters of a single layer in net. For networks constructed with the Sequential class, we can access any layer through square-bracket indexing []. Index 0 refers to the hidden layer, the first layer added to the Sequential instance.
for name, param in net[0].named_parameters():
    print(name, param.size(), type(param))
weight torch.Size([3, 4]) <class 'torch.nn.parameter.Parameter'>
bias torch.Size([3]) <class 'torch.nn.parameter.Parameter'>
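For layers built from standard modules such as nn.Linear, the parameters can also be reached directly as attributes, which is often convenient. A small illustrative snippet:

# direct attribute access to the hidden layer's parameters
print(net[0].weight.size())  # torch.Size([3, 4])
print(net[0].bias.size())    # torch.Size([3])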
Because this is a single layer, the names carry no layer-index prefix. In addition, the returned param has type torch.nn.parameter.Parameter, which is a subclass of Tensor. Unlike an ordinary Tensor, a Parameter is automatically added to the model's parameter list. Consider the following example:
class MyModel(nn.Module):
    def __init__(self, **kwargs):
        super(MyModel, self).__init__(**kwargs)
        self.weight1 = nn.Parameter(torch.rand(20, 20))
        self.weight2 = torch.rand(20, 20)

    def forward(self, x):
        pass

n = MyModel()
for name, param in n.named_parameters():
    print(name)
weight1
In the above code, weight1 is in the parameter list, but weight2 is not.
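If we do want a plain tensor such as weight2 to be treated as a model parameter, we can wrap it in nn.Parameter or register it explicitly with register_parameter. A minimal sketch, using a hypothetical MyModel2 class that is not part of the original example:

class MyModel2(nn.Module):
    def __init__(self, **kwargs):
        super(MyModel2, self).__init__(**kwargs)
        self.weight1 = nn.Parameter(torch.rand(20, 20))
        # explicitly registering a Parameter also adds it to the parameter list
        self.register_parameter('weight2', nn.Parameter(torch.rand(20, 20)))

    def forward(self, x):
        pass

n2 = MyModel2()
for name, param in n2.named_parameters():
    print(name)  # prints both weight1 and weight2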
Because a Parameter is a Tensor, it has all the attributes of a Tensor. For example, you can access a parameter's values through .data and its gradient through .grad.
weight_0 = list(net[0].parameters())[0]
print(weight_0.data)
print(weight_0.grad)  # the gradient is None before backpropagation
Y.backward()
print(weight_0.grad)
tensor([[ 0.1180,  0.3797, -0.2047,  0.0139],
        [-0.0648, -0.1981,  0.3251,  0.4002],
        [-0.4001,  0.3944,  0.4709,  0.3300]])
None
tensor([[-0.2324, -0.1659, -0.1951, -0.1850],
        [ 0.5559,  0.3967,  0.4667,  0.4426],
        [ 0.2842,  0.2733,  0.3751,  0.2993]])
Initialize model parameters
The modules in PyTorch's nn package already use reasonable default initialization strategies for their parameters (see the source code for the specific scheme used by each layer type). However, we often want to initialize the weights in other ways. The init module of PyTorch provides a variety of preset initialization methods. In the following example, we initialize the weight parameters to normally distributed random numbers with mean 0 and standard deviation 0.01, and we will also clear the bias parameters to zero.
for name, param in net.named_parameters():
    if 'weight' in name:
        init.normal_(param, mean=0, std=0.01)
        print(name, param.data)
0.weight tensor([[-0.0005, -0.0229,  0.0169,  0.0075],
        [ 0.0284,  0.0064,  0.0019,  0.0157],
        [-0.0126,  0.0090,  0.0001, -0.0083]])
2.weight tensor([[-0.0176, -0.0042, -0.0045]])
Next, we use a constant to initialize the bias parameters to zero.
for name, param in net.named_parameters():
    if 'bias' in name:
        init.constant_(param, val=0)
        print(name, param.data)
0.bias tensor([0., 0., 0.])
2.bias tensor([0.])
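Besides normal_ and constant_, the init module offers other preset schemes such as xavier_uniform_ and kaiming_normal_. A minimal sketch on a throwaway layer (this layer is used only for illustration and is not part of net):

# trying another preset initializer from torch.nn.init
layer = nn.Linear(4, 3)
init.xavier_uniform_(layer.weight)  # Xavier/Glorot uniform initialization
init.constant_(layer.bias, val=0)   # zero the bias
print(layer.weight.data)
print(layer.bias.data)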
Custom initialization method
Sometimes the initialization method we need is not provided by the init module. In that case, we can implement our own initialization function and use it just like the preset ones. Before that, let's look at how PyTorch implements these initialization methods, using torch.nn.init.normal_ as an example:
def normal_(tensor, mean=0, std=1):
    with torch.no_grad():
        return tensor.normal_(mean, std)
You can see that this function changes the Tensor values in place, and no gradient is recorded in the process. Following the same pattern, let's implement a custom initialization method. In the following example, each weight has probability one half of being set to 0, and otherwise is drawn uniformly from the two intervals [-10, -5] and [5, 10].
def init_weight_(tensor):
    with torch.no_grad():
        tensor.uniform_(-10, 10)
        tensor *= (tensor.abs() >= 5).float()

for name, param in net.named_parameters():
    if 'weight' in name:
        init_weight_(param)
        print(name, param.data)
0.weight tensor([[-0.0000,  0.0000,  0.0000, -5.4827],
        [ 7.5929,  6.3077,  0.0000, -7.3286],
        [ 0.0000, -6.8615,  9.8957, -0.0000]])
2.weight tensor([[0.0000, 0.0000, 5.7279]])
In addition, we can rewrite model parameter values by modifying their .data directly, which does not affect the gradient:
for name, param in net.named_parameters():
    if 'bias' in name:
        param.data += 1
        print(name, param.data)
0.bias tensor([1., 1., 1.])
2.bias tensor([1.])
Shared model parameters
In some cases, we want to share model parameters among multiple layers. One way is for the forward function of a Module to call the same layer multiple times, as sketched at the end of this section. In addition, if the Module instances passed to Sequential are the same object, their parameters are shared as well. Here is an example:
linear = nn.Linear(1, 1, bias=False)
net = nn.Sequential(linear, linear)
print(net)
for name, param in net.named_parameters():
    init.constant_(param, val=3)
    print(name, param.data)
Sequential(
  (0): Linear(in_features=1, out_features=1, bias=False)
  (1): Linear(in_features=1, out_features=1, bias=False)
)
0.weight tensor([[3.]])
In memory, these two linear layers are actually one object:
print(id(net[0]) == id(net[1]))
print(id(net[0].weight) == id(net[1].weight))
True
True
Because the model parameters carry gradients, the gradients of these shared parameters accumulate during backpropagation:
x = torch.ones(1, 1)
y = net(x).sum()
print(y)
y.backward()
print(net[0].weight.grad)  # each use contributes a gradient of 3; the layer is used twice, so the total is 6
tensor(9., grad_fn=<SumBackward0>)
tensor([[6.]])
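The first sharing approach mentioned above, calling the same layer repeatedly inside forward, can be sketched as follows (SharedMLP is a hypothetical class used only for illustration, not part of the original example):

class SharedMLP(nn.Module):
    def __init__(self, **kwargs):
        super(SharedMLP, self).__init__(**kwargs)
        self.linear = nn.Linear(1, 1, bias=False)

    def forward(self, x):
        # the same linear layer (and thus the same weight) is applied twice
        return self.linear(self.linear(x))

shared_net = SharedMLP()
print(list(shared_net.named_parameters()))  # only one weight is registered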
Summary
- There are several ways to access, initialize, and share model parameters.
- You can customize the initialization method.