[Deep Learning Computation] Access, Initialization, and Sharing of Model Parameters


In the section on the concise implementation of linear regression, we initialized the model's parameters through the init module and introduced a simple way to access them. This section explains in more depth how to access and initialize model parameters, and how to share the same model parameters among multiple layers.

We first define a multilayer perceptron with a single hidden layer, as in the previous section. We still initialize its parameters with the default method and perform a forward pass. Unlike before, here we also import the init module from torch.nn, which contains a variety of parameter initialization methods.

import torch
from torch import nn
from torch.nn import init

net = nn.Sequential(nn.Linear(4, 3), nn.ReLU(), nn.Linear(3, 1))  # PyTorch initializes the parameters by default

print(net)
X = torch.rand(2, 4)
Y = net(X).sum()

Output:

Sequential(
  (0): Linear(in_features=4, out_features=3, bias=True)
  (1): ReLU()
  (2): Linear(in_features=3, out_features=1, bias=True)
)
tensor(0.4694, grad_fn=<SumBackward0>)

Access model parameters

Recall the inheritance relationship between the Sequential class and the Module class mentioned in the previous section. For layers with model parameters in a Sequential instance, we can use the parameters() or named_parameters() methods of the Module class to access all parameters (returned as iterators); the latter returns the parameter names in addition to the parameter Tensors. Next, we access all parameters of the multilayer perceptron net:

print(type(net.named_parameters()))
for name, param in net.named_parameters():
    print(name, param, param.size())

Output:

<class 'generator'>
0.weight Parameter containing:
tensor([[-0.1598, -0.4550,  0.2139,  0.2397],
        [ 0.0132,  0.1468,  0.2076,  0.0626],
        [ 0.0601,  0.2425,  0.1341, -0.1577]], requires_grad=True) torch.Size([3, 4])
0.bias Parameter containing:
tensor([-0.1073, -0.2686,  0.0367], requires_grad=True) torch.Size([3])
2.weight Parameter containing:
tensor([[ 0.5067, -0.0936, -0.2661]], requires_grad=True) torch.Size([1, 3])
2.bias Parameter containing:
tensor([0.2974], requires_grad=True) torch.Size([1])

We can see that the returned names are automatically prefixed with the layer index.
Let's access the parameters of a single layer in net. For networks constructed with the Sequential class, we can access any layer through square brackets []. Index 0 indicates that the hidden layer is the first layer added to the Sequential instance.

for name, param in net[0].named_parameters():
    print(name, param.size(), type(param))

Output:

weight torch.Size([3, 4]) <class 'torch.nn.parameter.Parameter'>
bias torch.Size([3]) <class 'torch.nn.parameter.Parameter'>

Because we accessed a single layer, the names have no layer-index prefix. In addition, the returned param has type torch.nn.parameter.Parameter, which is a subclass of Tensor. Unlike a plain Tensor, a Parameter is automatically added to the model's parameter list. Take the following example.

class MyModel(nn.Module):
    def __init__(self, **kwargs):
        super(MyModel, self).__init__(**kwargs)
        self.weight1 = nn.Parameter(torch.rand(20, 20))
        self.weight2 = torch.rand(20, 20)
    def forward(self, x):
        pass
    
n = MyModel()
for name, param in n.named_parameters():
    print(name)

Output:

weight1

In the above code, weight1 is in the parameter list, but weight2 is not.
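
We can verify this directly; a small hedged check on the instance n defined above:

# weight1 was wrapped in nn.Parameter, so it is registered as a model
# parameter; weight2 is a plain Tensor attribute and is not.
param_names = set(name for name, _ in n.named_parameters())
print('weight1' in param_names)  # True
print('weight2' in param_names)  # False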

Because a Parameter is a Tensor, it has all the attributes a Tensor has. For example, you can access a Parameter's values through data and its gradient through grad.

weight_0 = list(net[0].parameters())[0]
print(weight_0.data)
print(weight_0.grad) # The gradient before back propagation is None
Y.backward()
print(weight_0.grad)

Output:

tensor([[ 0.2719, -0.0898, -0.2462,  0.0655],
        [-0.4669, -0.2703,  0.3230,  0.2067],
        [-0.2708,  0.1171, -0.0995,  0.3913]])
None
tensor([[-0.2281, -0.0653, -0.1646, -0.2569],
        [-0.1916, -0.0549, -0.1382, -0.2158],
        [ 0.0000,  0.0000,  0.0000,  0.0000]])
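
Finally, for layers built from standard modules such as nn.Linear, the same parameters can also be read directly as attributes of the layer. A minimal sketch, assuming the net defined above is still in scope:

# Direct attribute access on the first linear layer; .weight and .bias
# are the same Parameter objects seen above.
print(type(net[0].weight))         # <class 'torch.nn.parameter.Parameter'>
print(net[0].weight.data.size())   # torch.Size([3, 4])
print(net[0].bias.data.size())     # torch.Size([3])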

Initialize model parameters

As mentioned in Section 3.15 (numerical stability and model initialization), the module parameters of PyTorch's nn modules are initialized with reasonable default strategies (the specific initialization used by each type of layer can be found in the source code). However, we often need to initialize the weights with other methods.

PyTorch's init module provides a variety of preset initialization methods. In the following example, we initialize the weight parameters to normally distributed random numbers with mean 0 and standard deviation 0.01, and still clear the bias parameters to zero.

for name, param in net.named_parameters():
    if 'weight' in name:
        init.normal_(param, mean=0, std=0.01)
        print(name, param.data)

Output:

0.weight tensor([[ 0.0030,  0.0094,  0.0070, -0.0010],
        [ 0.0001,  0.0039,  0.0105, -0.0126],
        [ 0.0105, -0.0135, -0.0047, -0.0006]])
2.weight tensor([[-0.0074,  0.0051,  0.0066]])

Next, we use a constant to initialize the bias parameters, clearing them to zero.

for name, param in net.named_parameters():
    if 'bias' in name:
        init.constant_(param, val=0)
        print(name, param.data)

Output:

0.bias tensor([0., 0., 0.])
2.bias tensor([0.])
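
The init module also offers many other preset methods, for example Xavier (Glorot) initialization. A small hedged sketch applying it to the weight parameters of net:

# Xavier uniform initialization for the weight matrices; xavier_uniform_
# is one of the presets provided by torch.nn.init.
for name, param in net.named_parameters():
    if 'weight' in name:
        init.xavier_uniform_(param)
        print(name, param.data)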

Custom initialization method

Sometimes the initialization method we need is not provided by the init module. In that case, we can implement one ourselves so that it can be used like the other initialization methods. Before doing so, let's look at how PyTorch implements these initialization methods, for example torch.nn.init.normal_:

def normal_(tensor, mean=0, std=1):
    with torch.no_grad():
        return tensor.normal_(mean, std)

You can see that this is an in-place function that changes the Tensor's values, and no gradient is recorded during this process.

Similarly, let's implement a custom initialization method. In the following example, we initialize each weight to 0 with probability one half, and with probability one half to a random number drawn uniformly from the two intervals [-10, -5] and [5, 10].

def init_weight_(tensor):
    with torch.no_grad():
        tensor.uniform_(-10, 10)
        tensor *= (tensor.abs() >= 5).float()

for name, param in net.named_parameters():
    if 'weight' in name:
        init_weight_(param)
        print(name, param.data)

Output:

0.weight tensor([[ 7.0403,  0.0000, -9.4569,  7.0111],
        [-0.0000, -0.0000,  0.0000,  0.0000],
        [ 9.8063, -0.0000,  0.0000, -9.7993]])
2.weight tensor([[-5.8198,  7.7558, -5.0293]])

In addition, as in Section 2.3.2, we can also rewrite the model parameter values by modifying their data attribute, which does not affect the gradient:

for name, param in net.named_parameters():
    if 'bias' in name:
        param.data += 1
        print(name, param.data)

Output:

0.bias tensor([1., 1., 1.])
2.bias tensor([1.])

Shared model parameters

In some cases, we want to share model parameters among multiple layers. Section 4.1.3 mentioned one way to share model parameters: call the same layer multiple times in the forward function of a Module subclass. In addition, if the modules we pass to Sequential are the same Module instance, their parameters are also shared. Here is an example:

linear = nn.Linear(1, 1, bias=False)
net = nn.Sequential(linear, linear) 
print(net)
for name, param in net.named_parameters():
    init.constant_(param, val=3)
    print(name, param.data)

Output:

Sequential(
  (0): Linear(in_features=1, out_features=1, bias=False)
  (1): Linear(in_features=1, out_features=1, bias=False)
)
0.weight tensor([[3.]])

In memory, these two linear layers are actually one object:

print(id(net[0]) == id(net[1]))
print(id(net[0].weight) == id(net[1].weight))

Output:

True
True

Because model parameters accumulate gradients, the gradient of a shared parameter is accumulated over all of its uses during backpropagation:

x = torch.ones(1, 1)
y = net(x).sum()
print(y)
y.backward()
print(net[0].weight.grad)  # each use contributes a gradient of 3, and the layer is used twice, so the total is 6

Output:

tensor(9., grad_fn=<SumBackward0>)
tensor([[6.]])
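
The other way of sharing mentioned above, calling the same layer multiple times in a Module subclass's forward function, works the same way. A minimal hedged sketch (the class name and layer sizes here are illustrative):

class SharedMLP(nn.Module):
    # The single linear layer is applied twice in forward, so it appears
    # only once in the parameter list and accumulates gradients from both uses.
    def __init__(self):
        super(SharedMLP, self).__init__()
        self.linear = nn.Linear(1, 1, bias=False)

    def forward(self, x):
        return self.linear(self.linear(x))

shared_net = SharedMLP()
print(list(shared_net.named_parameters()))  # only linear.weight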

Summary

  • There are several ways to access, initialize, and share model parameters.
  • You can customize the initialization method.

Note: this section differs from the corresponding section of the original book; please refer to the original book for comparison.

The content of this book is quoted here for non-commercial, learning purposes only. I recommend reading the original book and studying along!

Keep going, thank you, and keep working hard!
