# Learning Summary

(1) Each optimizer is a class and must be instantiated before it can be used, for example:

```python
import torch
import torch.nn as nn

class Net(nn.Module):
    ...  # layer definitions elided
net = Net()
optim = torch.optim.SGD(net.parameters(), lr=lr)
optim.step()
```

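(2) In each training iteration, the optimizer performs two steps, clearing the gradients and updating the parameters:

```python
optimizer = torch.optim.SGD(net.parameters(), lr=1e-5)
for epoch in range(EPOCH):
    ...
    optimizer.zero_grad()  # Clear the gradients
    loss = ...             # Compute the loss
    loss.backward()        # Backpropagation (BP)
    optimizer.step()       # Update the parameters
```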
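To make the two optimizer calls concrete, here is a minimal, self-contained sketch of a complete training loop. The toy data, the single `nn.Linear` layer, the learning rate, and the number of epochs are all assumptions chosen for illustration; none of them come from the notes above.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy regression data (assumed for illustration): y = 2x + 1
x = torch.linspace(-1, 1, 20).unsqueeze(1)
y = 2 * x + 1

net = nn.Linear(1, 1)
criterion = nn.MSELoss()
optimizer = torch.optim.SGD(net.parameters(), lr=0.1)

losses = []
for epoch in range(100):
    optimizer.zero_grad()          # step 1: clear gradients from the previous iteration
    loss = criterion(net(x), y)    # forward pass and loss
    loss.backward()                # backpropagation fills p.grad for every parameter
    optimizer.step()               # step 2: update the parameters from p.grad
    losses.append(loss.item())

print(f"first loss: {losses[0]:.4f}, last loss: {losses[-1]:.4f}")
```

The loss printed last should be far smaller than the first one, which is the whole point of the zero_grad / backward / step cycle.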

# 1. Optimizer

The goal of deep learning is to adjust the network parameters continuously so that the network, through a series of non-linear transformations, maps its input to the desired output. In essence, this is a search for the optimal solution of a function, except that the solution is a set of matrices. How to find that solution quickly is a key topic in deep learning research. Taking the classic ResNet-50 as an example, it has roughly 25 million parameters. There are two ways we could try to determine that many parameters:

(1) The first is brute-force exhaustive search over the parameters. Its practical feasibility is essentially zero; it is harder than the Foolish Old Man moving the mountains.
(2) To solve for the parameters faster, the second approach finds an approximate solution with backpropagation (BP) plus an optimizer.

Therefore, the optimizer updates the network's parameters using the gradient information produced by backpropagation, reducing the value of the loss function and bringing the model output closer to the true labels.
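As a rough illustration of what "updating parameters from the gradient" means, the sketch below runs plain gradient descent, w ← w − lr · dL/dw, on a toy one-parameter loss. The loss function, starting point, and learning rate are assumptions made up for this example; a real network applies the same rule to millions of parameters at once.

```python
# Minimise the toy loss L(w) = (w - 3)^2 with plain gradient descent.
def grad(w):
    return 2.0 * (w - 3.0)    # dL/dw

w = 0.0     # initial parameter value (assumed)
lr = 0.1    # learning rate (assumed)
for _ in range(100):
    w -= lr * grad(w)         # w <- w - lr * dL/dw

print(w)    # approaches the minimiser w = 3
```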

# 2. Optimizers in PyTorch

PyTorch provides an optimizer library, torch.optim, which offers ten optimizers at the time of writing, including:

• torch.optim.ASGD
• torch.optim.LBFGS
• torch.optim.RMSprop
• torch.optim.Rprop
• torch.optim.SGD

These optimization algorithms all inherit from Optimizer, so let's first look at this base class. It is defined as follows:

```python
from collections import defaultdict

class Optimizer(object):
    def __init__(self, params, defaults):
        self.defaults = defaults
        self.state = defaultdict(dict)
        self.param_groups = []
```
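To show what inheriting from this base class can look like, here is a minimal sketch of a custom optimizer. `PlainSGD` is a hypothetical class written for this example (it is not part of torch.optim); it passes its hyperparameters to the base class as `defaults` and overrides `step()` to apply p ← p − lr · p.grad.

```python
import torch

class PlainSGD(torch.optim.Optimizer):
    """Hypothetical minimal optimizer: p <- p - lr * p.grad."""

    def __init__(self, params, lr=0.01):
        defaults = dict(lr=lr)               # stored as self.defaults
        super().__init__(params, defaults)   # builds self.param_groups

    @torch.no_grad()
    def step(self, closure=None):
        loss = None
        if closure is not None:
            with torch.enable_grad():
                loss = closure()
        for group in self.param_groups:
            for p in group['params']:
                if p.grad is None:
                    continue                 # skip parameters without a gradient
                p.add_(p.grad, alpha=-group['lr'])
        return loss

w = torch.tensor([1.0, 2.0], requires_grad=True)
opt = PlainSGD([w], lr=0.5)
(w ** 2).sum().backward()   # gradient is 2 * w = [2, 4]
opt.step()                  # w <- [1, 2] - 0.5 * [2, 4]
print(w)
```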

Optimizer has three attributes:

• defaults: Stores the optimizer's hyperparameters, as shown below:
```
{'lr': 0.1, 'momentum': 0.9, 'dampening': 0, 'weight_decay': 0, 'nesterov': False}
```
• state: Per-parameter cache (for example, momentum buffers), as shown below:
```
defaultdict(<class 'dict'>, {tensor([[ 0.3864, -0.0131],
        [-0.1911, -0.4511]], requires_grad=True): {'momentum_buffer': tensor([[0.0052, 0.0052],
        [0.0052, 0.0052]])}})
```
• param_groups: The managed parameter groups, stored as a list in which each element is a dictionary. For SGD the keys are, in order, params, lr, momentum, dampening, weight_decay, nesterov, for example:
```
[{'params': [tensor([[-0.1022, -1.6890],
        [-1.5116, -1.7846]], requires_grad=True)],
  'lr': 1,
  'momentum': 0,
  'dampening': 0,
  'weight_decay': 0,
  'nesterov': False}]
```
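The three attributes can be inspected directly on a live optimizer. The sketch below is only an illustration: the 2 x 2 tensor and the hand-set all-ones gradient are assumptions, and the exact set of keys in `defaults` varies between PyTorch versions, so only `lr` and `momentum` are read here.

```python
import torch

w = torch.tensor([[1.0, 2.0], [3.0, 4.0]], requires_grad=True)
optimizer = torch.optim.SGD([w], lr=0.1, momentum=0.9)

# defaults: hyperparameters given at construction time
print(optimizer.defaults['lr'], optimizer.defaults['momentum'])

# param_groups: a list of dicts, each holding tensors plus their hyperparameters
group = optimizer.param_groups[0]
print(group['lr'], group['params'][0] is w)

# state: filled lazily, on the first step()
w.grad = torch.ones_like(w)      # assumed gradient, set by hand
optimizer.step()
buf = optimizer.state[w]['momentum_buffer']   # state is keyed by the parameter tensor
print(buf)
```

Note that `group['params'][0] is w` holds: the optimizer stores a reference to the same tensor object rather than a copy.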

Optimizer also has the following methods:

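• zero_grad(): Clears the gradients of the managed parameters. PyTorch does not zero a tensor's gradient automatically; gradients accumulate across backward passes, so they must be cleared in every iteration before the next backward pass.
```python
def zero_grad(self, set_to_none: bool = False):
    for group in self.param_groups:
        for p in group['params']:
            if p.grad is not None:  # only parameters that have a gradient
                if set_to_none:
                    p.grad = None
                else:
                    if p.grad.grad_fn is not None:
                        p.grad.detach_()
                    else:
                        p.grad.requires_grad_(False)
                    p.grad.zero_()  # set the gradient to zero
```
• step(): Performs a single parameter update using the current gradients. The base class leaves it unimplemented; each optimizer subclass overrides it.
```python
def step(self, closure):
    raise NotImplementedError
```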
• add_param_group(): Adds a parameter group to the optimizer, for example to give newly added parameters their own learning rate.
```python
def add_param_group(self, param_group):
    assert isinstance(param_group, dict), "param group must be a dict"
    # Check whether params is a single tensor and normalize it to a list
    params = param_group['params']
    if isinstance(params, torch.Tensor):
        param_group['params'] = [params]
    elif isinstance(params, set):
        raise TypeError('optimizer parameters need to be organized in ordered collections, but '
                        'the ordering of tensors in sets will change between runs. Please use a list instead.')
    else:
        param_group['params'] = list(params)
    for param in param_group['params']:
        if not isinstance(param, torch.Tensor):
            raise TypeError("optimizer can only optimize Tensors, "
                            "but one of the params is " + torch.typename(param))
        if not param.is_leaf:
            raise ValueError("can't optimize a non-leaf Tensor")

    # Fill in any hyperparameter the group did not specify from the defaults
    for name, default in self.defaults.items():
        if default is required and name not in param_group:
            raise ValueError("parameter group didn't specify a value of required optimization parameter " +
                             name)
        else:
            param_group.setdefault(name, default)

    params = param_group['params']
    if len(params) != len(set(params)):
        warnings.warn("optimizer contains a parameter group with duplicate parameters; "
                      "in future, this will cause an error")
    # Everything above validates the input and raises warnings or errors
    param_set = set()
    for group in self.param_groups:
        param_set.update(set(group['params']))

    if not param_set.isdisjoint(set(param_group['params'])):
        raise ValueError("some parameters appear in more than one parameter group")
    self.param_groups.append(param_group)
```
• load_state_dict(): Loads a state dictionary into the optimizer. This makes it possible to resume an interrupted training run from the previously saved optimizer state.
```python
def load_state_dict(self, state_dict):
    r"""Loads the optimizer state.

    Arguments:
        state_dict (dict): optimizer state. Should be an object returned
            from a call to :meth:`state_dict`.
    """
    # deepcopy, to be consistent with module API
    state_dict = deepcopy(state_dict)
    # Validate the state_dict
    groups = self.param_groups
    saved_groups = state_dict['param_groups']

    if len(groups) != len(saved_groups):
        raise ValueError("loaded state dict has a different number of "
                         "parameter groups")
    param_lens = (len(g['params']) for g in groups)
    saved_lens = (len(g['params']) for g in saved_groups)
    if any(p_len != s_len for p_len, s_len in zip(param_lens, saved_lens)):
        raise ValueError("loaded state dict contains a parameter group "
                         "that doesn't match the size of optimizer's group")

    # Update the state
    id_map = {old_id: p for old_id, p in
              zip(chain.from_iterable((g['params'] for g in saved_groups)),
                  chain.from_iterable((g['params'] for g in groups)))}

    def cast(param, value):
        r"""Make a deep copy of value, casting all tensors to device of param."""
        ...

    # Copy state assigned to params (and cast tensors to appropriate types).
    # State that is not assigned to params is copied as is (needed for
    # backward compatibility).
    state = defaultdict(dict)
    for k, v in state_dict['state'].items():
        if k in id_map:
            param = id_map[k]
            state[param] = cast(param, v)
        else:
            state[k] = v

    # Update parameter groups, setting their 'params' value
    def update_group(group, new_group):
        ...
    param_groups = [
        update_group(g, ng) for g, ng in zip(groups, saved_groups)]
    self.__setstate__({'state': state, 'param_groups': param_groups})
```
• state_dict(): Returns a dictionary holding the optimizer's current state
```python
def state_dict(self):
    r"""Returns the state of the optimizer as a :class:`dict`.

    It contains two entries:

    * state - a dict holding current optimization state. Its content
        differs between optimizer classes.
    * param_groups - a dict containing all parameter groups
    """
    # Save order indices instead of Tensors
    param_mappings = {}
    start_index = 0

    def pack_group(group):
        ...
    param_groups = [pack_group(g) for g in self.param_groups]
    # Remap state to use order indices as keys
    packed_state = {(param_mappings[id(k)] if isinstance(k, torch.Tensor) else k): v
                    for k, v in self.state.items()}
    return {
        'state': packed_state,
        'param_groups': param_groups,
    }
```
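The methods above can be exercised together in a few lines. The following is a sketch under stated assumptions: the two-element tensor, the toy loss `(w ** 2).sum()`, and the in-memory buffer used instead of a file are all choices made for this example. It shows gradients accumulating until `zero_grad()` is called, and a `state_dict()` / `load_state_dict()` round trip of the kind used to resume training.

```python
import io
import torch

w = torch.tensor([1.0, 2.0], requires_grad=True)
optimizer = torch.optim.SGD([w], lr=0.1, momentum=0.9)

# Gradients accumulate across backward() calls until they are cleared.
(w ** 2).sum().backward()
(w ** 2).sum().backward()
accumulated = w.grad.clone()             # 2 * (2 * w)

optimizer.zero_grad(set_to_none=False)   # keep the tensor and fill it with zeros
cleared = w.grad.clone()

# Take one real step so the optimizer has a momentum buffer worth saving.
(w ** 2).sum().backward()
optimizer.step()

# Save the state to an in-memory buffer (a file path works the same way).
buffer = io.BytesIO()
torch.save(optimizer.state_dict(), buffer)
buffer.seek(0)

# Build a fresh optimizer over the same parameter and restore the saved state.
restored = torch.optim.SGD([w], lr=0.5)  # this lr is replaced by the loaded one
restored.load_state_dict(torch.load(buffer))

print(accumulated, cleared)
print(restored.param_groups[0]['lr'], restored.param_groups[0]['momentum'])
print(restored.state[w]['momentum_buffer'])
```

After loading, the restored optimizer carries the saved hyperparameters and momentum buffer rather than the values it was constructed with, which is what allows training to continue where it stopped.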

# 3. Practical Example

```python
# -*- coding: utf-8 -*-
"""
Created on Sat Oct 16 22:46:46 2021

@author: 86493
"""
import torch
import os

# Initialize the weight from a normal distribution
weight = torch.randn((2, 2), requires_grad=True)
# Set the gradient to an all-ones matrix
weight.grad = torch.ones((2, 2))
# Print the current weight data and gradient
print("The data of weight before step:\n{}".format(weight.data))
print('-' * 60)
print("The grad of weight before step:\n{}".format(weight.grad))
print('-' * 60)

# Instantiate the optimizer
optimizer = torch.optim.SGD([weight],
                            lr = 0.1,
                            momentum = 0.9)
# Take one step
optimizer.step()
# Inspect the data and gradient after one step
print("The data of weight after step:\n{}".format(weight.data))
print('-' * 60)
print("The grad of weight after step:\n{}".format(weight.grad))
print('-' * 60)

# Zero the gradient. set_to_none=False keeps a zero tensor; PyTorch 2.0+
# defaults to set_to_none=True, which would set the gradient to None instead.
optimizer.zero_grad(set_to_none=False)
print(weight.grad)
print('-' * 60)

# Print the parameter groups
print("optimizer.param_groups is \n{}".format(optimizer.param_groups))
print('-' * 60)

# Compare object ids: the optimizer holds a reference to the same tensor
# object as weight, not a copy
print("weight in optimizer:{}\nweight in weight:{}\n".
      format(id(optimizer.param_groups[0]['params'][0]),
             id(weight)))
print('-' * 60)

# Add a second parameter group: weight2
weight2 = torch.randn((3, 3), requires_grad = True)
optimizer.add_param_group({"params": weight2,
                           'lr': 0.0001,
                           'nesterov': True})
# View the existing parameter groups
print("optimizer.param_groups is \n{}".format(optimizer.param_groups))
print('-' * 60)

# View the current state information
opt_state_dict = optimizer.state_dict()
print("state_dict before step:\n", opt_state_dict)
print('-' * 60)

# Perform 50 step operations
for _ in range(50):
    optimizer.step()
# Print the state information after the steps
print("state_dict after step:\n", optimizer.state_dict())
print('-' * 60)

# Save the optimizer state
torch.save(optimizer.state_dict(),
           os.path.join(r"D:\Desktop Files\matrix\code\Torch", "optimizer_state_dict.pkl"))
print("--------------------done----------------------")

# Load the saved state back into the optimizer
state_dict = torch.load(os.path.join(r"D:\Desktop Files\matrix\code\Torch", "optimizer_state_dict.pkl"))
optimizer.load_state_dict(state_dict)
print(state_dict)
print('-' * 60)

# Print the final attribute values
print("Output final attribute information:\n")
print("Output Properties optimizer.defaults: \n{}".format(optimizer.defaults))
print('-' * 60)
print("Output Properties optimizer.state\n{}".format(optimizer.state))
print('-' * 60)
print("Output Properties optimizer.param_groups\n{}".format(optimizer.param_groups))
```

The results are:

```
The data of weight before step:
tensor([[-0.0947,  1.4217],
[-1.3000, -1.0501]])
------------------------------------------------------------
The grad of weight before step:
tensor([[1., 1.],
[1., 1.]])
------------------------------------------------------------
The data of weight after step:
tensor([[-0.1947,  1.3217],
[-1.4000, -1.1501]])
------------------------------------------------------------
The grad of weight after step:
tensor([[1., 1.],
[1., 1.]])
------------------------------------------------------------
tensor([[0., 0.],
[0., 0.]])
------------------------------------------------------------
optimizer.param_groups is
[{'params': [tensor([[-0.1947,  1.3217],
[-1.4000, -1.1501]], requires_grad=True)],
'lr': 0.1,
'momentum': 0.9,
'dampening': 0,
'weight_decay': 0,
'nesterov': False}]
------------------------------------------------------------
weight in optimizer:1881798878848
weight in weight:1881798878848

------------------------------------------------------------
optimizer.param_groups is
[{'params': [tensor([[-0.1947,  1.3217],
[-1.4000, -1.1501]], requires_grad=True)],
'lr': 0.1,
'momentum': 0.9,
'dampening': 0,
'weight_decay': 0,
'nesterov': False},
{'params': [tensor([[-1.7869,  2.1294, -0.1307],
[ 0.6809, -0.0193, -0.5704],
[-0.5512, -2.5028,  0.2141]], requires_grad=True)], 'lr': 0.0001, 'nesterov': True, 'momentum': 0.9, 'dampening': 0, 'weight_decay': 0}]
------------------------------------------------------------
state_dict before step:
{'state': {0: {'momentum_buffer': tensor([[1., 1.],
[1., 1.]])}}, 'param_groups': [{'lr': 0.1, 'momentum': 0.9, 'dampening': 0, 'weight_decay': 0, 'nesterov': False, 'params': [0]}, {'lr': 0.0001, 'nesterov': True, 'momentum': 0.9, 'dampening': 0, 'weight_decay': 0, 'params': [1]}]}
------------------------------------------------------------
state_dict after step:
{'state': {0: {'momentum_buffer': tensor([[0.0052, 0.0052],
[0.0052, 0.0052]])}}, 'param_groups': [{'lr': 0.1, 'momentum': 0.9, 'dampening': 0, 'weight_decay': 0, 'nesterov': False, 'params': [0]}, {'lr': 0.0001, 'nesterov': True, 'momentum': 0.9, 'dampening': 0, 'weight_decay': 0, 'params': [1]}]}
------------------------------------------------------------
------------done-------------
{'state': {0: {'momentum_buffer': tensor([[0.0052, 0.0052],
[0.0052, 0.0052]])}}, 'param_groups': [{'lr': 0.1, 'momentum': 0.9, 'dampening': 0, 'weight_decay': 0, 'nesterov': False, 'params': [0]}, {'lr': 0.0001, 'nesterov': True, 'momentum': 0.9, 'dampening': 0, 'weight_decay': 0, 'params': [1]}]}
------------------------------------------------------------
Output final attribute information:

Output Properties optimizer.defaults:
{'lr': 0.1, 'momentum': 0.9, 'dampening': 0, 'weight_decay': 0, 'nesterov': False}
------------------------------------------------------------
Output Properties optimizer.state
defaultdict(<class 'dict'>, {tensor([[-1.0900,  0.4263],
[-2.2953, -2.0455]], requires_grad=True): {'momentum_buffer': tensor([[0.0052, 0.0052],
[0.0052, 0.0052]])}})
------------------------------------------------------------
Output Properties optimizer.param_groups
[{'lr': 0.1, 'momentum': 0.9, 'dampening': 0, 'weight_decay': 0, 'nesterov': False, 'params': [tensor([[-1.0900,  0.4263],
[-2.2953, -2.0455]], requires_grad=True)]}, {'lr': 0.0001, 'nesterov': True, 'momentum': 0.9, 'dampening': 0, 'weight_decay': 0, 'params': [tensor([[-1.7869,  2.1294, -0.1307],
[ 0.6809, -0.0193, -0.5704],
[-0.5512, -2.5028,  0.2141]], requires_grad=True)]}]
```
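The numbers in these results can be checked by hand. After the first step the momentum buffer equals the all-ones gradient; the gradient is then zeroed, so each of the 50 later steps only decays the buffer by the momentum factor and moves the weight by lr times the buffer. The sketch below replays that arithmetic for a single weight entry in plain Python. It assumes SGD's update with dampening 0 (buf ← momentum · buf + grad, w ← w − lr · buf), which matches the hyperparameters printed above.

```python
lr, momentum = 0.1, 0.9

buf = 1.0            # momentum buffer after the first step (gradient was all ones)
total_shift = 0.0    # how far one weight entry moves during the 50 extra steps

for _ in range(50):
    grad = 0.0                     # the gradient was zeroed before these steps
    buf = momentum * buf + grad    # dampening is 0, so the gradient enters unscaled
    total_shift += lr * buf        # SGD update: w <- w - lr * buf

print(round(buf, 4))               # matches the 0.0052 momentum_buffer entries
print(round(total_shift, 4))
print(-0.1947 - total_shift, 1.3217 - total_shift)
```

The last line lands very close to the final weights -1.0900 and 0.4263 shown above; small differences in the fourth decimal come from starting with values that were already rounded for printing.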


Keywords: Pytorch Deep Learning

Added by juhl on Sat, 16 Oct 2021 19:35:25 +0300