This article collects commonly used PyTorch code snippets. It covers five areas: basic configuration, tensor processing, model definition and operation, data processing, and model training and testing. It closes with several noteworthy tips.
The best source of information on PyTorch is the official documentation. This article gathers common PyTorch code snippets and applies some fixes on top of reference [1] (Zhang Hao: PyTorch Cookbook) for easy lookup.
1. Basic configuration
Import package and version query
import torch
import torch.nn as nn
import torchvision

print(torch.__version__)
print(torch.version.cuda)
print(torch.backends.cudnn.version())
print(torch.cuda.get_device_name(0))
Reproducibility
When the hardware (CPU, GPU) differs, complete reproducibility cannot be guaranteed even if the random seeds are the same. On the same device, however, results should generally be reproducible. The approach is to fix the random seeds of torch and numpy at the beginning of the program and to put cuDNN into deterministic mode.
import numpy as np
import torch

np.random.seed(0)
torch.manual_seed(0)
torch.cuda.manual_seed_all(0)

torch.backends.cudnn.deterministic = True
torch.backends.cudnn.benchmark = False
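The seeding steps can be wrapped in a small helper so that every script calls them the same way. The following is a minimal sketch; the name `set_seed` is a hypothetical helper, not a PyTorch API, and seeding Python's own `random` module is an extra assumption beyond the original snippet.

```python
import random

import numpy as np
import torch


def set_seed(seed: int = 0) -> None:
    """Seed the Python, NumPy and PyTorch RNGs (hypothetical helper)."""
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)  # silently ignored when CUDA is unavailable
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False


# Re-seeding with the same value reproduces the same random draw
set_seed(0)
first = torch.rand(3)
set_seed(0)
second = torch.rand(3)
print(torch.equal(first, second))
```

Calling the helper once at the top of the program is usually enough; it is called twice here only to show that the draws repeat.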
Graphics card settings
If you only need one graphics card:
# Device configuration
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
If you need to specify multiple graphics cards, for example cards 0 and 1:
import os
os.environ['CUDA_VISIBLE_DEVICES'] = '0,1'
You can also set the graphics card when running code from the command line:
CUDA_VISIBLE_DEVICES=0,1 python train.py
Clear GPU memory
torch.cuda.empty_cache()
You can also reset the GPU from the command line:
nvidia-smi --gpu-reset -i [gpu_id]
2. Tensor processing
Data type of tensor
PyTorch has 9 CPU tensor types and 9 GPU tensor types.
Tensor basic information
tensor = torch.randn(3, 4, 5)
print(tensor.type())  # Data type
print(tensor.size())  # Shape of the tensor, a tuple
print(tensor.dim())   # Number of dimensions
Named tensor
Naming tensor dimensions is very useful: dimensions can be indexed and manipulated by name, which improves readability and ease of use and helps prevent errors.
# Before PyTorch 1.3, the meaning of each dimension had to be tracked in comments
# Tensor[N, C, H, W]
images = torch.randn(32, 3, 56, 56)
images.sum(dim=1)
images.select(dim=1, index=0)

# Since PyTorch 1.3, dimensions can be named
NCHW = ['N', 'C', 'H', 'W']
images = torch.randn(32, 3, 56, 56, names=NCHW)
images.sum('C')
images.select('C', index=0)

# Names can also be set this way
tensor = torch.rand(3, 4, 1, 2, names=('C', 'N', 'H', 'W'))
# align_to makes it easy to reorder dimensions
tensor = tensor.align_to('N', 'C', 'H', 'W')
Data type conversion
# Set the default type; FloatTensor in PyTorch is much faster than DoubleTensor
torch.set_default_tensor_type(torch.FloatTensor)

# Type conversion
tensor = tensor.cuda()
tensor = tensor.cpu()
tensor = tensor.float()
tensor = tensor.long()
torch.Tensor and np.ndarray conversion
Except for CharTensor, all tensors on the CPU support conversion to NumPy format and back.
ndarray = tensor.cpu().numpy()
tensor = torch.from_numpy(ndarray).float()
tensor = torch.from_numpy(ndarray.copy()).float()  # If ndarray has negative strides
torch.Tensor and PIL.Image conversion
# Image tensors in PyTorch use the order [N, C, H, W] by default with values in [0, 1],
# so conversion requires transposing and rescaling.

# torch.Tensor -> PIL.Image
image = PIL.Image.fromarray(
    torch.clamp(tensor * 255, min=0, max=255).byte().permute(1, 2, 0).cpu().numpy())
image = torchvision.transforms.functional.to_pil_image(tensor)  # Equivalent way

# PIL.Image -> torch.Tensor
path = r'./figure.jpg'
tensor = torch.from_numpy(np.asarray(PIL.Image.open(path))).permute(2, 0, 1).float() / 255
tensor = torchvision.transforms.functional.to_tensor(PIL.Image.open(path))  # Equivalent way
np.ndarray and PIL Image conversion
image = PIL.Image.fromarray(ndarray.astype(np.uint8))
ndarray = np.asarray(PIL.Image.open(path))
Extract values from tensors that contain only one element
value = torch.rand(1).item()
Tensor reshaping
# When the output of a convolution layer is fed into a fully connected layer,
# the tensor usually needs to be reshaped.
# Unlike torch.Tensor.view, torch.reshape also handles non-contiguous input tensors.
tensor = torch.rand(2, 3, 4)
shape = (6, 4)
tensor = torch.reshape(tensor, shape)
Shuffle order
tensor = tensor[torch.randperm(tensor.size(0))]  # Shuffle along the first dimension
Flip horizontally
# PyTorch does not support negative-step slicing such as tensor[::-1];
# a horizontal flip can be done by indexing with a reversed index tensor.
# Suppose the tensor has shape [N, D, H, W]
tensor = tensor[:, :, :, torch.arange(tensor.size(3) - 1, -1, -1).long()]
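As an alternative to building a reversed index, `torch.flip` reverses a tensor along the given dimensions. The sketch below uses a small made-up tensor to check that both approaches agree; the variable names are illustrative only.

```python
import torch

# Small example tensor of shape [N, D, H, W] = [2, 1, 2, 3]
tensor = torch.arange(2 * 1 * 2 * 3).reshape(2, 1, 2, 3)

# Flip via a reversed index tensor, as in the snippet above
reversed_index = torch.arange(tensor.size(3) - 1, -1, -1).long()
flipped_by_index = tensor[:, :, :, reversed_index]

# Flip via torch.flip along the width dimension
flipped_by_flip = torch.flip(tensor, dims=[3])

print(torch.equal(flipped_by_index, flipped_by_flip))
```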
Copying a tensor
# Operation                 |  New/Shared memory  |  Still in computation graph  |
tensor.clone()            # |        New          |            Yes               |
tensor.detach()           # |       Shared        |            No                |
tensor.detach().clone()   # |        New          |            No                |
Tensor concatenation
'''
Note the difference between torch.cat and torch.stack: torch.cat concatenates
along a given existing dimension, while torch.stack adds a new dimension.
For example, given three 10x5 tensors, torch.cat produces a 30x5 tensor,
while torch.stack produces a 3x10x5 tensor.
'''
tensor = torch.cat(list_of_tensors, dim=0)
tensor = torch.stack(list_of_tensors, dim=0)
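The shape difference between the two functions is easy to confirm with three 10x5 tensors. This is a small illustrative sketch; the zero-filled tensors are placeholders.

```python
import torch

list_of_tensors = [torch.zeros(10, 5) for _ in range(3)]

concatenated = torch.cat(list_of_tensors, dim=0)  # joins along the existing dim 0
stacked = torch.stack(list_of_tensors, dim=0)     # inserts a new leading dimension

print(concatenated.shape)
print(stacked.shape)
```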
Convert integer labels to one hot encoding
# PyTorch class labels start at 0 by default
tensor = torch.tensor([0, 2, 1, 3])
N = tensor.size(0)
num_classes = 4
one_hot = torch.zeros(N, num_classes).long()
one_hot.scatter_(dim=1, index=torch.unsqueeze(tensor, dim=1),
                 src=torch.ones(N, num_classes).long())
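Recent PyTorch versions also provide `torch.nn.functional.one_hot`, which gives the same encoding in one call. The sketch below compares it with a `scatter_`-based encoding on the same labels; the variable names are illustrative.

```python
import torch
import torch.nn.functional as F

labels = torch.tensor([0, 2, 1, 3])
num_classes = 4

# scatter_-based encoding: write 1 at the label position of each row
one_hot_scatter = torch.zeros(labels.size(0), num_classes).long()
one_hot_scatter.scatter_(dim=1, index=labels.unsqueeze(1), value=1)

# Built-in encoding
one_hot_builtin = F.one_hot(labels, num_classes=num_classes)

print(torch.equal(one_hot_scatter, one_hot_builtin))
```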
Get non-zero elements
torch.nonzero(tensor)                 # Indices of non-zero elements
torch.nonzero(tensor == 0)            # Indices of zero elements
torch.nonzero(tensor).size(0)         # Number of non-zero elements
torch.nonzero(tensor == 0).size(0)    # Number of zero elements
Check whether two tensors are equal
torch.allclose(tensor1, tensor2)  # float tensor
torch.equal(tensor1, tensor2)     # int tensor
Tensor expansion
# Expand a tensor of shape 64*512 to shape 64*512*7*7.
tensor = torch.rand(64, 512)
torch.reshape(tensor, (64, 512, 1, 1)).expand(64, 512, 7, 7)
Matrix multiplication
# Matrix multiplication: (m*n) * (n*p) -> (m*p).
result = torch.mm(tensor1, tensor2)

# Batch matrix multiplication: (b*m*n) * (b*n*p) -> (b*m*p)
result = torch.bmm(tensor1, tensor2)

# Element-wise multiplication.
result = tensor1 * tensor2
Calculate the Euclidean distance between two sets of data
Using broadcast mechanism
dist = torch.sqrt(torch.sum((X1[:,None,:] - X2) ** 2, dim=2))
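To see how the broadcasting works, here is a small sketch with made-up data that compares the expression above with `torch.cdist`. The shapes (4 and 5 points of dimension 3) are arbitrary assumptions.

```python
import torch

torch.manual_seed(0)
X1 = torch.rand(4, 3)  # m = 4 points of dimension d = 3
X2 = torch.rand(5, 3)  # n = 5 points of dimension d = 3

# X1[:, None, :] has shape (m, 1, d); subtracting X2 of shape (n, d)
# broadcasts to (m, n, d), and the sum over dim=2 leaves an (m, n) matrix
dist = torch.sqrt(torch.sum((X1[:, None, :] - X2) ** 2, dim=2))

# Reference result from the built-in pairwise distance function
reference = torch.cdist(X1, X2, p=2)

print(dist.shape)
print(torch.allclose(dist, reference, atol=1e-5))
```

Note that the broadcast intermediate has m*n*d elements, so for large point sets `torch.cdist` may be the more memory-friendly choice.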
3. Model definition and operation
An example of a simple two-layer convolution network
# Convolutional neural network (2 convolutional layers)
class ConvNet(nn.Module):
    def __init__(self, num_classes=10):
        super(ConvNet, self).__init__()
        self.layer1 = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=5, stride=1, padding=2),
            nn.BatchNorm2d(16),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2))
        self.layer2 = nn.Sequential(
            nn.Conv2d(16, 32, kernel_size=5, stride=1, padding=2),
            nn.BatchNorm2d(32),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2))
        # 7*7*32 assumes 1x28x28 inputs (two 2x2 poolings: 28 -> 14 -> 7)
        self.fc = nn.Linear(7*7*32, num_classes)

    def forward(self, x):
        out = self.layer1(x)
        out = self.layer2(out)
        out = out.reshape(out.size(0), -1)
        out = self.fc(out)
        return out

model = ConvNet(num_classes).to(device)
The calculation and display of convolution layer can be assisted by this website.
Bilinear pooling
X = torch.reshape(X, (N, D, H * W))                    # Assume X has shape N*D*H*W
X = torch.bmm(X, torch.transpose(X, 1, 2)) / (H * W)   # Bilinear pooling
assert X.size() == (N, D, D)
X = torch.reshape(X, (N, D * D))
X = torch.sign(X) * torch.sqrt(torch.abs(X) + 1e-5)    # Signed-sqrt normalization
X = torch.nn.functional.normalize(X)                   # L2 normalization
Multi card synchronization BN (Batch normalization)
When code runs on multiple GPU cards with torch.nn.DataParallel, PyTorch's BN layers by default compute the mean and standard deviation of the data on each card independently. Synchronized BN computes them jointly from the data on all cards, which alleviates inaccurate estimates of the mean and standard deviation when the batch size is small. It is an effective trick for improving performance in tasks such as object detection. Note that torch.nn.SyncBatchNorm is intended to be used with torch.nn.parallel.DistributedDataParallel.
sync_bn = torch.nn.SyncBatchNorm(num_features, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
Change all BN layers of an existing network to synchronized BN layers
# PyTorch also ships torch.nn.SyncBatchNorm.convert_sync_batchnorm for this purpose.
def convertBNtoSyncBN(module, process_group=None):
    '''Recursively replace all BN layers with SyncBN layers.

    Args:
        module[torch.nn.Module]. Network
    '''
    if isinstance(module, torch.nn.modules.batchnorm._BatchNorm):
        sync_bn = torch.nn.SyncBatchNorm(module.num_features, module.eps, module.momentum,
                                         module.affine, module.track_running_stats, process_group)
        sync_bn.running_mean = module.running_mean
        sync_bn.running_var = module.running_var
        if module.affine:
            # Parameters must be wrapped in torch.nn.Parameter before assignment
            sync_bn.weight = torch.nn.Parameter(module.weight.clone().detach())
            sync_bn.bias = torch.nn.Parameter(module.bias.clone().detach())
        return sync_bn
    else:
        for name, child_module in module.named_children():
            setattr(module, name, convertBNtoSyncBN(child_module, process_group=process_group))
        return module
BN-style moving average
To implement an operation similar to the moving average in BN, use an in-place operation in the forward function to update the moving average.
class BN(torch.nn.Module):
    def __init__(self):
        ...
        self.register_buffer('running_mean', torch.zeros(num_features))

    def forward(self, X):
        ...
        self.running_mean += momentum * (current - self.running_mean)
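The skeleton above can be filled in as follows. This is a minimal sketch under stated assumptions: the class name `RunningMean`, the choice of the batch mean as the tracked statistic, and the mean-subtraction in the output are all illustrative, not part of the original.

```python
import torch


class RunningMean(torch.nn.Module):
    """Tracks a per-feature moving average of the batch mean (hypothetical example)."""

    def __init__(self, num_features, momentum=0.1):
        super().__init__()
        self.momentum = momentum
        # Buffers are stored in state_dict and follow .to(device), but are not trained
        self.register_buffer('running_mean', torch.zeros(num_features))

    def forward(self, X):
        if self.training:
            current = X.detach().mean(dim=0)
            # In-place update keeps the same registered buffer object
            self.running_mean += self.momentum * (current - self.running_mean)
        return X - self.running_mean


layer = RunningMean(num_features=3)
layer(torch.ones(8, 3))
print(layer.running_mean)
```

Because the update is in place, the buffer registered in `__init__` is the one that ends up in `state_dict()`.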
Count the total number of model parameters
num_parameters = sum(torch.numel(parameter) for parameter in model.parameters())
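When some layers are frozen, it is often useful to count trainable parameters separately. The sketch below uses a small made-up model and freezes one weight tensor purely for illustration.

```python
import torch.nn as nn

model = nn.Sequential(nn.Linear(10, 5), nn.ReLU(), nn.Linear(5, 2))
model[0].weight.requires_grad = False  # freeze one tensor for illustration

# All parameters: (10*5 + 5) + (5*2 + 2)
total = sum(p.numel() for p in model.parameters())
# Only parameters that will receive gradients
trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)

print(total, trainable)
```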
View parameters in the network
You can use model.state_dict() or model.named_parameters() to view all trainable parameters (including parameters inherited from parent classes).
params = list(model.named_parameters())
(name, param) = params[28]
print(name)
print(param.grad)
print('-------------------------------------------------')
(name2, param2) = params[29]
print(name2)
print(param2.grad)
print('----------------------------------------------------')
(name1, param1) = params[30]
print(name1)
print(param1.grad)
Model visualization (using pytorchviz)
https://github.com/szagoruyko/pytorchviz
Output model information similar to Keras's model.summary(), using pytorch-summary
https://github.com/sksq96/pytorch-summary
Model weight initialization
Note the difference between model.modules() and model.children(): model.modules() iterates over all sublayers of the model recursively, while model.children() only traverses the layers directly under the model.
# Common practice for initialization.
for layer in model.modules():
    if isinstance(layer, torch.nn.Conv2d):
        torch.nn.init.kaiming_normal_(layer.weight, mode='fan_out',
                                      nonlinearity='relu')
        if layer.bias is not None:
            torch.nn.init.constant_(layer.bias, val=0.0)
    elif isinstance(layer, torch.nn.BatchNorm2d):
        torch.nn.init.constant_(layer.weight, val=1.0)
        torch.nn.init.constant_(layer.bias, val=0.0)
    elif isinstance(layer, torch.nn.Linear):
        torch.nn.init.xavier_normal_(layer.weight)
        if layer.bias is not None:
            torch.nn.init.constant_(layer.bias, val=0.0)

# Initialization with given tensor.
layer.weight = torch.nn.Parameter(tensor)
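The difference between the two iterators noted above shows up clearly on a nested model. The following sketch uses a small made-up network; the layer sizes are arbitrary.

```python
import torch.nn as nn

model = nn.Sequential(
    nn.Sequential(nn.Conv2d(1, 4, 3), nn.ReLU()),
    nn.Linear(4, 2),
)

# children(): only the two direct sub-modules (inner Sequential, Linear)
children = list(model.children())
# modules(): the model itself plus every nested module, recursively
modules = list(model.modules())

print(len(children), len(modules))
```

This is why initialization loops usually iterate over `model.modules()`: a loop over `model.children()` would never see the `Conv2d` nested inside the inner `Sequential`.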
Extract a layer in the model
model.modules() returns an iterator over all modules in the model and can reach the innermost layers, such as self.layer1.conv1. The corresponding named_children() and named_modules() methods return not only the modules but also the names of the network layers.
# Take the first two layers in the model
new_model = nn.Sequential(*list(model.children())[:2])

# If you want to extract all the convolution layers in the model, you can do the following:
conv_model = nn.Sequential()
for name, module in model.named_modules():
    if isinstance(module, nn.Conv2d):
        # Module names passed to add_module must not contain '.'
        conv_model.add_module(name.replace('.', '_'), module)
Load a pre-trained model for some layers
Note that if the saved model was wrapped in torch.nn.DataParallel, the current model needs to be wrapped as well.
model.load_state_dict(torch.load('model.pth'), strict=False)
Load the model saved in GPU into CPU
model.load_state_dict(torch.load('model.pth', map_location='cpu'))
Import the same part of another model into the new model
When importing parameters from a model, if the structures of the two models are inconsistent, an error will be reported when importing parameters directly. The following method can be used to import the same part of another model into the new model.
# model_new is the new model
# model_saved is the state dict of another model, for example one loaded with torch.load
model_new_dict = model_new.state_dict()
model_common_dict = {k: v for k, v in model_saved.items() if k in model_new_dict.keys()}
model_new_dict.update(model_common_dict)
model_new.load_state_dict(model_new_dict)
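Matching on key names alone still fails if a shared name has a different shape in the two models. The sketch below adds a shape check to the same idea; the two small models are hypothetical and differ only in their last layer.

```python
import torch
import torch.nn as nn

# Hypothetical models that share their first layer but not the last one
model_old = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 3))
model_new = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 5))

model_saved = model_old.state_dict()
model_new_dict = model_new.state_dict()

# Keep only entries whose name and shape both match the new model
model_common_dict = {k: v for k, v in model_saved.items()
                     if k in model_new_dict and v.shape == model_new_dict[k].shape}
model_new_dict.update(model_common_dict)
model_new.load_state_dict(model_new_dict)

print(sorted(model_common_dict))
```

Here only the first layer's weight and bias are copied; the last layer of the new model keeps its own initialization.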
4. Data processing
Calculate the mean and standard deviation of the data set
import numpy as np

def compute_mean_and_std(dataset):
    # Takes a PyTorch dataset yielding (PIL image, label) pairs and returns
    # the per-channel mean and standard deviation, scaled to [0, 1].
    # np.asarray on an RGB PIL image gives H x W x C with channels in R, G, B order.
    mean = np.zeros(3)
    for img, _ in dataset:
        img = np.asarray(img, dtype=np.float64)  # change PIL Image to numpy array
        mean += img.mean(axis=(0, 1))
    mean /= len(dataset)

    diff = np.zeros(3)
    N = 0
    for img, _ in dataset:
        img = np.asarray(img, dtype=np.float64)
        diff += np.sum(np.power(img - mean, 2), axis=(0, 1))
        N += np.prod(img[:, :, 0].shape)
    std = np.sqrt(diff / N)

    mean = tuple((mean / 255.0).tolist())
    std = tuple((std / 255.0).tolist())
    return mean, std
Get basic information of video data
import cv2

video = cv2.VideoCapture(mp4_path)
height = int(video.get(cv2.CAP_PROP_FRAME_HEIGHT))
width = int(video.get(cv2.CAP_PROP_FRAME_WIDTH))
num_frames = int(video.get(cv2.CAP_PROP_FRAME_COUNT))
fps = int(video.get(cv2.CAP_PROP_FPS))
video.release()
TSN: sample one frame from each video segment
# Excerpt from a dataset method; num_frames and is_train are given
K = self._num_segments
if is_train:
    if num_frames > K:
        # Random index for each segment.
        frame_indices = torch.randint(
            high=num_frames // K, size=(K,), dtype=torch.long)
        frame_indices += num_frames // K * torch.arange(K)
    else:
        frame_indices = torch.randint(
            high=num_frames, size=(K - num_frames,), dtype=torch.long)
        frame_indices = torch.sort(torch.cat((
            torch.arange(num_frames), frame_indices)))[0]
else:
    if num_frames > K:
        # Middle index for each segment (integer division keeps indices integral).
        frame_indices = num_frames // K // 2
        frame_indices += num_frames // K * torch.arange(K)
    else:
        frame_indices = torch.sort(torch.cat((
            torch.arange(num_frames), torch.arange(K - num_frames))))[0]
assert frame_indices.size() == (K,)
return [frame_indices[i] for i in range(K)]
Common training and validation data preprocessing
The ToTensor operation converts a PIL.Image, or an np.ndarray of shape H×W×D with values in [0, 255], into a torch.Tensor of shape D×H×W with values in [0.0, 1.0].
train_transform = torchvision.transforms.Compose([
    torchvision.transforms.RandomResizedCrop(size=224, scale=(0.08, 1.0)),
    torchvision.transforms.RandomHorizontalFlip(),
    torchvision.transforms.ToTensor(),
    torchvision.transforms.Normalize(mean=(0.485, 0.456, 0.406),
                                     std=(0.229, 0.224, 0.225)),
])
val_transform = torchvision.transforms.Compose([
    torchvision.transforms.Resize(256),
    torchvision.transforms.CenterCrop(224),
    torchvision.transforms.ToTensor(),
    torchvision.transforms.Normalize(mean=(0.485, 0.456, 0.406),
                                     std=(0.229, 0.224, 0.225)),
])
5. Model training and testing
Classification model training code
# Loss and optimizer
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)

# Train the model
total_step = len(train_loader)
for epoch in range(num_epochs):
    for i, (images, labels) in enumerate(train_loader):
        images = images.to(device)
        labels = labels.to(device)

        # Forward pass
        outputs = model(images)
        loss = criterion(outputs, labels)

        # Backward and optimize
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        if (i + 1) % 100 == 0:
            print('Epoch: [{}/{}], Step: [{}/{}], Loss: {}'
                  .format(epoch + 1, num_epochs, i + 1, total_step, loss.item()))
Classification model test code
# Test the model
# eval mode (batch norm uses moving mean/variance instead of mini-batch mean/variance)
model.eval()
with torch.no_grad():
    correct = 0
    total = 0
    for images, labels in test_loader:
        images = images.to(device)
        labels = labels.to(device)
        outputs = model(images)
        _, predicted = torch.max(outputs.data, 1)
        total += labels.size(0)
        correct += (predicted == labels).sum().item()

    print('Test accuracy of the model on the 10000 test images: {} %'
          .format(100 * correct / total))
Custom loss
Subclass torch.nn.Module to write your own loss.
class MyLoss(torch.nn.Module):
    def __init__(self):
        super(MyLoss, self).__init__()

    def forward(self, x, y):
        loss = torch.mean((x - y) ** 2)
        return loss
label smoothing
Write a label_smoothing.py file, and then reference it in the training code, and use LSR instead of cross entropy loss. label_smoothing.py contents are as follows:
import torch
import torch.nn as nn


class LSR(nn.Module):

    def __init__(self, e=0.1, reduction='mean'):
        super().__init__()

        self.log_softmax = nn.LogSoftmax(dim=1)
        self.e = e
        self.reduction = reduction

    def _one_hot(self, labels, classes, value=1):
        """
            Convert labels to one hot vectors

        Args:
            labels: torch tensor in format [label1, label2, label3, ...]
            classes: int, number of classes
            value: label value in one hot vector, default to 1

        Returns:
            return one hot format labels in shape [batchsize, classes]
        """
        one_hot = torch.zeros(labels.size(0), classes)

        # labels and value_added size must match
        labels = labels.view(labels.size(0), -1)
        value_added = torch.Tensor(labels.size(0), 1).fill_(value)

        value_added = value_added.to(labels.device)
        one_hot = one_hot.to(labels.device)

        one_hot.scatter_add_(1, labels, value_added)

        return one_hot

    def _smooth_label(self, target, length, smooth_factor):
        """convert targets to one-hot format, and smooth them.

        Args:
            target: target in form with [label1, label2, label_batchsize]
            length: length of one-hot format (number of classes)
            smooth_factor: smooth factor for label smooth

        Returns:
            smoothed labels in one hot format
        """
        one_hot = self._one_hot(target, length, value=1 - smooth_factor)
        # Adds the smoothing mass to every class, including the target class
        one_hot += smooth_factor / (length - 1)

        return one_hot.to(target.device)

    def forward(self, x, target):

        if x.size(0) != target.size(0):
            raise ValueError('Expected input batchsize ({}) to match target batch_size({})'
                             .format(x.size(0), target.size(0)))

        if x.dim() < 2:
            raise ValueError('Expected input tensor to have at least 2 dimensions (got {})'
                             .format(x.dim()))

        if x.dim() != 2:
            raise ValueError('Only 2 dimension tensor are implemented, (got {})'
                             .format(x.size()))

        smoothed_target = self._smooth_label(target, x.size(1), self.e)
        x = self.log_softmax(x)
        loss = torch.sum(- x * smoothed_target, dim=1)

        if self.reduction == 'none':
            return loss
        elif self.reduction == 'sum':
            return torch.sum(loss)
        elif self.reduction == 'mean':
            return torch.mean(loss)
        else:
            raise ValueError('unrecognized option, expect reduction to be one of none, mean, sum')
Or do label smoothing directly in the training file
for images, labels in train_loader:
    images, labels = images.cuda(), labels.cuda()
    N = labels.size(0)
    # C is the number of classes.
    smoothed_labels = torch.full(size=(N, C), fill_value=0.1 / (C - 1)).cuda()
    smoothed_labels.scatter_(dim=1, index=torch.unsqueeze(labels, dim=1), value=0.9)

    score = model(images)
    log_prob = torch.nn.functional.log_softmax(score, dim=1)
    loss = -torch.sum(log_prob * smoothed_labels) / N
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
Mixup training
beta_distribution = torch.distributions.beta.Beta(alpha, alpha)
for images, labels in train_loader:
    images, labels = images.cuda(), labels.cuda()

    # Mixup images and labels.
    lambda_ = beta_distribution.sample([]).item()
    index = torch.randperm(images.size(0)).cuda()
    mixed_images = lambda_ * images + (1 - lambda_) * images[index, :]
    label_a, label_b = labels, labels[index]

    # Mixup loss.
    scores = model(mixed_images)
    loss = (lambda_ * loss_function(scores, label_a)
            + (1 - lambda_) * loss_function(scores, label_b))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
L1 regularization
loss = ...  # Standard cross-entropy loss
for param in model.parameters():
    loss += torch.sum(torch.abs(param))
loss.backward()
Do not apply weight decay to bias terms
The weight_decay option of PyTorch optimizers such as SGD acts as L2 regularization.
bias_list = (param for name, param in model.named_parameters() if name[-4:] == 'bias')
others_list = (param for name, param in model.named_parameters() if name[-4:] != 'bias')
# Each parameter group is a dict whose parameters are stored under the 'params' key
parameters = [{'params': bias_list, 'weight_decay': 0},
              {'params': others_list}]
optimizer = torch.optim.SGD(parameters, lr=1e-2, momentum=0.9, weight_decay=1e-4)
Gradient clipping
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=20)
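Clipping is applied between loss.backward() and optimizer.step(). The sketch below uses a made-up linear model with deliberately inflated gradients to show the effect; the scale factor 1000 and max_norm of 1.0 are arbitrary choices for illustration.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Linear(10, 1)
loss = (model(torch.randn(4, 10)) * 1000).sum()  # deliberately large gradients
loss.backward()

max_norm = 1.0
# clip_grad_norm_ rescales the gradients in place and returns the total norm
# measured before clipping
norm_before = torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=max_norm)
# Recompute the total gradient norm after clipping
norm_after = torch.sqrt(sum(p.grad.pow(2).sum() for p in model.parameters()))

print(float(norm_before), float(norm_after))
```

Logging the returned pre-clipping norm during training is a cheap way to spot exploding gradients.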
Get current learning rate
# If there is one global learning rate (which is the common case).
lr = next(iter(optimizer.param_groups))['lr']

# If there are multiple learning rates for different layers.
all_lr = []
for param_group in optimizer.param_groups:
    all_lr.append(param_group['lr'])
Alternatively, inside the training loop the current learning rate is optimizer.param_groups[0]['lr'].
Learning rate decay
# Reduce learning rate when validation accuracy plateaus.
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(optimizer, mode='max', patience=5, verbose=True)
for t in range(0, 80):
    train(...)
    val(...)
    scheduler.step(val_acc)

# Cosine annealing learning rate.
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=80)
# Reduce learning rate by a factor of 10 at given epochs.
scheduler = torch.optim.lr_scheduler.MultiStepLR(optimizer, milestones=[50, 70], gamma=0.1)
for t in range(0, 80):
    train(...)
    val(...)
    scheduler.step()  # Since PyTorch 1.1, call after the epoch's optimizer updates

# Learning rate warmup over 10 epochs.
scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda=lambda t: (t + 1) / 10)
for t in range(0, 10):
    train(...)
    val(...)
    scheduler.step()
Chained learning rate schedulers
Starting with version 1.4, torch.optim.lr_scheduler supports chaining: users can define two schedulers and apply both of them during training.
import torch
from torch.optim import SGD
from torch.optim.lr_scheduler import ExponentialLR, StepLR

model = [torch.nn.Parameter(torch.randn(2, 2, requires_grad=True))]
optimizer = SGD(model, 0.1)
scheduler1 = ExponentialLR(optimizer, gamma=0.9)
scheduler2 = StepLR(optimizer, step_size=3, gamma=0.1)
for epoch in range(4):
    print(epoch, scheduler2.get_last_lr()[0])
    optimizer.step()
    scheduler1.step()
    scheduler2.step()
Model training visualization
PyTorch can use tensorboard to visualize the training process.
Install and run TensorBoard.
pip install tensorboard
tensorboard --logdir=runs
Use the SummaryWriter class to collect and visualize the corresponding data. To keep things easy to browse, scalars can be grouped under different tags, such as 'Loss/train' and 'Loss/test'.
from torch.utils.tensorboard import SummaryWriter
import numpy as np

writer = SummaryWriter()

for n_iter in range(100):
    writer.add_scalar('Loss/train', np.random.random(), n_iter)
    writer.add_scalar('Loss/test', np.random.random(), n_iter)
    writer.add_scalar('Accuracy/train', np.random.random(), n_iter)
    writer.add_scalar('Accuracy/test', np.random.random(), n_iter)
Save and load checkpoints
Note that in order to resume training, we need to save the state of both the model and the optimizer, as well as the current epoch number.
import os
import shutil

start_epoch = 0
best_acc = 0.0
# Load checkpoint.
if resume:  # resume is a flag: 0 for the first training run, 1 when resuming an interrupted run
    model_path = os.path.join('model', 'best_checkpoint.pth.tar')
    assert os.path.isfile(model_path)
    checkpoint = torch.load(model_path)
    best_acc = checkpoint['best_acc']
    start_epoch = checkpoint['epoch']
    model.load_state_dict(checkpoint['model'])
    optimizer.load_state_dict(checkpoint['optimizer'])
    print('Load checkpoint at epoch {}.'.format(start_epoch))
    print('Best accuracy so far {}.'.format(best_acc))

# Train the model
for epoch in range(start_epoch, num_epochs):
    ...

    # Test the model
    ...

    # Save checkpoint
    is_best = current_acc > best_acc
    best_acc = max(current_acc, best_acc)
    checkpoint = {
        'best_acc': best_acc,
        'epoch': epoch + 1,
        'model': model.state_dict(),
        'optimizer': optimizer.state_dict(),
    }
    model_path = os.path.join('model', 'checkpoint.pth.tar')
    best_model_path = os.path.join('model', 'best_checkpoint.pth.tar')
    torch.save(checkpoint, model_path)
    if is_best:
        shutil.copy(model_path, best_model_path)
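The save and restore cycle can be tried end to end on a toy model. The following is a minimal sketch: the tiny linear model, the temporary directory, and the epoch value 5 are all made up for illustration.

```python
import os
import tempfile

import torch
import torch.nn as nn

model = nn.Linear(4, 2)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)

# One optimization step so the optimizer holds momentum state worth saving
model(torch.randn(3, 4)).sum().backward()
optimizer.step()

checkpoint = {
    'epoch': 5,
    'model': model.state_dict(),
    'optimizer': optimizer.state_dict(),
}

# Save to a temporary file and read it back
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'checkpoint.pth.tar')
    torch.save(checkpoint, path)
    restored = torch.load(path)

# Restore into freshly constructed objects, as a resumed run would
new_model = nn.Linear(4, 2)
new_optimizer = torch.optim.SGD(new_model.parameters(), lr=0.1, momentum=0.9)
new_model.load_state_dict(restored['model'])
new_optimizer.load_state_dict(restored['optimizer'])
start_epoch = restored['epoch']

print(start_epoch, torch.equal(new_model.weight, model.weight))
```

Restoring the optimizer state matters for optimizers with internal buffers such as momentum or Adam moments; without it, a resumed run starts those buffers from zero.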
Extract the convolution features of one layer of an ImageNet pre-trained model
import collections

# VGG-16 relu5-3 feature.
model = torchvision.models.vgg16(pretrained=True).features[:-1]
# VGG-16 pool5 feature.
model = torchvision.models.vgg16(pretrained=True).features
# VGG-16 fc7 feature.
model = torchvision.models.vgg16(pretrained=True)
model.classifier = torch.nn.Sequential(*list(model.classifier.children())[:-3])
# ResNet GAP feature.
model = torchvision.models.resnet18(pretrained=True)
model = torch.nn.Sequential(collections.OrderedDict(
    list(model.named_children())[:-1]))

with torch.no_grad():
    model.eval()
    conv_representation = model(image)
Extract the convolution features of several layers of an ImageNet pre-trained model
class FeatureExtractor(torch.nn.Module):
    """Helper class to extract several convolution features from the given
    pre-trained model.

    Attributes:
        _model, torch.nn.Module.
        _layers_to_extract, list<str> or set<str>

    Example:
        >>> model = torchvision.models.resnet152(pretrained=True)
        >>> model = torch.nn.Sequential(collections.OrderedDict(
                list(model.named_children())[:-1]))
        >>> conv_representation = FeatureExtractor(
                pretrained_model=model,
                layers_to_extract={'layer1', 'layer2', 'layer3', 'layer4'})(image)
    """
    def __init__(self, pretrained_model, layers_to_extract):
        torch.nn.Module.__init__(self)
        self._model = pretrained_model
        self._model.eval()
        self._layers_to_extract = set(layers_to_extract)

    def forward(self, x):
        with torch.no_grad():
            conv_representation = []
            for name, layer in self._model.named_children():
                x = layer(x)
                if name in self._layers_to_extract:
                    conv_representation.append(x)
            return conv_representation
Fine-tune the fully connected layer
model = torchvision.models.resnet18(pretrained=True)
for param in model.parameters():
    param.requires_grad = False
model.fc = nn.Linear(512, 100)  # Replace the last fc layer
optimizer = torch.optim.SGD(model.fc.parameters(), lr=1e-2, momentum=0.9, weight_decay=1e-4)
Fine-tune the fully connected layer with a larger learning rate and the convolution layers with a smaller learning rate
model = torchvision.models.resnet18(pretrained=True)
finetuned_parameters = list(map(id, model.fc.parameters()))
conv_parameters = (p for p in model.parameters() if id(p) not in finetuned_parameters)
parameters = [{'params': conv_parameters, 'lr': 1e-3},
              {'params': model.fc.parameters()}]
optimizer = torch.optim.SGD(parameters, lr=1e-2, momentum=0.9, weight_decay=1e-4)
6. Other precautions
Do not use linear layers that are too large. nn.Linear(m, n) uses O(mn) memory, so an oversized linear layer can easily exceed the available GPU memory.
Do not use RNNs on sequences that are too long. RNN backpropagation uses the BPTT algorithm, whose memory requirement grows linearly with the length of the input sequence.
Before calling model(x), use model.train() and model.eval() to switch the network state.
Wrap code blocks that do not need gradients in with torch.no_grad().
The difference between model.eval() and torch.no_grad(): model.eval() switches the network to evaluation mode, in which layers such as BN and dropout compute differently than during training. torch.no_grad() turns off PyTorch's autograd mechanism to reduce memory usage and speed up computation; results computed under it cannot be used with loss.backward().
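The two switches are independent, which a small experiment makes visible. This sketch uses a made-up model with a dropout layer; the layer sizes and dropout rate are arbitrary.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Sequential(nn.Linear(4, 4), nn.Dropout(p=0.5))
x = torch.ones(2, 4)

# eval(): dropout becomes the identity, but gradients are still tracked
model.eval()
out_eval = model(x)

# no_grad(): gradient tracking is off, but dropout is still active in train mode
model.train()
with torch.no_grad():
    out_no_grad = model(x)

print(out_eval.requires_grad, out_no_grad.requires_grad)
```

For inference, the two are therefore normally used together: model.eval() for correct layer behavior, and torch.no_grad() to save memory.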
model.zero_grad() resets the gradients of all parameters of the model to zero, while optimizer.zero_grad() only zeroes the gradients of the parameters passed to the optimizer.
The input of torch.nn.CrossEntropyLoss does not need to go through Softmax. torch.nn.CrossEntropyLoss is equivalent to torch.nn.functional.log_softmax + torch.nn.NLLLoss.
Before loss.backward(), call optimizer.zero_grad() to clear the accumulated gradients.
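The equivalence between CrossEntropyLoss and log_softmax followed by NLLLoss can be checked numerically. The logits and labels below are random placeholders.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(5, 3)               # raw scores, no softmax applied
labels = torch.tensor([0, 2, 1, 1, 0])

# CrossEntropyLoss applied directly to the raw scores
loss_ce = nn.CrossEntropyLoss()(logits, labels)
# The same loss assembled from its two parts
loss_nll = nn.NLLLoss()(F.log_softmax(logits, dim=1), labels)

print(torch.allclose(loss_ce, loss_nll))
```

Applying Softmax before CrossEntropyLoss would therefore normalize twice and distort the loss.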
Set pin_memory=True in torch.utils.data.DataLoader where possible; for very small datasets such as MNIST, pin_memory=False is faster. The best value of num_workers has to be found by experiment.
Delete unused intermediate variables promptly with del to save GPU memory.
Using inplace operation can save GPU storage, such as
x = torch.nn.functional.relu(x, inplace=True)
Reduce data transmission between CPU and GPU. For example, if you want to know the loss and accuracy of each mini batch in an epoch, first accumulate them in the GPU, and then transmit them back to the CPU together after the end of an epoch, which is faster than the GPU to CPU transmission of each mini batch.
Using half-precision floating-point numbers via half() improves speed to some extent; the actual gain depends on the GPU model. Be careful about stability problems caused by the lower numerical precision.
Use assert tensor.size() == (N, D, H, W) often as a debugging aid to make sure tensor dimensions match your assumptions.
Except for the label y, use one-dimensional tensors as little as possible and use two-dimensional tensors of shape n*1 instead, which avoids some unexpected results in one-dimensional tensor computations.
Measure the time spent in each part of the code
with torch.autograd.profiler.profile(enabled=True, use_cuda=False) as profile:
    ...
print(profile)

# Or run from the command line
python -m torch.utils.bottleneck main.py
Use TorchSnooper to debug PyTorch code. While the program runs, it automatically prints the shape, data type, device and gradient information of the tensor produced by each line.
# pip install torchsnooper
import torchsnooper

# For a function, use the decorator
@torchsnooper.snoop()

# Otherwise, wrap the code to inspect (for example inside the training function) in a with statement
with torchsnooper.snoop():
    ...  # original code
References
- [1] Zhang Hao: PyTorch Cookbook (collection of common code snippets), https://zhuanlan.zhihu.com/p/59205847
- PyTorch official documents and examples
- https://pytorch.org/docs/stable/notes/faq.html
- https://github.com/szagoruyko/pytorchviz
- https://github.com/sksq96/pytorch-summary