Introduction to torchvision
torchvision has four functional modules: models, datasets, transforms, and utils. With datasets you can download a number of classic datasets. Here we focus on how to use the ImageFolder class in datasets to process custom datasets, and how to use transforms to preprocess and augment the source data.
transforms
transforms provides common operations on PIL Image objects and Tensor objects.
Common operations on PIL Image
Resize (formerly Scale): adjust the image size; when a single integer is given, the smaller edge is matched to it and the aspect ratio is preserved.
CenterCrop, RandomCrop, RandomResizedCrop: crop the image. CenterCrop and RandomCrop crop to a fixed size, while RandomResizedCrop crops an area of random size and aspect ratio and then resizes it to the given size.
Pad: pad the image borders.
ToTensor: convert a PIL Image with values in the range [0, 255], or a NumPy ndarray of shape (H, W, C), into a torch.FloatTensor of shape (C, H, W) with values in the range [0, 1.0].
RandomHorizontalFlip: randomly flip the image horizontally with probability 0.5.
RandomVerticalFlip: randomly flip the image vertically with probability 0.5.
ColorJitter: randomly change the brightness, contrast, and saturation.
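As a minimal sketch of a few of these operations (the file name example.jpg and the parameter values are arbitrary choices for illustration):

from PIL import Image
from torchvision import transforms

img = Image.open('example.jpg')               # hypothetical input image
img = transforms.Resize(256)(img)             # smaller edge becomes 256, aspect ratio preserved
img = transforms.RandomHorizontalFlip()(img)  # flipped horizontally with probability 0.5
img = transforms.ColorJitter(brightness=0.2, contrast=0.2, saturation=0.2)(img)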
Common operations on Tensor
Normalize: normalize the tensor, that is, subtract the per-channel mean and divide by the per-channel standard deviation.
ToPILImage: convert a Tensor to a PIL Image.
If you want to apply multiple operations to the dataset, you can use Compose to chain these operations together like a pipeline, similar to nn.Sequential. The following is an example:
transforms.Compose([
    # Center-crop the given PIL Image to the given size.
    # size can be a tuple (target_height, target_width);
    # size can also be an integer, in which case the cropped image is square.
    transforms.CenterCrop(10),
    # Crop at a randomly chosen position
    transforms.RandomCrop(20, padding=0),
    # Convert a PIL Image with values in [0, 255], or an ndarray of shape (H, W, C),
    # into a torch.FloatTensor of shape (C, H, W) with values in [0, 1]
    transforms.ToTensor(),
    # Normalize to [-1, 1]
    transforms.Normalize(mean=(0.5, 0.5, 0.5), std=(0.5, 0.5, 0.5))
])
You can also define a custom operation with a Python lambda expression; for example, adding 10 to each pixel value can be written as transforms.Lambda(lambda x: x.add(10)).
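A minimal sketch of using such a lambda inside a pipeline, assuming it is placed after ToTensor so that it operates on a tensor:

from torchvision import transforms

transform = transforms.Compose([
    transforms.ToTensor(),                   # PIL Image -> FloatTensor in [0, 1]
    transforms.Lambda(lambda x: x.add(10)),  # add 10 to every element
])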
For more information, please refer to the official torchvision documentation.
ImageFolder
When images are stored in different folders according to their labels, for example:
data
├── zhangliu
│   ├── 001.jpg
│   └── 002.jpg
├── wuhua
│   ├── 001.jpg
│   └── 002.jpg
...
You can use torchvision.datasets.ImageFolder to construct a dataset directly. The code is as follows:
from torchvision import datasets
from torch.utils import data

dataset = datasets.ImageFolder(path)
loader = data.DataLoader(dataset)
ImageFolder automatically maps each folder name under the directory to a class index, so when the DataLoader loads the data, the labels are integers.
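A minimal sketch, assuming the data directory layout shown above (the resize size and batch size are arbitrary choices for illustration):

from torchvision import datasets, transforms
from torch.utils.data import DataLoader

transform = transforms.Compose([
    transforms.Resize((224, 224)),  # make all images the same size so they can be batched
    transforms.ToTensor(),
])
dataset = datasets.ImageFolder('data', transform=transform)
print(dataset.class_to_idx)  # folder-name-to-index mapping, e.g. {'wuhua': 0, 'zhangliu': 1}

loader = DataLoader(dataset, batch_size=2, shuffle=True)
for images, labels in loader:
    print(images.shape, labels)  # labels is a tensor of integer class indices
    break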
Visualization tools
TensorBoard is a visualization tool from Google's TensorFlow. It can record training data, evaluation data, the network structure, images, and so on, and display them in a web page, which is very helpful for observing the training process of a neural network. PyTorch can use tensorboard_logger, visdom, and other visualization tools, but these methods are either complex or not friendly enough. To solve this problem, a new and more powerful tool for PyTorch visualization was introduced: tensorboardX.
Introduction to tensorboardX
tensorboardX is very powerful, supporting visualization methods such as scalar, image, figure, histogram, audio, text, graph, onnx_graph, embedding, pr_curve, and video summaries.
The general steps for using tensorboardX are as follows.
1) Import tensorboardX, instantiate the SummaryWriter class, and specify the log path and other information.
from tensorboardX import SummaryWriter

# Instantiate SummaryWriter and specify the log storage path.
# If the directory does not exist, it will be created automatically.
writer = SummaryWriter(log_dir='logs')

# Call instance methods (add_xxx here is just a generic placeholder)
writer.add_xxx()

# Close the writer
writer.close()
Notes:
① On Windows, pay attention to how the log_dir path is parsed; use a raw string, for example:
writer = SummaryWriter(log_dir=r'D:\myboard\test\logs')
② The signature of SummaryWriter is:
SummaryWriter(log_dir=None, comment='', **kwargs)  # comment is appended as a suffix to the default log file name
③ If log_dir is not specified, a runs directory will be created under the current directory.
2) Call the corresponding API. The general format of the interface is:
add_xxx(tag_name, object, iteration_number)  # i.e. add_xxx(tag, object to record, number of iterations)
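For example, a minimal sketch recording a scalar and a histogram (the tags and values are arbitrary):

from tensorboardX import SummaryWriter
import torch

writer = SummaryWriter(log_dir='logs')
writer.add_scalar('loss', 0.5, 1)                     # tag, scalar value, iteration number
writer.add_histogram('weights', torch.randn(100), 1)  # tag, tensor of values, iteration number
writer.close()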
3) Start the tensorboard service:
cd to the directory that contains the logs directory and enter the following command on the command line. The value to the right of --logdir= can be a relative or an absolute path.
# Run this when the command line is in the parent directory of logs
tensorboard --logdir=logs --port 6006

# Run this when the command line is not in the parent directory of logs
# tensorboard --logdir=D:\myboard\test\logs --port 6006
4) View in the browser
Enter the following address in the browser:
http://<server IP or name>:6006  # if running locally, localhost can be used as the server name
You can then see the various graphs saved in the logs directory.
For more information about tensorboardX, please refer to its official website.
Visualizing neural network with tensorboardX
1) Import required modules
import torch
import torch.nn as nn
import torch.nn.functional as F
import torchvision
from tensorboardX import SummaryWriter
2) Construct the neural network
class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        # Two convolutional layers, dropout, two fully connected layers, and batch norm
        self.conv1 = nn.Conv2d(1, 10, kernel_size=5)
        self.conv2 = nn.Conv2d(10, 20, kernel_size=5)
        self.conv2_drop = nn.Dropout2d()
        self.fc1 = nn.Linear(320, 50)
        self.fc2 = nn.Linear(50, 10)
        self.bn = nn.BatchNorm2d(20)

    def forward(self, x):
        x = F.max_pool2d(self.conv1(x), 2)
        x = F.relu(x) + F.relu(-x)
        x = F.relu(F.max_pool2d(self.conv2_drop(self.conv2(x)), 2))
        x = self.bn(x)
        x = x.view(-1, 320)
        x = F.relu(self.fc1(x))
        x = F.dropout(x, training=self.training)
        x = self.fc2(x)
        x = F.softmax(x, dim=1)
        return x
3) Save the model as a graph
input = torch.rand(32, 1, 28, 28)
# Instantiate the neural network
model = Net()
# Save the model as a graph
with SummaryWriter(log_dir='logs', comment='Net') as w:
    w.add_graph(model, (input,))
After that, you can open the browser and view the details of the network, following the startup steps described above.
Visualization of loss value with tensorboardX
To visualize the loss value, you need the add_scalar function. Here, a single-layer fully connected neural network is trained to fit the parameters of a univariate quadratic function.
import torch
import numpy as np
import torch.nn as nn
import torch.optim as optim
from tensorboardX import SummaryWriter

dtype = torch.FloatTensor
writer = SummaryWriter(log_dir='logs', comment='Linear')
np.random.seed(100)

# Generate training data for y = 3x^2 + 2 with some noise
x_train = np.linspace(-1, 1, 100).reshape(100, 1)
y_train = 3 * np.power(x_train, 2) + 2 + 0.2 * np.random.rand(x_train.size).reshape(100, 1)

input_size = 1
output_size = 1
learning_rate = 0.01
num_epoches = 60

model = nn.Linear(input_size, output_size)
criterion = nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate)

for epoch in range(num_epoches):
    inputs = torch.from_numpy(x_train).type(dtype)
    targets = torch.from_numpy(y_train).type(dtype)

    output = model(inputs)
    loss = criterion(output, targets)

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    # Save the loss value together with the epoch number
    writer.add_scalar('Training loss value', loss, epoch)

# Close the writer so the log is flushed to disk
writer.close()
References
Wu Maogui, Yu Mingmin, Yang Benfa, Li Tao, Zhang Yuelei. Python Deep Learning (Based on PyTorch). Beijing: China Machine Press, 2019.