Deep learning notes: PyTorch data processing toolbox

Introduction to torchvision

torchvision has four functional modules: models, datasets, transforms and utils. The datasets module can download a number of classic datasets. Here we focus on how to use ImageFolder from datasets to process custom datasets, and how to use transforms to preprocess and augment source data.


transforms provides common operations on PIL Image objects and Tensor objects.

Common operations on PIL Image

Scale/Resize: adjust the size. When the size is a single integer, the shorter edge is matched to it and the aspect ratio is kept unchanged. Scale is the older name for Resize.
CenterCrop, RandomCrop, RandomResizedCrop: crop the picture. CenterCrop and RandomCrop crop to a fixed size, while RandomResizedCrop crops a region of random size.
Pad: pad the borders of the image.
ToTensor: convert a PIL Image with a value range of [0, 255], or a numpy.ndarray of shape (H, W, C), into a torch.FloatTensor of shape [C, H, W] with a value range of [0, 1.0].
RandomHorizontalFlip: the image is randomly flipped horizontally, and the flipping probability is 0.5.
RandomVerticalFlip: the image is randomly flipped vertically.
ColorJitter: modify brightness, contrast, and saturation.

Common operations for Tensor are as follows

Normalize: normalize, that is, subtract the mean value and divide it by the standard deviation.
ToPILImage: convert a Tensor to a PIL Image.
If you want to perform multiple operations on the dataset, you can use Compose to chain them like a pipeline, similar to nn.Sequential. The following is example code:

	transforms.Compose([
		# Center-crop the given PIL Image to the given size.
		# size can be a tuple (target_height, target_width).
		# size can also be an integer, in which case the crop is square.
		# (32 is an illustrative size; the original value is not shown.)
		transforms.CenterCrop(32),
		# Crop at a randomly selected position
		transforms.RandomCrop(20, padding=0),
		# Convert a PIL Image with values in [0, 255], or a numpy.ndarray of shape (H, W, C),
		# to a torch.FloatTensor of shape (C, H, W) with values in [0, 1]
		transforms.ToTensor(),
		# Normalize to [-1, 1]
		transforms.Normalize(mean=(0.5, 0.5, 0.5), std=(0.5, 0.5, 0.5))
	])

You can also wrap a custom Python lambda expression. For example, adding 10 to each pixel value can be expressed as: transforms.Lambda(lambda x: x.add(10)).
For more information, please refer to the official torchvision transforms documentation.


When the files are stored in separate folders according to their label, such as:
─── data
├── zhangliu
│ ├── 001.jpg
│ └── 002.jpg
├── wuhua
│ ├── 001.jpg
│ └── 002.jpg
You can use torchvision.datasets.ImageFolder to construct a dataset directly. The code is as follows:

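from torch.utils import data
from torchvision import datasets, transforms
dataset = datasets.ImageFolder(path, transform=transforms.ToTensor())
loader = data.DataLoader(dataset)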

ImageFolder automatically maps the folder names in the directory to a sequence of integer class indices. When the DataLoader loads the data, each label is the corresponding integer.

Visualization tools

TensorBoard is the visualization tool of Google's TensorFlow. It can record training data, evaluation data, network structure, images and so on, and display them in a web page, which is very helpful for observing the training process of a neural network. PyTorch can use tensorboard_logger, visdom and other visualization tools, but these are either complex or not friendly enough. To solve this problem, a new and more powerful visualization tool for PyTorch was introduced: tensorboardX.

Introduction to tensorboardX

tensorboardX is very powerful and supports visualization methods such as scalar, image, figure, histogram, audio, text, graph, onnx_graph, embedding, pr_curve and video summaries.
The general steps for using tensorboardX are as follows.
1) Import tensorboardX, instantiate the SummaryWriter class, and indicate the log path and other information.

from tensorboardX import SummaryWriter
# Instantiate SummaryWriter and indicate the log storage path. If there is no logs directory in the current directory, it is created automatically.
writer = SummaryWriter(log_dir='logs')
# Call the instance. add_xxx is a generic placeholder, not a real method
# writer.add_xxx()
# Close the writer
writer.close()

① On Windows, pay attention to how the log_dir path is parsed, for example by using a raw string:

writer = SummaryWriter(log_dir=r'D:\myboard\test\logs')

② The SummaryWriter format is:

SummaryWriter(log_dir=None, comment='', **kwargs)
# comment is appended as a suffix to the name of the automatically created log directory

③ If log_dir is not specified, a runs directory is created in the current directory.
2) Call the corresponding API. The general format is:

add_xxx(tag-name, object, iteration-number)
# That is, add_xxx(tag, object to record, iteration number)

3) Start the tensorboard service:
cd to the parent directory of the logs directory and enter the following command on the command line. The value after --logdir= can be a relative path or an absolute path.

# Run this from the directory that contains logs
tensorboard --logdir=logs --port 6006
# Run this from any other directory, using an absolute path
# tensorboard --logdir=D:\myboard\test\logs --port 6006

4) View in the browser
Enter the following in the browser:

http://<server IP or name>:6006 # if it is local, the server name can be localhost

You can see various graphics saved in the logs directory.
For more information about tensorboardX, please refer to the official tensorboardX documentation.

Visualizing neural network with tensorboardX

1) Import required modules

import torch
import torch.nn as nn
import torch.nn.functional as F
import torchvision
from tensorboardX import SummaryWriter

2) Construct the neural network

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(1, 10, kernel_size=5)
        self.conv2 = nn.Conv2d(10, 20, kernel_size=5)
        self.conv2_drop = nn.Dropout2d()
        self.fc1 = nn.Linear(320, 50)
        self.fc2 = nn.Linear(50, 10)
        self.bn = nn.BatchNorm2d(20)

    def forward(self, x):
        x = F.max_pool2d(self.conv1(x), 2)
        x = F.relu(x) + F.relu(-x)
        x = F.relu(F.max_pool2d(self.conv2_drop(self.conv2(x)), 2))
        x = self.bn(x)
        x = x.view(-1, 320)
        x = F.relu(self.fc1(x))
        x = F.dropout(x, training=self.training)
        x = self.fc2(x)
        x = F.softmax(x, dim=1)
        return x
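The value 320 used by fc1 and by x.view(-1, 320) follows from the layer sizes. The arithmetic can be checked with a small sketch, assuming 28x28 single-channel inputs (the size of the example input used when saving the graph). The helper names conv_out and pool_out are hypothetical.

```python
def conv_out(size, kernel_size):
    # Output size of a convolution with stride 1 and no padding.
    return size - kernel_size + 1


def pool_out(size, window):
    # Output size of max pooling whose stride equals its window.
    return size // window


size = 28
size = pool_out(conv_out(size, 5), 2)   # conv1 then max_pool2d: 28 -> 24 -> 12
size = pool_out(conv_out(size, 5), 2)   # conv2 then max_pool2d: 12 -> 8 -> 4

# conv2 produces 20 channels, each of size 4x4 after pooling.
flat_features = 20 * size * size
print(flat_features)   # 320
```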

3) Save the model as a graph

input = torch.rand(32, 1, 28, 28)
# Instantiated neural network
model = Net()
# Save model as graph
with SummaryWriter(log_dir='logs', comment='Net') as w:
    w.add_graph(model, (input,))

After that, start the tensorboard service as described above and open the browser to view the details of the network.

Visualization of loss value with tensorboardX

To visualize the loss value, use the add_scalar function. Here a single fully connected layer is trained to fit a univariate quadratic function.

import torch
import numpy as np
import torch.nn as nn
import torch.optim as optim
from tensorboardX import SummaryWriter

dtype = torch.FloatTensor
writer = SummaryWriter(log_dir='logs', comment='Linear')
x_train = np.linspace(-1, 1, 100).reshape(100, 1)
y_train = 3 * np.power(x_train, 2) + 2 + 0.2 * np.random.rand(x_train.size).reshape(100, 1)

input_size = 1
output_size = 1
learning_rate = 0.01
num_epoches = 60
model = nn.Linear(input_size, output_size)
criterion = nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate)
for epoch in range(num_epoches):
    inputs = torch.from_numpy(x_train).type(dtype)
    targets = torch.from_numpy(y_train).type(dtype)

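    output = model(inputs)
    loss = criterion(output, targets)

    # Backpropagate and update the parameters
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    # Save loss data and epoch data
    writer.add_scalar('Training loss value', loss.item(), epoch)

writer.close()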


Wu Maogui, Yu Mingmin, Yang Benfa, Li Tao, Zhang Yuelei. Python Deep Learning (Based on PyTorch). Beijing: China Machine Press, 2019.

Keywords: Python Pytorch Deep Learning

Added by blue-genie on Mon, 07 Mar 2022 13:34:28 +0200