pytorch learning - image expansion (life two, two born three, three born all things?)

1, Image augmentation

Definition & Interpretation:

  1. By making a series of random changes to the training image, similar but different training samples are generated, so as to expand the scale of the training data set.
  2. Randomly changing the training samples can reduce the dependence of the model on some attributes, so as to improve the generalization ability of the model

2, Common image augmentation methods

Use the following 400x500 image as an example

%matplotlib inline
import matplotlib.pyplot as plt
import numpy as np
import torch 
import torchvision
from d2l import torch as d2l
from torch import nn 
from PIL import Image
img ='./data/cat_dog/cat1.jpg')
plt.title('Initial data')

Most image augmentation methods have certain randomness. In order to observe the effect of image enlargement, we define the auxiliary function apply below. This function runs the image augmentation method aug multiple times on the input image img and displays all the results.

def apply(img,aug,num_rows=2,num_cols=4,scale=1.5):
    Y = [aug(img) for _ in range(num_rows*num_cols)]

1. Turning and cutting

  1. Flipping images left and right usually does not change the category of objects. This is one of the earliest and most widely used image augmentation methods.
  2. Flipping images up and down is not as common as flipping left and right images. However, at least for this example image, flipping up and down does not hinder recognition.
  3. Random reduction] in the example image we used, the cat is in the middle of the image, but not all images are like this. The pooling layer can reduce the sensitivity of the convolution layer to the target position. In addition, we can crop the image randomly to make the object appear in different positions of the image in different proportions. This can also reduce the sensitivity of the model to the target location.
# Flip left and right
# upside down 
# Random reduction
shape_aug = torchvision.transforms.RandomResizedCrop(
# (200200) is the size of the picture, scale means randomly cut to the original scale, and ratio is the aspect ratio

2. Color change

Another augmentation method is to change the color.
We can change four aspects of image color:

  1. brightness
  2. contrast ratio
  3. saturation
  4. tone
# brightness
# contrast ratio
# saturation
# tone
# Mixed use

3. Multiple data augmentation methods are used for superposition

augs = torchvision.transforms.Compose(
     torchvision.transforms.RandomResizedCrop((200, 200), scale=(0.1, 1), ratio=(0.5, 2))

3, Training with image augmentation

# Download CIFA10 dataset test
all_images = torchvision.datasets.CIFAR10(
d2l.show_images([all_images[i][0] for i in range(32)] , 4,8,scale=0.8)
# The application is simple to flip left and right, up and down
# The format of the generated data is (batch size, number of channels, height, width)
train_augs = torchvision.transforms.Compose(
#  torchvision.transforms.RandomVerticalFlip(),

test_augs = torchvision.transforms.Compose([
# torchvision.transforms.RandomHorizontalFlip(),
#  torchvision.transforms.RandomVerticalFlip(),
# Load data
def load_cifar10(is_train,augs,batch_size):
    dataset = torchvision.datasets.CIFAR10(root="./data/",train=is_train,
    dataLoader =
    return dataLoader
# Multi GPU training and evaluation
def train_batch(net, X, y, loss, trainer, devices):
    if isinstance(X, list):
        # Fine tuning required in BERT (discussed later)
        X = [[0]) for x in X]
        X =[0])
    y =[0])
    pred = net(X)
    l = loss(pred, y)
    train_loss_sum = l.sum()
    train_acc_sum = d2l.accuracy(pred, y)
    return train_loss_sum, train_acc_sum
def train(net, train_iter, test_iter, loss, trainer, num_epochs,
    timer, num_batches = d2l.Timer(), len(train_iter)
    animator = d2l.Animator(xlabel='epoch', xlim=[1, num_epochs], ylim=[0, 1],
                            legend=['train loss', 'train acc', 'test acc'])
    net = nn.DataParallel(net, device_ids=devices).to(devices[0]) # Multi GPU operation
    for epoch in range(num_epochs):
        # Four dimensions: storage training loss, training accuracy, number of instances and number of features
        metric = d2l.Accumulator(4)
        for i, (features, labels) in enumerate(train_iter):
            l, acc = train_batch(net, features, labels, loss, trainer,
            metric.add(l, acc, labels.shape[0], labels.numel())
            if (i + 1) % (num_batches // 5) == 0 or i == num_batches - 1:
                    epoch + (i + 1) / num_batches,
                    (metric[0] / metric[2], metric[1] / metric[3], None))
        test_acc = d2l.evaluate_accuracy_gpu(net, test_iter)
        animator.add(epoch + 1, (None, None, test_acc))
    print(f'loss {metric[0] / metric[2]:.3f}, train acc '
          f'{metric[1] / metric[3]:.3f}, test acc {test_acc:.3f}')
    print(f'{metric[2] * num_epochs / timer.sum():.1f} examples/sec on '
#Using the enhanced data to train the model;
# Obtain all GPU s and use Adam as the optimization algorithm
batch_size, devices, net = 256, d2l.try_all_gpus(), d2l.resnet18(10, 3)

# Model initialization
def init_weights(m):
    if type(m) in [nn.Linear, nn.Conv2d]:


def train_with_data_aug(train_augs, test_augs, net, lr=0.001):
    train_iter = load_cifar10(True, train_augs, batch_size)
    test_iter = load_cifar10(False, test_augs, batch_size)
    loss = nn.CrossEntropyLoss(reduction="none")
    trainer = torch.optim.Adam(net.parameters(), lr=lr)
    train(net, train_iter, test_iter, loss, trainer, 10, devices)
# Data augmentation (flip left and right)
train_with_data_aug(train_augs, test_augs, net)
loss 0.166, train acc 0.942, test acc 0.823
453.4 examples/sec on [device(type='cuda', index=0)]
# No data augmentation
batch_size, devices, net = 256, d2l.try_all_gpus(), d2l.resnet18(10, 3)
def init_weights(m):
    if type(m) in [nn.Linear, nn.Conv2d]:

train_with_data_aug(test_augs, test_augs, net)
loss 0.070, train acc 0.975, test acc 0.797
455.5 examples/sec on [device(type='cuda', index=0)]

Comparison of results:

  1. Using image enhancement, although it is only a simple left-right flip, the prediction accuracy of our model is improved by 3%
  2. The over fitting of the model has a certain mitigation.

4, Summary

  1. Image augmentation generates random images based on the existing training data to improve the generalization ability of the model.
  2. In order to get the exact results in the prediction process, we usually only perform image augmentation on the training samples, and do not use random image augmentation in the prediction process. (training yes, prediction no)
  3. Deep learning framework provides many different image augmentation methods, which can be applied at the same time. (multiple enhancements are used together)
  4. Collection of image enhancement methods (these sorting should be enough. If you have any special needs, you can leave a message for discussion):

    (1) I know that some authors have summarized 15 enhancement methods and codes written by themselves:

    • Flip
    • Cutting
    • Filtering and sharpening
    • vague
    • Rotate, translate, cut, scale
    • Cut
    • color
    • brightness
    • contrast
    • Uniform and Gaussian noise
    • Gradient lens deformation

    (2) Find some mature code with high star on github:
    For example: imgaug


Keywords: Pytorch Deep Learning image identification

Added by malikah on Fri, 17 Dec 2021 23:19:48 +0200