PyTorch learning - 7 Data preprocessing module - transforms

1, Data preprocessing transforms module mechanism

  The torchvision.transforms module contains many image preprocessing methods:

  • Data centering
  • Data standardization
  • Scaling
  • Cropping
  • Rotation
  • Flipping
  • Padding
  • Noise addition
  • Grayscale conversion
  • Linear transformation
  • Affine transformation
  • Brightness, saturation and contrast adjustment

This module performs both data preprocessing and data augmentation, which helps the model generalize. During data reading, the transforms are applied to each sample as it is loaded, and the preprocessed samples are then assembled into batches.

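import os

from PIL import Image
from torch.utils.data import Dataset

# Maps each class sub-directory name to its integer label
rmb_label = {"1": 0, "100": 1}


class RMBDataset(Dataset):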
    def __init__(self, data_dir, transform=None):
        """
        rmb Denomination classification task Dataset
        :param data_dir: str, Path of dataset
        :param transform: torch.transform,Data preprocessing
        """
        self.label_name = {"1": 0, "100": 1}
        self.data_info = self.get_img_info(data_dir)  # data_info stores all picture paths and labels, and reads samples through index in DataLoader
        self.transform = transform

    def __getitem__(self, index):
        path_img, label = self.data_info[index]
        img = Image.open(path_img).convert('RGB')     # 0~255
		
        # The transform is called at the point where the sample is read; composed transforms are applied one after another
        if self.transform is not None:
            img = self.transform(img)   # Do transform here, turn it into tensor, etc

        return img, label

    def __len__(self):
        return len(self.data_info)

    @staticmethod
    def get_img_info(data_dir):
        data_info = list()
        for root, dirs, _ in os.walk(data_dir):
            # Traversal category
            for sub_dir in dirs:
                img_names = os.listdir(os.path.join(root, sub_dir))
                img_names = list(filter(lambda x: x.endswith('.jpg'), img_names))

                # Traversal picture
                for i in range(len(img_names)):
                    img_name = img_names[i]
                    path_img = os.path.join(root, sub_dir, img_name)
                    label = rmb_label[sub_dir]
                    data_info.append((path_img, int(label)))

        return data_info
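
The flow described above (a transform applied to each sample inside __getitem__, then samples grouped into batches) can be sketched without torch. This is only a minimal stand-in: ToyDataset, make_batches and double are hypothetical names used for illustration, and the real DataLoader additionally handles shuffling, worker processes and collating tensors.

```python
class ToyDataset:
    """Hypothetical stand-in for a Dataset: raw samples plus an optional transform."""

    def __init__(self, samples, transform=None):
        self.samples = samples
        self.transform = transform

    def __getitem__(self, index):
        value, label = self.samples[index]
        # The transform runs here, once per sample, at read time
        if self.transform is not None:
            value = self.transform(value)
        return value, label

    def __len__(self):
        return len(self.samples)


def make_batches(dataset, batch_size):
    """Group consecutive samples into lists of at most batch_size items."""
    batch = []
    for i in range(len(dataset)):
        batch.append(dataset[i])
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        yield batch


def double(x):
    """Toy transform: one input, one output."""
    return 2 * x


dataset = ToyDataset([(1, 0), (2, 1), (3, 0)], transform=double)
batches = list(make_batches(dataset, batch_size=2))
print(batches)
```

Because the transform is applied on every read, a random transform gives a different result each time the same index is requested, which is what makes augmentation work across epochs.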

2, Twenty two data preprocessing methods for transforms

2. Cropping

  transforms.CenterCrop
Function: crop the picture from the image center

  • size: the size of the cropped picture

  transforms.RandomCrop

transforms.RandomCrop(size,
	padding=None,
	pad_if_needed=False,
	fill=0,
	padding_mode='constant')

Function: crop a region of the given size at a random position in the picture

  • size: the size of the cropped picture
  • padding: the padding size
    When it is a, a pixels are padded on the top, bottom, left and right
    When it is (a, b), a pixels are padded on the left and right and b pixels on the top and bottom
    When it is (a, b, c, d), the left, top, right and bottom are padded with a, b, c and d pixels respectively
  • pad_if_needed: pad the image only if it is smaller than the requested size
  • padding_mode: the padding mode. There are four modes
    1. constant: the pixel value is set by fill
    2. edge: the pixel value is taken from the image edge pixels
    3. reflect: mirror padding, the edge pixel is not repeated, eg: [1,2,3,4] → [3,2,1,2,3,4,3,2]
    4. symmetric: mirror padding, the edge pixel is repeated, eg: [1,2,3,4] → [2,1,1,2,3,4,4,3]
  • fill: used when padding_mode is constant, sets the pixel value of the padding
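
The four padding modes are easier to compare on a single row of pixels. The following is a small pure-Python sketch of their semantics, not torchvision's implementation; pad_1d is a hypothetical helper, and it assumes the pad width n is smaller than the length of the row.

```python
def pad_1d(values, n, mode="constant", fill=0):
    """Pad a list with n items on each side using one of the four modes.

    Assumes n < len(values), which reflect padding needs.
    """
    if mode == "constant":
        left, right = [fill] * n, [fill] * n
    elif mode == "edge":
        left, right = [values[0]] * n, [values[-1]] * n
    elif mode == "reflect":
        # Mirror around the edge pixel without repeating it
        left = values[1:n + 1][::-1]
        right = values[-n - 1:-1][::-1]
    elif mode == "symmetric":
        # Mirror around the border, so the edge pixel is repeated
        left = values[:n][::-1]
        right = values[-n:][::-1]
    else:
        raise ValueError("unknown padding mode: " + mode)
    return left + list(values) + right


row = [1, 2, 3, 4]
for mode in ("constant", "edge", "reflect", "symmetric"):
    print(mode, pad_1d(row, 2, mode))
```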

  transforms.RandomResizedCrop

RandomResizedCrop(size,
	scale=(0.08, 1.0),
	ratio=(3/4, 4/3),
	interpolation)

Function: crop a region of random area and aspect ratio, then resize it to the given size

  • size: the output size of the picture
  • scale: range of the crop area as a fraction of the original image area, default (0.08, 1)
  • ratio: range of the crop aspect ratio, default (3/4, 4/3)
  • interpolation: interpolation method
    PIL.Image.NEAREST
    PIL.Image.BILINEAR
    PIL.Image.BICUBIC
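
To make the roles of scale and ratio concrete, here is a rough sketch of how a crop size could be drawn from them. It is an illustration under assumptions, not the library's code: sample_crop_size is a hypothetical helper, the aspect ratio is assumed to be sampled uniformly in log space, and the fallback to the whole image after a few failed tries is a simplification chosen for this sketch.

```python
import math
import random


def sample_crop_size(width, height, scale=(0.08, 1.0), ratio=(3 / 4, 4 / 3),
                     rng=random, max_tries=10):
    """Sample a (crop_w, crop_h) pair that fits inside a width x height image."""
    area = width * height
    log_ratio = (math.log(ratio[0]), math.log(ratio[1]))
    for _ in range(max_tries):
        # Target area is a random fraction of the image area
        target_area = area * rng.uniform(scale[0], scale[1])
        # Aspect ratio (width / height), drawn in log space
        aspect = math.exp(rng.uniform(log_ratio[0], log_ratio[1]))
        crop_w = int(round(math.sqrt(target_area * aspect)))
        crop_h = int(round(math.sqrt(target_area / aspect)))
        if 0 < crop_w <= width and 0 < crop_h <= height:
            return crop_w, crop_h
    # Assumed fallback for this sketch: use the whole image
    return width, height


rng = random.Random(0)
for _ in range(3):
    print(sample_crop_size(640, 480, rng=rng))
```

Whatever size is sampled, the cropped region is afterwards resized to size, so every output has the same shape.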

  transforms.FiveCrop

transforms.FiveCrop(size)

Function: crop five pictures of the given size from the four corners and the center of the image

# Convert tuple format to Tensor format
transforms.FiveCrop(112),
transforms.Lambda(lambda crops: torch.stack([(transforms.ToTensor()(crop)) for crop in crops]))
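
FiveCrop takes its crops from the four corners and the center of the image. The crop boxes can be computed as in the sketch below; five_crop_boxes is a hypothetical helper, boxes are written as (left, upper, right, lower) in the style of PIL, and integer division is assumed for the center offset.

```python
def five_crop_boxes(width, height, size):
    """Return (left, upper, right, lower) boxes: four corners, then the center."""
    # An int means a square crop; a pair is read as (crop_height, crop_width)
    crop_h, crop_w = (size, size) if isinstance(size, int) else size
    if crop_w > width or crop_h > height:
        raise ValueError("crop size must not exceed the image size")

    top_left = (0, 0, crop_w, crop_h)
    top_right = (width - crop_w, 0, width, crop_h)
    bottom_left = (0, height - crop_h, crop_w, height)
    bottom_right = (width - crop_w, height - crop_h, width, height)

    cx = (width - crop_w) // 2
    cy = (height - crop_h) // 2
    center = (cx, cy, cx + crop_w, cy + crop_h)
    return [top_left, top_right, bottom_left, bottom_right, center]


for box in five_crop_boxes(10, 8, 4):
    print(box)
```

Since the result is a tuple of five images rather than one image, the Lambda step above is needed to turn the crops into a single stacked tensor.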

  transforms.TenCrop
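Function: TenCrop takes the five crops produced by FiveCrop and adds a horizontally or vertically mirrored copy of each, giving 10 pictures

  • size: the size of the cropped pictures
  • vertical_flip: if True, mirror vertically; otherwise mirror horizontally (default False)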
transforms.TenCrop(112, vertical_flip=False),
transforms.Lambda(lambda crops: torch.stack([(transforms.ToTensor()(crop)) for crop in crops]))

3. Flipping and rotation

  transforms.RandomHorizontalFlip

transforms.RandomHorizontalFlip(p=0.5)

Function: flip the picture horizontally (left and right) with a given probability

  • p: flip probability

  transforms.RandomVerticalFlip

transforms.RandomVerticalFlip(p=0.5)

Function: flip the picture vertically (up and down) with a given probability

  • p: flip probability

  transforms.RandomRotation

RandomRotation(degrees,
	resample=False,
	expand=False,
	center=None)

Function: randomly rotate pictures

  • degrees: rotation angle
    When it is a, the rotation angle is selected from (-a, a)
    When it is (a, b), the rotation angle is selected from (a, b)
  • resample: resampling method
  • expand: whether to enlarge the output so that the whole rotated picture is kept
  • center: the rotation center; by default the picture is rotated about its center

4. Image transformation

  transforms.Pad

transforms.Pad(padding,
	fill=0,
	padding_mode='constant')

Function: pad the edges of the picture

  • padding: the padding size
    When it is a, a pixels are padded on the top, bottom, left and right
    When it is (a, b), a pixels are padded on the left and right and b pixels on the top and bottom
    When it is (a, b, c, d), the left, top, right and bottom are padded with a, b, c and d pixels respectively
  • padding_mode: the padding mode. There are four modes:
    constant, edge, reflect, and symmetric
  • fill: when padding_mode is constant, sets the padded pixel value, (R, G, B) or (Gray)

  transforms.ColorJitter

transforms.ColorJitter(brightness=0,
	contrast=0,
	saturation=0,
	hue=0)

Function: randomly adjust brightness, contrast, saturation and hue

  • brightness: brightness adjustment factor
    When it is a, the factor is selected randomly from [max(0, 1-a), 1+a]
    When it is (a, b), the factor is selected randomly from [a, b]
  • contrast: contrast parameter, same rules as brightness
  • saturation: saturation parameter, same rules as brightness
  • hue: hue parameter
    When it is a, the parameter is selected from [-a, a]
    Note: 0 <= a <= 0.5
    When it is (a, b), the parameter is selected from [a, b]
    Note: -0.5 <= a <= b <= 0.5
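
The rules above for turning a parameter into a sampling interval can be written out directly. This is a small sketch of the rule only; factor_range is a hypothetical helper, not a torchvision function, and the range checks on hue are left out.

```python
import random


def factor_range(value, center=1.0, clip_at_zero=True):
    """Turn a scalar a or a pair (a, b) into the interval a factor is drawn from."""
    if isinstance(value, (int, float)):
        low, high = center - value, center + value
        if clip_at_zero:
            # Brightness, contrast and saturation factors cannot be negative
            low = max(0.0, low)
        return (low, high)
    low, high = value
    return (low, high)


# brightness / contrast / saturation: centered on 1, clipped at 0
brightness_range = factor_range(0.4)
# hue: centered on 0, no clipping
hue_range = factor_range(0.1, center=0.0, clip_at_zero=False)

brightness_factor = random.uniform(*brightness_range)
hue_shift = random.uniform(*hue_range)
print(brightness_range, hue_range)
```

A factor of 1 for brightness, contrast or saturation, or a shift of 0 for hue, leaves the picture unchanged, which is why the default value 0 for every argument disables the corresponding adjustment.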

  transforms.Grayscale

Grayscale(num_output_channels)

Function: convert pictures to grayscale images

  • num_output_channels: the number of output channels, can only be set to 1 or 3

  transforms.RandomGrayscale
Function: convert the picture into a grayscale image with a given probability

  • num_output_channels: the number of output channels, can only be set to 1 or 3
  • p: probability value, the probability that the image is converted into a gray image

  transforms.RandomAffine

transforms.RandomAffine(degrees,
	translate=None,
	scale=None,
	shear=None,
	resample=False,
	fillcolor=)

Function: perform an affine transformation on the image. An affine transformation is a two-dimensional linear transformation followed by a translation; it can be composed from five basic transformations: rotation, translation, scaling, shear and flipping

  • degrees: rotation angle setting
  • translate: translation interval setting
    For example (a, b): a sets the horizontal limit and b the vertical limit, so the horizontal shift of the image satisfies
    -img_width * a < dx < img_width * a
  • scale: scaling factor interval, for example (a, b)
  • fillcolor: fill color setting
  • shear: shear angle setting, covering horizontal shear and vertical shear
    If it is a, only the x-axis is sheared, and the shear angle is between (-a, a)
    If it is (a, b), a sets the x-axis angle and b sets the y-axis angle
    If it is (a, b, c, d), a and b set the x-axis angle range, and c and d set the y-axis angle range
  • resample: resampling method, one of NEAREST, BILINEAR and BICUBIC
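
The translate setting gives fractional limits that are converted to pixels using the image size. A minimal sketch of that conversion, assuming the vertical shift follows the same rule as the horizontal one with b and the image height; sample_translation is a hypothetical helper, not the library function.

```python
import random


def sample_translation(img_width, img_height, translate, rng=random):
    """Sample integer pixel shifts (dx, dy) for fractional limits translate=(a, b)."""
    a, b = translate
    max_dx = a * img_width    # horizontal limit in pixels
    max_dy = b * img_height   # vertical limit in pixels
    dx = round(rng.uniform(-max_dx, max_dx))
    dy = round(rng.uniform(-max_dy, max_dy))
    return dx, dy


rng = random.Random(0)
# For a 200 x 100 image and translate=(0.1, 0.2), both shifts stay within 20 pixels
print(sample_translation(200, 100, (0.1, 0.2), rng=rng))
```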

  transforms.RandomErasing

transforms.RandomErasing(p=0.5,
	scale=(0.02, 0.33),
	ratio=(0.3, 3.3),
	value=0,
	inplace=False)

Function: randomly occlude a rectangular region of the image

  • p: probability of performing the operation
  • scale: range of the occluded area as a fraction of the image area
  • ratio: range of the aspect ratio of the occluded area
  • value: sets the pixel value of the occluded area, (R, G, B) or (Gray)

  transforms.Lambda(lambd)
Function: apply a user-defined lambda as a transform

  • lambd: lambda anonymous function
    lambda [arg1 [,arg2, ... , argn]] : expression

eg:

transforms.Lambda(lambda crops: torch.stack([transforms.ToTensor()(crop) for crop in crops]))

5. Selection of transforms method

  transforms.RandomChoice
Function: randomly select one transform from a list of transforms and apply it

transforms.RandomChoice([transforms1, transforms2, transforms3])

  transforms.RandomApply
Function: apply a whole group of transforms with probability p, or skip all of them

transforms.RandomApply([transforms1, transforms2, transforms3], p=0.5)

  transforms.RandomOrder
Function: apply a group of transforms in a random order

transforms.RandomOrder([transforms1, transforms2, transforms3])
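
The difference between the three selection methods is easiest to see on toy functions. The sketch below mimics their behaviour in plain Python; random_choice, random_apply, random_order, add_one and times_two are hypothetical names for illustration, not the torchvision classes themselves.

```python
import random


def random_choice(transforms, x, rng=random):
    """Apply exactly one transform, picked at random."""
    return rng.choice(transforms)(x)


def random_apply(transforms, x, p=0.5, rng=random):
    """With probability p apply the whole list in order; otherwise return x unchanged."""
    if rng.random() >= p:
        return x
    for t in transforms:
        x = t(x)
    return x


def random_order(transforms, x, rng=random):
    """Apply every transform exactly once, in a shuffled order."""
    order = list(transforms)
    rng.shuffle(order)
    for t in order:
        x = t(x)
    return x


def add_one(x):
    return x + 1


def times_two(x):
    return x * 2


ops = [add_one, times_two]
print(random_choice(ops, 3))        # 4 or 6
print(random_apply(ops, 3, p=0.5))  # 3 or 8
print(random_order(ops, 3))         # 8 or 7
```

Note that RandomApply treats the list as a unit: either every transform in it runs or none does.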

  transforms.Resize
Function: adjust the size of the picture

  transforms.ToTensor
Function: convert a PIL Image or ndarray into a tensor, scaling pixel values from 0~255 to [0, 1]

  transforms.Normalize

transforms.Normalize(mean,
	std,
	inplace=False)

Function: standardize the image channel by channel. When mean and std are the statistics of the data, the transformed data has mean 0 and standard deviation 1. Standardization helps the model converge faster.

$$output = (input - mean) / std$$

  • mean: mean value of each channel
  • std: standard deviation of each channel
  • inplace: whether to operate in place
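
As a worked example of the formula, the sketch below applies it to a tiny two-channel image stored as nested lists. normalize_channels is a hypothetical pure-Python helper that only mirrors the arithmetic; with mean 0.5 and std 0.5, values in [0, 1] are mapped to [-1, 1].

```python
def normalize_channels(image, mean, std):
    """Apply (value - mean[c]) / std[c] to every value of channel c.

    image is a list of channels, each a list of rows of floats (C x H x W).
    """
    return [
        [[(value - m) / s for value in row] for row in channel]
        for channel, m, s in zip(image, mean, std)
    ]


image = [
    [[0.0, 0.5], [1.0, 0.5]],    # channel 0
    [[0.25, 0.75], [0.5, 0.5]],  # channel 1
]
normalized = normalize_channels(image, mean=[0.5, 0.5], std=[0.5, 0.5])
print(normalized)
```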

The source code is as follows:

def normalize(tensor, mean, std, inplace=False):
    """Normalize a tensor image with mean and standard deviation.

    .. note::
        This transform acts out of place by default, i.e., it does not mutates the input tensor.

    See :class:`~torchvision.transforms.Normalize` for more details.

    Args:
        tensor (Tensor): Tensor image of size (C, H, W) to be normalized.
        mean (sequence): Sequence of means for each channel.
        std (sequence): Sequence of standard deviations for each channel.
        inplace(bool,optional): Bool to make this operation inplace.

    Returns:
        Tensor: Normalized Tensor image.
    """
    # Legitimacy judgment of input - whether it is Tensor
    if not _is_tensor_image(tensor):
        raise TypeError('tensor is not a torch image.')
	
    # Whether to operate in place. If not, clone the tensor first
    if not inplace:
        tensor = tensor.clone()
	
    dtype = tensor.dtype
    # Convert the mean and standard deviation to tensors
    mean = torch.as_tensor(mean, dtype=dtype, device=tensor.device)
    std = torch.as_tensor(std, dtype=dtype, device=tensor.device)
    # The trailing underscore in sub_ / div_ marks an in-place operation; this computes (input - mean) / std
    tensor.sub_(mean[:, None, None]).div_(std[:, None, None])
    # Returns the transformed tensor
    return tensor

6. Customize transforms method

  Transforms are invoked in the __call__ method of the Compose class.

class Compose(object):
    """Composes several transforms together.

    Args:
        transforms (list of ``Transform`` objects): list of transforms to compose.

    Example:
        >>> transforms.Compose([
        >>>     transforms.CenterCrop(10),
        >>>     transforms.ToTensor(),
        >>> ])
    """

    def __init__(self, transforms):
        self.transforms = transforms

    def __call__(self, img):
        # Apply each transform in turn
        for t in self.transforms:
            img = t(img)
        return img

    def __repr__(self):
        format_string = self.__class__.__name__ + '('
        for t in self.transforms:
            format_string += '\n'
            format_string += '    {0}'.format(t)
        format_string += '\n)'
        return format_string

We can find the following characteristics when calling transforms:

  1. Each transform receives only one argument and returns only one value
  2. The output of each transform must match the input expected by the next one (for example PIL Image versus tensor)

  Let's customize a transform. Its basic structure is:

class YourTransforms(object):
	def __init__(self, ...):
		...
	def __call__(self, img):
		...
		return img
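
A concrete instance of this structure, kept free of torch so that it runs anywhere: extra parameters are passed through __init__, while __call__ takes one input and returns one output so that the object can be chained. ClampValues, DivideBy and compose are hypothetical names, and plain lists of numbers stand in for images.

```python
class ClampValues:
    """Hypothetical transform: clamp every value into [low, high]."""

    def __init__(self, low, high):
        # Parameters are stored at construction time
        self.low = low
        self.high = high

    def __call__(self, img):
        # One input, one output
        return [min(max(v, self.low), self.high) for v in img]


class DivideBy:
    """Hypothetical transform: divide every value by a constant."""

    def __init__(self, divisor):
        self.divisor = divisor

    def __call__(self, img):
        return [v / self.divisor for v in img]


def compose(transforms, img):
    """Same loop as in Compose.__call__: feed each output to the next transform."""
    for t in transforms:
        img = t(img)
    return img


pipeline = [ClampValues(0, 255), DivideBy(255)]
result = compose(pipeline, [-20, 0, 255, 300])
print(result)
```

Swapping the two steps would give a different result, which is the point of the second characteristic: each transform has to be written with the output of the previous one in mind.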

  Salt and pepper noise, also known as impulse noise, consists of random white or black pixels: white pixels are called salt noise and black pixels pepper noise. The signal noise rate (SNR) used here measures the proportion of signal, that is, the fraction of image pixels left unchanged; the remaining 1 - SNR of the pixels are replaced by noise. We take salt and pepper noise as an example of a custom transform.

import random

import numpy as np
from PIL import Image


class AddPepperNoise(object):
    """Add salt and pepper noise
    Args:
        snr (float): Signal Noise Rate
        p (float): Probability value, which performs the operation according to probability
    """

    def __init__(self, snr, p=0.9):
        assert isinstance(snr, float) and isinstance(p, float)
        # Signal percentage
        self.snr = snr
        # probability
        self.p = p

    def __call__(self, img):
        """
        Args:
            img (PIL Image): PIL Image
        Returns:
            PIL Image: PIL image.
        """
        # Judgment of probability
        if random.uniform(0, 1) < self.p:
            # Convert data format to ndarray
            img_ = np.array(img).copy()
            # Height, width, number of channels
            h, w, c = img_.shape
            # Get signal percentage
            signal_pct = self.snr
            # Noise percentage
            noise_pct = (1 - self.snr)
            # Draw a mask per pixel: 0 = keep the pixel, 1 = salt, 2 = pepper
            mask = np.random.choice((0, 1, 2), size=(h, w, 1), p=[signal_pct, noise_pct/2., noise_pct/2.])
            mask = np.repeat(mask, c, axis=2)
            img_[mask == 1] = 255   # Salt noise
            img_[mask == 2] = 0     # Pepper noise
            return Image.fromarray(img_.astype('uint8')).convert('RGB')
        else:
            return img
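
The mask sampling at the heart of this transform can be checked in isolation. The sketch below repeats the same idea with the standard library only, on a flat list of pixel values instead of an image; sample_noise_mask and apply_mask are hypothetical helpers, and random.choices stands in for np.random.choice.

```python
import random


def sample_noise_mask(num_pixels, snr, rng=random):
    """Label each pixel 0 (keep), 1 (salt) or 2 (pepper)."""
    noise_pct = 1 - snr
    weights = [snr, noise_pct / 2, noise_pct / 2]
    return rng.choices((0, 1, 2), weights=weights, k=num_pixels)


def apply_mask(pixels, mask):
    """Replace salt pixels with 255 and pepper pixels with 0."""
    out = []
    for value, m in zip(pixels, mask):
        if m == 1:
            out.append(255)
        elif m == 2:
            out.append(0)
        else:
            out.append(value)
    return out


rng = random.Random(0)
mask = sample_noise_mask(10_000, snr=0.9, rng=rng)
kept = mask.count(0) / len(mask)
print("fraction of pixels kept:", kept)
```

With snr=0.9 roughly 90% of the pixels are kept and the rest is split about evenly between salt and pepper, which matches the probabilities passed to np.random.choice in the class above.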


Added by brandone on Mon, 17 Jan 2022 13:34:15 +0200