YOLOv5 Input: Mosaic Data Augmentation

I'm a beginner taking notes on what I've learned, in the hope that they also help other beginners. Corrections from more experienced readers are very welcome ~ (if anything here infringes, let me know and I'll take it down).

Contents

1. Principle analysis

2. Code analysis

1. Main part - load_mosaic

2. load_image function

3. random_perspective() function (see code analysis for details)

1. Principle analysis

YOLOv5 uses the same Mosaic data augmentation as YOLOv4.

Main idea: it randomly crops the currently selected image together with three other randomly chosen images, then stitches the four crops into a single image that is used as training data.

This enriches the image backgrounds, and stitching four images together increases the effective batch_size: all four images contribute to the statistics during batch normalization.

As a result, YOLOv5 is not very dependent on a large batch_size.

2. Code analysis

1. Main part - load_mosaic

    labels4, segments4 = [], []
    s = self.img_size  # get image size
    yc, xc = (int(random.uniform(-x, 2 * s + x)) for x in self.mosaic_border)  # mosaic center x, y
    # random.uniform draws a real number in the above range
    # (i.e. from half the image size to 1.5 times the image size):
    # this is the randomly generated mosaic center point

First initialize the label and segment lists as empty, then read the image size s.

Based on the image size, random.uniform() then draws a random mosaic center point, anywhere from half the image size to 1.5 times the image size.
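A minimal sketch of this sampling (assuming img_size = 640; in YOLOv5 the dataset class sets mosaic_border = (-img_size // 2, -img_size // 2)):

    import random

    s = 640                             # assumed img_size
    mosaic_border = (-s // 2, -s // 2)  # as set in the YOLOv5 dataset class
    yc, xc = (int(random.uniform(-x, 2 * s + x)) for x in mosaic_border)
    # with x = -320, each coordinate falls in [320, 960), i.e. [s/2, 1.5*s),
    # somewhere near the middle of the 2s x 2s mosaic canvas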

    indices = [index] + random.choices(self.indices, k=3)  # 3 additional image indices
    # random.choices randomly draws the indices of 3 more images from the dataset,
    # which are then packed together with the index of the originally selected image
    # into the indices list
    random.shuffle(indices)
    # shuffle these index values into a random order

random.choices() draws the indices of the other three images, these four indices are collected into the indices list, and random.shuffle() then puts them in a random order.
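A minimal illustration of this sampling (pool and index below are made-up stand-ins for self.indices and the dataloader's index):

    import random

    pool = list(range(100))                        # stand-in for self.indices
    index = 7                                      # stand-in for the selected image's index
    indices = [index] + random.choices(pool, k=3)  # 3 extra indices, drawn with replacement
    random.shuffle(indices)                        # randomize which image fills which quadrant
    print(indices)                                 # e.g. [42, 7, 91, 13]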

    for i, index in enumerate(indices):  # loop over the 4 images
        # Load image
        img, _, (h, w) = load_image(self, index)  # load the image and its (resized) height and width

Loop over the four images and call the load_image() function to load each image and its corresponding height and width.

Next comes how to place these four images~

        # place img in img4
        if i == 0:  # top left
            img4 = np.full((s * 2, s * 2, img.shape[2]), 114, dtype=np.uint8)  # base image with 4 tiles
            # generate the gray background canvas (2s x 2s, filled with the value 114)
            x1a, y1a, x2a, y2a = max(xc - w, 0), max(yc - h, 0), xc, yc  # xmin, ymin, xmax, ymax (large image)
            # region on the large mosaic canvas where this image will be placed
            x1b, y1b, x2b, y2b = w - (x2a - x1a), h - (y2a - y1a), w, h  # xmin, ymin, xmax, ymax (small image)
            # region cropped from the small (original) image

The first image goes in the top-left corner.

img4 is first initialized with np.full(), creating a large canvas the size of four images (2s x 2s), filled with the gray value 114.

Then compute the region this image occupies on the large canvas, and the corresponding region cropped from the original (small) image.

        elif i == 1:  # top right
            x1a, y1a, x2a, y2a = xc, max(yc - h, 0), min(xc + w, s * 2), yc
            x1b, y1b, x2b, y2b = 0, h - (y2a - y1a), min(w, x2a - x1a), h
        elif i == 2:  # bottom left
            x1a, y1a, x2a, y2a = max(xc - w, 0), yc, xc, min(s * 2, yc + h)
            x1b, y1b, x2b, y2b = w - (x2a - x1a), 0, w, min(y2a - y1a, h)
        elif i == 3:  # bottom right
            x1a, y1a, x2a, y2a = xc, yc, min(xc + w, s * 2), min(s * 2, yc + h)
            x1b, y1b, x2b, y2b = 0, 0, min(w, x2a - x1a), min(y2a - y1a, h)

The remaining three quadrants are handled the same way.
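A worked example with made-up numbers may help here. Take s = 640 (so the canvas is 1280 x 1280), a mosaic center of (xc, yc) = (400, 350), and a source image of (w, h) = (640, 480); for the top-left quadrant the coordinates work out as follows:

    s, xc, yc, w, h = 640, 400, 350, 640, 480  # made-up numbers for illustration

    # top-left quadrant (i == 0)
    x1a, y1a, x2a, y2a = max(xc - w, 0), max(yc - h, 0), xc, yc
    x1b, y1b, x2b, y2b = w - (x2a - x1a), h - (y2a - y1a), w, h

    print(x1a, y1a, x2a, y2a)    # 0 0 400 350 -> region on the 1280x1280 canvas
    print(x1b, y1b, x2b, y2b)    # 240 130 640 480 -> crop taken from the source image
    print(x1a - x1b, y1a - y1b)  # -240 -130 -> the padw, padh offsets computed below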

        img4[y1a:y2a, x1a:x2a] = img[y1b:y2b, x1b:x2b]  # img4[ymin:ymax, xmin:xmax]
        # paste the corresponding crop of the small image onto the large canvas

Paste the corresponding crop of the small image onto the large canvas.

        padw = x1a - x1b
        padh = y1a - y1b
        #Calculate the offset from the small image to the large image, which is used to calculate the position of the label after mosaic enhancement

Compute the offset (padw, padh) from the small image to the large canvas; it is used to shift the labels to their positions after the mosaic (in the worked example above, padw = -240 and padh = -130).

        # Labels
        labels, segments = self.labels[index].copy(), self.segments[index].copy()
        # get this image's labels and segments
        if labels.size:
            labels[:, 1:] = xywhn2xyxy(labels[:, 1:], w, h, padw, padh)  # normalized xywh to pixel xyxy format
            # convert normalized xywh (fractional values) to pixel xyxy format
            segments = [xyn2xy(x, w, h, padw, padh) for x in segments]
            # convert segments to pixel coordinates
        labels4.append(labels)
        segments4.extend(segments)
        # append to the lists prepared at the start

Now handle the labels:

First read the labels of the corresponding image, then convert them from normalized xywh format to pixel xyxy format.

The segments are likewise converted to pixel coordinates.

Then append everything to the label lists prepared earlier; a sketch of the conversion helper follows below.
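For reference, here is a sketch of xywhn2xyxy consistent with the YOLOv5 utils (the real helper also handles torch tensors, so treat the exact signature as version-dependent). It maps normalized (x_center, y_center, w, h) boxes to pixel (x1, y1, x2, y2) and adds the mosaic offset:

    import numpy as np

    def xywhn2xyxy(x, w=640, h=640, padw=0, padh=0):
        # x: (n, 4) array of normalized [x_center, y_center, width, height]
        y = np.copy(x)
        y[:, 0] = w * (x[:, 0] - x[:, 2] / 2) + padw  # top-left x
        y[:, 1] = h * (x[:, 1] - x[:, 3] / 2) + padh  # top-left y
        y[:, 2] = w * (x[:, 0] + x[:, 2] / 2) + padw  # bottom-right x
        y[:, 3] = h * (x[:, 1] + x[:, 3] / 2) + padh  # bottom-right y
        return y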

    # Concat/clip labels
    labels4 = np.concatenate(labels4, 0)  # concatenate the per-image label arrays
    for x in (labels4[:, 1:], *segments4):
        np.clip(x, 0, 2 * s, out=x)  # clip when using random_perspective()
        # np.clip clamps the values into the range 0 to 2s
    # img4, labels4 = replicate(img4, labels4)  # replicate

First concatenate the label lists into a single array (a format convenient for the processing below), then clamp the coordinates to between 0 and 2 times the image size.

    # Augment
    # after the mosaic, the stitched image has shape [2*img_size, 2*img_size]
    # random_perspective randomly rotates, translates, scales and crops it,
    # and resizes it back to the input size img_size
    img4, labels4, segments4 = copy_paste(img4, labels4, segments4, p=self.hyp['copy_paste'])
    img4, labels4 = random_perspective(img4, labels4, segments4,
                                       degrees=self.hyp['degrees'],
                                       translate=self.hyp['translate'],
                                       scale=self.hyp['scale'],
                                       shear=self.hyp['shear'],
                                       perspective=self.hyp['perspective'],
                                       border=self.mosaic_border)  # border to remove

After the mosaic, the stitched image has shape [2*img_size, 2*img_size].

random_perspective() then randomly rotates, translates, scales and crops it, and resizes it back to the input size img_size.

    return img4, labels4

Finally, the processed image and the corresponding labels are returned.

2. load_image function

The load_image function loads an image and resizes it by the ratio of the configured input size to the image's original size.

First, fetch the image for this index:

def load_image(self, i):
    # load_image loads an image and resizes it by the ratio of the
    # configured input size to the image's original size
    # loads 1 image from dataset index 'i', returns im, original hw, resized hw
    im = self.imgs[i]  # look up the image in the RAM cache

Check whether the image is already cached, i.e. whether it has already been loaded and resized (I'm not sure this understanding is fully correct; if it's wrong, please tell me in the comments, thank you ~).

🎈 If not cached:

Look in the corresponding folder first.

🌳 If the cached .npy file is found: load the image from it.

🌳 If not found: read the image from its original file path, and raise an error ("Image Not Found") if the image at that path cannot be read.

Then read the image's original height and width and compute the resize ratio.

If this ratio is not equal to 1, resize the image accordingly.

Finally, return the image, its original height and width, and its resized height and width.

    if im is None:  # not cached in ram
        # the image is not cached (i.e. not yet loaded and resized)
        npy = self.img_npy[i]  # look for a cached .npy file on disk
        if npy and npy.exists():  # load npy
            im = np.load(npy)  # found it: load the image from the .npy file
        else:  # read image
            path = self.img_files[i]  # no .npy file: fall back to the original image path
            im = cv2.imread(path)  # BGR
            assert im is not None, f'Image Not Found {path}'  # raise an error if the image cannot be read
        h0, w0 = im.shape[:2]  # orig hw
        # read the image's original height and width
        r = self.img_size / max(h0, w0)  # ratio
        # compute the resize ratio
        if r != 1:  # if sizes are not equal
            im = cv2.resize(im, (int(w0 * r), int(h0 * r)),
                            interpolation=cv2.INTER_AREA if r < 1 and not self.augment else cv2.INTER_LINEAR)  # perform the resize
        return im, (h0, w0), im.shape[:2]  # im, hw_original, hw_resized

🎈 If it is cached:

Directly return the image, its original height and width, and its resized height and width~

    else:
        return self.imgs[i], self.img_hw0[i], self.img_hw[i]  # im, hw_original, hw_resized
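A quick numeric check of the resize ratio (made-up sizes): with img_size = 640 and a 1920 x 1080 original, the longer side is mapped to 640 and the aspect ratio is preserved:

    img_size, h0, w0 = 640, 1080, 1920  # made-up target size and original (h, w)
    r = img_size / max(h0, w0)          # 640 / 1920 = 1/3
    print(int(w0 * r), int(h0 * r))     # 640 360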

3. random_perspective() function (see code analysis for details)

Random transformation

Computation: multiply the coordinate vectors (in homogeneous form) by a combined transformation matrix.
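A minimal sketch of what that means (values are made up): a point (x, y) becomes the homogeneous vector [x, y, 1], is multiplied by a 3 x 3 matrix M, and is divided by the third component to get the transformed pixel position:

    import numpy as np

    M = np.eye(3)                      # identity transform as a placeholder
    M[0, 2], M[1, 2] = 5, -3           # e.g. translate by (+5, -3) pixels
    pt = np.array([100.0, 50.0, 1.0])  # point (100, 50) in homogeneous coordinates
    out = M @ pt
    x_new, y_new = out[:2] / out[2]    # divide by the third component
    print(x_new, y_new)                # 105.0 47.0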

First, compute the output height and width of the image including the border (since mosaic_border is negative, this brings the 2s x 2s mosaic back down to s x s):

def random_perspective(im, targets=(), segments=(), degrees=10, translate=.1, scale=.1, shear=10, perspective=0.0,
                       border=(0, 0)):
    # torchvision.transforms.RandomAffine(degrees=(-10, 10), translate=(0.1, 0.1), scale=(0.9, 1.1), shear=(-10, 10))
    # targets = [cls, xyxy]

    # image height and width (including the border)
    height = im.shape[0] + border[0] * 2  # shape(h,w,c)
    width = im.shape[1] + border[1] * 2

Then build the matrix that moves the image center to the origin:

    # Center
    C = np.eye(3)  # 3x3 identity matrix
    # move the image center to the origin: x direction
    C[0, 2] = -im.shape[1] / 2  # x translation (pixels)
    # and y direction
    C[1, 2] = -im.shape[0] / 2  # y translation (pixels)

Next, prepare the matrices for the various transformations (perspective, rotation, shear, translation):

    # Perspective
    P = np.eye(3)  # 3x3 identity matrix
    # randomly generate perspective values in the x and y directions
    P[2, 0] = random.uniform(-perspective, perspective)  # x perspective (about y)
    P[2, 1] = random.uniform(-perspective, perspective)  # y perspective (about x)

    # Rotation and Scale
    R = np.eye(3)  # 3x3 identity matrix
    a = random.uniform(-degrees, degrees)  # random rotation angle within the range
    # a += random.choice([-180, -90, 0, 90])  # add 90deg rotations to small rotations
    s = random.uniform(1 - scale, 1 + scale)  # random scale factor
    # s = 2 ** random.uniform(-scale, scale)
    R[:2] = cv2.getRotationMatrix2D(angle=a, center=(0, 0), scale=s)  # rotation+scale affine matrix, assigned to the first two rows of R

    # Shear
    # random shear angles
    S = np.eye(3)  # 3x3 identity matrix
    S[0, 1] = math.tan(random.uniform(-shear, shear) * math.pi / 180)  # x shear (deg)
    S[1, 0] = math.tan(random.uniform(-shear, shear) * math.pi / 180)  # y shear (deg)

    # Translation
    # random translation around the center of the output image
    T = np.eye(3)
    T[0, 2] = random.uniform(0.5 - translate, 0.5 + translate) * width  # x translation (pixels)
    T[1, 2] = random.uniform(0.5 - translate, 0.5 + translate) * height  # y translation (pixels)

Then combine everything into a single transformation matrix:

    # Combined rotation matrix
    M = T @ S @ R @ P @ C  # order of operations (right to left) is IMPORTANT
    # the matrices are combined by matrix multiplication
    if (border[0] != 0) or (border[1] != 0) or (M != np.eye(3)).any():  # image changed
        # i.e. there is a border or at least one transformation was applied
        if perspective:  # if perspective is enabled
            im = cv2.warpPerspective(im, M, dsize=(width, height), borderValue=(114, 114, 114))
            # cv2.warpPerspective: a perspective transform keeps straight lines straight,
            # but parallel lines may no longer stay parallel
        else:  # affine
            im = cv2.warpAffine(im, M[:2], dsize=(width, height), borderValue=(114, 114, 114))
            # cv2.warpAffine: an affine transform realizes rotation, translation and
            # scaling, and parallel lines remain parallel after the transform
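A quick sanity check (with made-up values) of why the right-to-left order matters: rotating then translating is not the same as translating then rotating:

    import numpy as np

    T = np.eye(3); T[0, 2] = 10                   # translate +10 px in x
    R = np.eye(3); R[:2, :2] = [[0, -1], [1, 0]]  # rotate 90 degrees counter-clockwise
    p = np.array([1.0, 0.0, 1.0])                 # point (1, 0) in homogeneous coordinates

    print((T @ R @ p)[:2])  # rotate first, then translate -> [10.  1.]
    print((R @ T @ p)[:2])  # translate first, then rotate -> [ 0. 11.]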

Then transform the coordinates of the label

    # Transform label coordinates
    n = len(targets)  # number of targets
    if n:  # if there are any targets
        use_segments = any(x.any() for x in segments)  # check whether any segment is non-empty (target pixel segments)
        new = np.zeros((n, 4))  # initialize the output boxes, 4 values (x1y1x2y2) per target
        if use_segments:  # warp segments
            # the segments are not empty
            segments = resample_segments(segments)  # upsample
            for i, segment in enumerate(segments):
                xy = np.ones((len(segment), 3))
                xy[:, :2] = segment  # the first two columns hold the segment's pixel coordinates
                xy = xy @ M.T  # transform
                xy = xy[:, :2] / xy[:, 2:3] if perspective else xy[:, :2]  # perspective rescale or affine
                # the third column of xy is all 1s (homogeneous coordinates); after multiplying
                # by M.T, the third output column carries the perspective denominator coming
                # from the values set in P, and the division above rescales by it

                # clip
                new[i] = segment2box(xy, width, height)

        else:  # warp boxes
            xy = np.ones((n * 4, 3))
            xy[:, :2] = targets[:, [1, 2, 3, 4, 1, 4, 3, 2]].reshape(n * 4, 2)  # x1y1, x2y2, x1y2, x2y1
            xy = xy @ M.T  # transform
            xy = (xy[:, :2] / xy[:, 2:3] if perspective else xy[:, :2]).reshape(n, 8)  # perspective rescale or affine

            # create new boxes
            x = xy[:, [0, 2, 4, 6]]
            y = xy[:, [1, 3, 5, 7]]
            new = np.concatenate((x.min(1), y.min(1), x.max(1), y.max(1))).reshape(4, n).T

            # clip
            # clamp the box coordinates to the image boundaries
            new[:, [0, 2]] = new[:, [0, 2]].clip(0, width)
            new[:, [1, 3]] = new[:, [1, 3]].clip(0, height)

Finally, filter out the boxes that became too small or too distorted after all these operations, and return the result:

        # filter candidates
        i = box_candidates(box1=targets[:, 1:5].T * s, box2=new.T, area_thr=0.01 if use_segments else 0.10)  # keep only valid candidate boxes
        targets = targets[i]
        targets[:, 1:5] = new[i]

    return im, targets
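For reference, box_candidates compares each box before and after the transform and keeps it only if it is still wide and tall enough, retains enough of its area, and has a reasonable aspect ratio. A sketch consistent with the YOLOv5 utils (thresholds may differ between versions):

    import numpy as np

    def box_candidates(box1, box2, wh_thr=2, ar_thr=20, area_thr=0.1, eps=1e-16):
        # box1: boxes before the transform, shape (4, n); box2: boxes after, shape (4, n)
        w1, h1 = box1[2] - box1[0], box1[3] - box1[1]
        w2, h2 = box2[2] - box2[0], box2[3] - box2[1]
        ar = np.maximum(w2 / (h2 + eps), h2 / (w2 + eps))  # aspect ratio
        keep_size = (w2 > wh_thr) & (h2 > wh_thr)          # still wide and tall enough
        keep_area = w2 * h2 / (w1 * h1 + eps) > area_thr   # retains enough of its area
        return keep_size & keep_area & (ar < ar_thr)       # boolean mask of boxes to keep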

Criticism and corrections are welcome in the comment area, thank you~
