# YOLOv5 input: Mosaic data augmentation

I'm a beginner taking notes on what I've learned, in the hope that they also help other beginners. Corrections from more experienced readers are very welcome~ If anything here infringes a copyright, contact me and I will remove it.

Contents

1. Principle analysis

2. Code analysis

3. random_perspective() function (see code analysis for details)

# 1. Principle analysis

YOLOv5 uses the same Mosaic data augmentation as YOLOv4.

Main principle: it randomly crops one selected image and three other randomly chosen images, then stitches the four crops into a single image that is used as training data.

This enriches the image backgrounds, and stitching four images together effectively increases the batch size: all four images contribute to the batch normalization statistics.

As a result, YOLOv5 is less dependent on a large batch size.

# 2. Code analysis

### 1. Main part - load_mosaic

```
labels4, segments4 = [], []
s = self.img_size  # get the image size
yc, xc = (int(random.uniform(-x, 2 * s + x)) for x in self.mosaic_border)  # mosaic center x, y
# random.uniform generates a real number in the range above
# (i.e. from half the image size to 1.5 times the image size):
# this is the randomly generated mosaic centre point
```

First initialize the label lists to be empty and get the image size s.

Based on the image size, random.uniform() then generates the mosaic centre point at random, in the range from half the image size to 1.5 times the image size.
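As a quick check of that range, a minimal sketch (the values are illustrative: img_size = 640 and mosaic_border = [-s/2, -s/2], the value YOLOv5 sets):

```python
import random

s = 640                             # hypothetical input size img_size
mosaic_border = [-s // 2, -s // 2]  # as set for self.mosaic_border in YOLOv5

# sample the mosaic centre exactly as load_mosaic does; with x = -s/2 the
# range of random.uniform(-x, 2 * s + x) is [s/2, 1.5*s]
yc, xc = (int(random.uniform(-x, 2 * s + x)) for x in mosaic_border)
print(yc, xc)
```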

```
indices = [index] + random.choices(self.indices, k=3)  # 3 additional image indices
# randomly pick the indices of 3 more images from the dataset,
# then pack them together with the index of the originally selected image
random.shuffle(indices)
# shuffle the four indices into a random order
```

random.choices() randomly picks the indices of three more images; together with the index of the originally selected image they are packed into the indices list, and random.shuffle() then puts the four indices in a random order.
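A minimal sketch of the index selection (the index pool, the chosen index, and the seed are illustrative stand-ins for the dataset attributes):

```python
import random

random.seed(0)            # fixed seed just for this sketch
all_indices = range(100)  # hypothetical stand-in for self.indices
index = 7                 # the originally selected image

indices = [index] + random.choices(all_indices, k=3)  # 3 additional image indices
random.shuffle(indices)   # random order of the four tiles
print(indices)
```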

```
for i, index in enumerate(indices):  # loop over the four images
    img, _, (h, w) = load_image(self, index)  # load the image and its height and width
```

Loop over the four images and call the load_image() function to load each image along with its height and width.

The next step is placing these four images~

```
# place img in img4
if i == 0:  # top left
    img4 = np.full((s * 2, s * 2, img.shape[2]), 114, dtype=np.uint8)  # base image with 4 tiles
    # generate the grey background canvas
    x1a, y1a, x2a, y2a = max(xc - w, 0), max(yc - h, 0), xc, yc  # xmin, ymin, xmax, ymax (large image)
    # position of this tile on the large (newly generated mosaic) image
    x1b, y1b, x2b, y2b = w - (x2a - x1a), h - (y2a - y1a), w, h  # xmin, ymin, xmax, ymax (small image)
    # region selected from the original (small) image
```

The first image goes in the top-left corner.

img4 is first created with the np.full() function: a large canvas the size of four images (2s x 2s), filled with the grey value 114.

Then set the tile's position on the large image and the corresponding coordinates of the region cropped from the original (small) image.

```
elif i == 1:  # top right
    x1a, y1a, x2a, y2a = xc, max(yc - h, 0), min(xc + w, s * 2), yc
    x1b, y1b, x2b, y2b = 0, h - (y2a - y1a), min(w, x2a - x1a), h
elif i == 2:  # bottom left
    x1a, y1a, x2a, y2a = max(xc - w, 0), yc, xc, min(s * 2, yc + h)
    x1b, y1b, x2b, y2b = w - (x2a - x1a), 0, w, min(y2a - y1a, h)
elif i == 3:  # bottom right
    x1a, y1a, x2a, y2a = xc, yc, min(xc + w, s * 2), min(s * 2, yc + h)
    x1b, y1b, x2b, y2b = 0, 0, min(w, x2a - x1a), min(y2a - y1a, h)
```

The remaining three quadrants are handled the same way.
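The four branches can be checked in isolation. This sketch re-implements the branch logic as a standalone helper (with hypothetical numbers: s = 640, centre (800, 700), a 640x480 source image) and shows that the crop on the big canvas and the crop from the source always have the same size, so the paste fits exactly:

```python
def tile_coords(i, xc, yc, w, h, s):
    # same branch logic as load_mosaic: returns (big-image box, small-image box)
    if i == 0:    # top left
        x1a, y1a, x2a, y2a = max(xc - w, 0), max(yc - h, 0), xc, yc
        x1b, y1b, x2b, y2b = w - (x2a - x1a), h - (y2a - y1a), w, h
    elif i == 1:  # top right
        x1a, y1a, x2a, y2a = xc, max(yc - h, 0), min(xc + w, s * 2), yc
        x1b, y1b, x2b, y2b = 0, h - (y2a - y1a), min(w, x2a - x1a), h
    elif i == 2:  # bottom left
        x1a, y1a, x2a, y2a = max(xc - w, 0), yc, xc, min(s * 2, yc + h)
        x1b, y1b, x2b, y2b = w - (x2a - x1a), 0, w, min(y2a - y1a, h)
    else:         # bottom right
        x1a, y1a, x2a, y2a = xc, yc, min(xc + w, s * 2), min(s * 2, yc + h)
        x1b, y1b, x2b, y2b = 0, 0, min(w, x2a - x1a), min(y2a - y1a, h)
    return (x1a, y1a, x2a, y2a), (x1b, y1b, x2b, y2b)

# worked example: s = 640, mosaic centre (800, 700), a 640x480 tile
coords = [tile_coords(i, 800, 700, 640, 480, 640) for i in range(4)]
print(coords)
```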

```
img4[y1a:y2a, x1a:x2a] = img[y1b:y2b, x1b:x2b]  # img4[ymin:ymax, xmin:xmax]
# paste the corresponding crop of the small image onto the large image
```

Paste the corresponding part of the small picture on the large picture

```
padw = x1a - x1b
padh = y1a - y1b
# offsets from the small image to the large image, used to compute
# the label positions after the mosaic augmentation
```

Calculate the offset from the small image to the large image, which is used to calculate the position of the label after mosaic enhancement

```
# Labels
labels, segments = self.labels[index].copy(), self.segments[index].copy()
# get the labels of this image
if labels.size:
    labels[:, 1:] = xywhn2xyxy(labels[:, 1:], w, h, padw, padh)  # normalized xywh to pixel xyxy format
    # convert normalized xywh (fractions of the image) to pixel xyxy coordinates
    segments = [xyn2xy(x, w, h, padw, padh) for x in segments]  # convert segments to pixel xy
labels4.append(labels)
segments4.extend(segments)
# collect into the lists prepared earlier
```

Label handling:

First read the labels of the corresponding image, then convert the labels from normalized xywh format (fractional values) to pixel xyxy format,

and the segments to pixel xy format.

Then append them to the label lists prepared earlier.
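A minimal re-implementation of the conversion (mirroring YOLOv5's xywhn2xyxy helper) shows how padw/padh shift a tile's labels into mosaic coordinates; the numbers are illustrative:

```python
import numpy as np

def xywhn2xyxy(x, w=640, h=640, padw=0, padh=0):
    # normalized [x_center, y_center, width, height] -> pixel [x1, y1, x2, y2],
    # shifted by the tile offset (padw, padh)
    y = np.copy(x)
    y[:, 0] = w * (x[:, 0] - x[:, 2] / 2) + padw  # top-left x
    y[:, 1] = h * (x[:, 1] - x[:, 3] / 2) + padh  # top-left y
    y[:, 2] = w * (x[:, 0] + x[:, 2] / 2) + padw  # bottom-right x
    y[:, 3] = h * (x[:, 1] + x[:, 3] / 2) + padh  # bottom-right y
    return y

# a centred box covering half of a 640x480 tile, pasted with offset (160, 220)
boxes = np.array([[0.5, 0.5, 0.5, 0.5]])
print(xywhn2xyxy(boxes, w=640, h=480, padw=160, padh=220))
```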

```
# Concat/clip labels
labels4 = np.concatenate(labels4, 0)  # concatenate the per-image label arrays
for x in (labels4[:, 1:], *segments4):
    np.clip(x, 0, 2 * s, out=x)  # clip when using random_perspective()
    # np.clip limits the values in place to the range 0 to 2s
# img4, labels4 = replicate(img4, labels4)  # replicate
```

First concatenate the label lists into a single array (a convenient format for the following steps), then clip the coordinates to the range 0 to 2 times the image size.
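The in-place clipping relies on np.clip's out argument writing into a view of the array; a small illustrative example with s = 640:

```python
import numpy as np

s = 640
labels4 = np.array([[0., -50., 10., 1400., 600.]])  # cls, x1, y1, x2, y2
# clip the coordinate columns in place to the 2s x 2s mosaic canvas;
# labels4[:, 1:] is a view, so out=... writes straight back into labels4
np.clip(labels4[:, 1:], 0, 2 * s, out=labels4[:, 1:])
print(labels4)  # x1 clipped up to 0, x2 clipped down to 1280
```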

```
# Augment
# after the mosaic, the combined image has shape [2*img_size, 2*img_size]
# the mosaic image is then randomly rotated, translated, scaled and cropped,
# and resized to the input size img_size
img4, labels4, segments4 = copy_paste(img4, labels4, segments4, p=self.hyp['copy_paste'])
img4, labels4 = random_perspective(img4, labels4, segments4,
                                   degrees=self.hyp['degrees'],
                                   translate=self.hyp['translate'],
                                   scale=self.hyp['scale'],
                                   shear=self.hyp['shear'],
                                   perspective=self.hyp['perspective'],
                                   border=self.mosaic_border)  # border to remove
```

After the mosaic step, the combined image has shape [2*img_size, 2*img_size].

The mosaic image is then randomly rotated, translated, scaled and cropped, and resized to the input size img_size.

`    return img4, labels4`

Finally, the processed image and its labels are returned.

### 2. load_image function

load_image() loads one image and resizes it according to the ratio of the configured input size to the image's original size.

First, get the image at the given index:

```
def load_image(self, i):
    # load_image loads an image and resizes it according to the ratio of
    # the configured input size to the original image size
    # loads 1 image from dataset index 'i', returns im, original hw, resized hw
    im = self.imgs[i]  # get the cached image at this index, if any
```

Check whether the image is already cached, i.e. whether it has already been loaded and resized (I'm not sure this understanding is exactly right; if it's wrong, please tell me in the comments, thank you~).

🎈 If it is not cached:

First check whether a saved .npy version of the image exists.

🌳 If it does: load the image from the .npy file.

🌳 If not: read the image from its file path, and raise an error if no image can be found at that path.

Then read the original height and width of the image and compute the resize ratio.

If the ratio is not equal to 1, resize the image accordingly.

Finally, return the image, its original height and width, and its resized height and width.

```
if im is None:  # not cached in ram
    # the image has not been loaded and resized yet
    npy = self.img_npy[i]  # path of the saved .npy file, if any
    if npy and npy.exists():  # load npy
        im = np.load(npy)
    else:  # read image from disk
        path = self.img_files[i]
        im = cv2.imread(path)  # BGR
        assert im is not None, f'Image Not Found {path}'  # error: image cannot be found
    h0, w0 = im.shape[:2]  # orig hw
    # read the original height and width of this image
    r = self.img_size / max(h0, w0)  # resize ratio
    if r != 1:  # if sizes are not equal
        im = cv2.resize(im, (int(w0 * r), int(h0 * r)),
                        interpolation=cv2.INTER_AREA if r < 1 and not self.augment else cv2.INTER_LINEAR)
    return im, (h0, w0), im.shape[:2]  # im, hw_original, hw_resized
```

🎈 If it is cached:

Simply return the image together with its original and resized height and width~

```
else:
    return self.imgs[i], self.img_hw0[i], self.img_hw[i]  # im, hw_original, hw_resized
```
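The resize logic above can be sketched without OpenCV; this is a minimal, illustrative stand-in for the ratio computation that load_image performs:

```python
def resized_hw(h0, w0, img_size=640):
    # the ratio logic of load_image: scale so the longer side equals img_size
    r = img_size / max(h0, w0)
    if r != 1:
        return int(h0 * r), int(w0 * r)
    return h0, w0

# a 480x1280 image is halved so its longer side becomes 640
print(resized_hw(480, 1280))
```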

### 3. random_perspective() function

Random perspective/affine transformation.

Calculation method: the product of the coordinate vectors and a transformation matrix.

First, get the height and width of the image including the border:

```
def random_perspective(im, targets=(), segments=(), degrees=10, translate=.1, scale=.1, shear=10, perspective=0.0,
                       border=(0, 0)):
    # torchvision.transforms.RandomAffine(degrees=(-10, 10), translate=(0.1, 0.1), scale=(0.9, 1.1), shear=(-10, 10))
    # targets = [cls, xyxy]

    # image height and width (including the border)
    height = im.shape[0] + border[0] * 2  # shape(h,w,c)
    width = im.shape[1] + border[1] * 2
```

Then build the matrix that translates the image centre to the origin:

```
# Center
C = np.eye(3)  # 3x3 identity matrix
C[0, 2] = -im.shape[1] / 2  # x translation (pixels): centre in the x direction
C[1, 2] = -im.shape[0] / 2  # y translation (pixels): centre in the y direction
```
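To see why C is built this way, a small illustrative check (hypothetical 640x480 image): applying C to the image centre in homogeneous coordinates yields the origin, so the later rotation and shear pivot around the centre rather than the top-left corner.

```python
import numpy as np

h, w = 480, 640              # hypothetical image shape
C = np.eye(3)
C[0, 2] = -w / 2             # x translation (pixels)
C[1, 2] = -h / 2             # y translation (pixels)

# the image centre, in homogeneous coordinates, maps to the origin
centre = C @ np.array([w / 2, h / 2, 1.0])
print(centre)
```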

Next comes building the matrices for the various transformations (perspective, rotation and scale, shear, translation):

```
# Perspective
P = np.eye(3)  # 3x3 identity matrix
# randomly generate the perspective values in the x and y directions
P[2, 0] = random.uniform(-perspective, perspective)  # x perspective (about y)
P[2, 1] = random.uniform(-perspective, perspective)  # y perspective (about x)

# Rotation and Scale
R = np.eye(3)  # 3x3 identity matrix
a = random.uniform(-degrees, degrees)  # random rotation angle within the range
# a += random.choice([-180, -90, 0, 90])  # add 90deg rotations to small rotations
s = random.uniform(1 - scale, 1 + scale)  # random scale factor
# s = 2 ** random.uniform(-scale, scale)
R[:2] = cv2.getRotationMatrix2D(angle=a, center=(0, 0), scale=s)  # affine rotation matrix, assigned to the first two rows of R

# Shear
S = np.eye(3)  # 3x3 identity matrix
S[0, 1] = math.tan(random.uniform(-shear, shear) * math.pi / 180)  # x shear (deg)
S[1, 0] = math.tan(random.uniform(-shear, shear) * math.pi / 180)  # y shear (deg)

# Translation
T = np.eye(3)
T[0, 2] = random.uniform(0.5 - translate, 0.5 + translate) * width  # x translation (pixels)
T[1, 2] = random.uniform(0.5 - translate, 0.5 + translate) * height  # y translation (pixels)
```

Then combine them into a single transformation matrix:

```
# Combined rotation matrix
M = T @ S @ R @ P @ C  # order of operations (right to left) is IMPORTANT
# combined by matrix multiplication
if (border[0] != 0) or (border[1] != 0) or (M != np.eye(3)).any():  # image changed
    # apply only if there is a border or some transformation actually changes the image
    if perspective:
        im = cv2.warpPerspective(im, M, dsize=(width, height), borderValue=(114, 114, 114))
        # cv2.warpPerspective: perspective transform; straight lines stay straight,
        # but parallel lines may no longer be parallel
    else:  # affine
        im = cv2.warpAffine(im, M[:2], dsize=(width, height), borderValue=(114, 114, 114))
        # cv2.warpAffine: affine transform (rotation, translation, scaling);
        # parallel lines remain parallel after the transform
```
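The whole matrix pipeline can be reproduced in a few lines without OpenCV. This sketch uses typical hyperparameter values (degrees=10, translate=0.1, scale=0.1, shear=10, perspective=0, all illustrative) and replaces cv2.getRotationMatrix2D with the equivalent matrix it builds for center=(0, 0); with perspective set to 0, the combined M stays affine, i.e. its last row remains [0, 0, 1]:

```python
import math
import random
import numpy as np

random.seed(0)
height, width = 480, 640                          # hypothetical image size
degrees, translate, scale, shear, perspective = 10, 0.1, 0.1, 10, 0.0

C = np.eye(3)
C[0, 2], C[1, 2] = -width / 2, -height / 2        # move centre to origin

P = np.eye(3)                                     # perspective (zero here)
P[2, 0] = random.uniform(-perspective, perspective)
P[2, 1] = random.uniform(-perspective, perspective)

R = np.eye(3)                                     # rotation + scale
a = math.radians(random.uniform(-degrees, degrees))
s = random.uniform(1 - scale, 1 + scale)
# the same matrix cv2.getRotationMatrix2D builds for center=(0, 0)
R[:2] = [[s * math.cos(a), s * math.sin(a), 0],
         [-s * math.sin(a), s * math.cos(a), 0]]

S = np.eye(3)                                     # shear
S[0, 1] = math.tan(math.radians(random.uniform(-shear, shear)))
S[1, 0] = math.tan(math.radians(random.uniform(-shear, shear)))

T = np.eye(3)                                     # translation
T[0, 2] = random.uniform(0.5 - translate, 0.5 + translate) * width
T[1, 2] = random.uniform(0.5 - translate, 0.5 + translate) * height

M = T @ S @ R @ P @ C  # applied right to left: centre, perspective, rotate+scale, shear, translate
print(M)
```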

Then transform the coordinates of the label

```
# Transform label coordinates
n = len(targets)  # number of targets
if n:  # if there are any targets
    use_segments = any(x.any() for x in segments)  # check whether any segment is non-empty
    new = np.zeros((n, 4))  # one xyxy box per target
    if use_segments:  # warp segments
        segments = resample_segments(segments)  # upsample
        for i, segment in enumerate(segments):
            xy = np.ones((len(segment), 3))
            xy[:, :2] = segment  # the first two columns are the segment points
            xy = xy @ M.T  # transform
            xy = xy[:, :2] / xy[:, 2:3] if perspective else xy[:, :2]  # perspective rescale or affine
            # the last column of xy is all ones, so multiplying by M.T picks up the
            # translation, and the third output column carries the perspective term set in P

            # clip
            new[i] = segment2box(xy, width, height)

    else:  # warp boxes
        xy = np.ones((n * 4, 3))
        xy[:, :2] = targets[:, [1, 2, 3, 4, 1, 4, 3, 2]].reshape(n * 4, 2)  # x1y1, x2y2, x1y2, x2y1
        xy = xy @ M.T  # transform
        xy = (xy[:, :2] / xy[:, 2:3] if perspective else xy[:, :2]).reshape(n, 8)  # perspective rescale or affine

        # create new boxes
        x = xy[:, [0, 2, 4, 6]]
        y = xy[:, [1, 3, 5, 7]]
        new = np.concatenate((x.min(1), y.min(1), x.max(1), y.max(1))).reshape(4, n).T

    # clip boxes to the image after the operations above
    new[:, [0, 2]] = new[:, [0, 2]].clip(0, width)
    new[:, [1, 3]] = new[:, [1, 3]].clip(0, height)
```
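The box branch above amounts to "transform the four corners, then take the min/max". A worked example with a hand-picked affine M (scale by 2, shift by (10, 20); all values illustrative):

```python
import numpy as np

targets = np.array([[0., 30., 40., 70., 100.]])  # one target: cls, x1, y1, x2, y2
M = np.array([[2., 0., 10.],                     # hypothetical affine: scale 2, shift (10, 20)
              [0., 2., 20.],
              [0., 0., 1.]])
n = len(targets)

xy = np.ones((n * 4, 3))                                            # homogeneous coordinates
xy[:, :2] = targets[:, [1, 2, 3, 4, 1, 4, 3, 2]].reshape(n * 4, 2)  # x1y1, x2y2, x1y2, x2y1
xy = (xy @ M.T)[:, :2].reshape(n, 8)        # affine case, so no perspective divide

x = xy[:, [0, 2, 4, 6]]
y = xy[:, [1, 3, 5, 7]]
new = np.concatenate((x.min(1), y.min(1), x.max(1), y.max(1))).reshape(4, n).T
print(new)  # the transformed box
```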

Finally, the surviving candidate boxes are selected and the result is returned:

```
# filter candidates
i = box_candidates(box1=targets[:, 1:5].T * s, box2=new.T, area_thr=0.01 if use_segments else 0.10)
# keep only boxes that survive the transformation reasonably intact
targets = targets[i]
targets[:, 1:5] = new[i]

return im, targets
```
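For reference, box_candidates works roughly as sketched below (mirroring the helper in YOLOv5's utils as I understand it, with its default thresholds): a box is kept only if, after the transform, it is still a few pixels in size, retains enough of its original area, and has a sane aspect ratio.

```python
import numpy as np

def box_candidates(box1, box2, wh_thr=2, ar_thr=20, area_thr=0.1, eps=1e-16):
    # box1: boxes before augmentation, box2: after; both shaped (4, n) as x1, y1, x2, y2
    w1, h1 = box1[2] - box1[0], box1[3] - box1[1]
    w2, h2 = box2[2] - box2[0], box2[3] - box2[1]
    ar = np.maximum(w2 / (h2 + eps), h2 / (w2 + eps))  # aspect ratio
    # keep boxes that are still at least wh_thr pixels wide and high, retain more
    # than area_thr of their original area, and are not absurdly elongated
    return (w2 > wh_thr) & (h2 > wh_thr) & (w2 * h2 / (w1 * h1 + eps) > area_thr) & (ar < ar_thr)

# two 100x100 boxes: one shrinks to 50x50 (kept), one collapses to 1x1 (dropped)
before = np.array([[0, 0], [0, 0], [100, 100], [100, 100]], dtype=float)
after = np.array([[0, 0], [0, 0], [50, 1], [50, 1]], dtype=float)
keep = box_candidates(before, after)
print(keep)
```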

Criticism and corrections are welcome in the comment area, thank you~

Added by fahim_junoon on Thu, 10 Feb 2022 14:50:32 +0200