These are a beginner's study notes, written to record what I have learned and, hopefully, to help other beginners. Corrections from more experienced readers are very welcome; if anything here infringes, please contact me and it will be removed.
Contents
1. Principle analysis
2. Code analysis
3. random_perspective() function
1. Principle analysis
YOLOv5 uses the same Mosaic data augmentation as YOLOv4.
Main idea: one selected image and three randomly chosen images are each randomly cropped and then stitched together into a single image used as training data.
This enriches the backgrounds the detector sees, and stitching four images into one effectively increases the batch size: batch normalization statistics are computed over the contents of all four images at once.
As a result, YOLOv5 is less dependent on a large batch size.
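As a toy illustration of the stitching idea (not the YOLOv5 implementation, which crops the four images around a random center, as the code analysis below shows), four equally sized images can be combined into one larger training image like this:

```python
import numpy as np

def simple_mosaic(imgs):
    """Stitch four equally sized HxWxC images into one 2Hx2W mosaic."""
    h, w, c = imgs[0].shape
    canvas = np.full((2 * h, 2 * w, c), 114, dtype=np.uint8)  # gray background
    canvas[:h, :w] = imgs[0]  # top left
    canvas[:h, w:] = imgs[1]  # top right
    canvas[h:, :w] = imgs[2]  # bottom left
    canvas[h:, w:] = imgs[3]  # bottom right
    return canvas

# four dummy "images" with distinct pixel values
imgs = [np.full((4, 4, 3), v, dtype=np.uint8) for v in (10, 20, 30, 40)]
mosaic = simple_mosaic(imgs)
print(mosaic.shape)  # (8, 8, 3)
```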
2. Code analysis
1. Main part - load_mosaic
```python
labels4, segments4 = [], []
s = self.img_size  # image size
# Randomly generate the mosaic center (xc, yc); random.uniform samples a real
# number between half the image size and 1.5 times the image size
yc, xc = (int(random.uniform(-x, 2 * s + x)) for x in self.mosaic_border)  # mosaic center x, y
```
First initialize the label lists to be empty, then obtain the image size s.
Based on the image size, random.uniform() generates a random mosaic center point, with each coordinate in the range from half the image size to 1.5 times the image size.
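A quick check of that sampling range (s = 640 is an assumed input size; mosaic_border is [-img_size // 2, -img_size // 2] in YOLOv5's dataset class):

```python
import random

s = 640  # assumed img_size
mosaic_border = [-s // 2, -s // 2]
yc, xc = (int(random.uniform(-x, 2 * s + x)) for x in mosaic_border)
# both coordinates fall between s/2 and 1.5*s
print(s // 2 <= xc <= 3 * s // 2, s // 2 <= yc <= 3 * s // 2)  # True True
```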
```python
# Randomly pick 3 additional image indices and combine them with the current index
indices = [index] + random.choices(self.indices, k=3)  # 3 additional image indices
random.shuffle(indices)  # shuffle the four indices
```
random.choices() picks the indices of three more images at random; together with the index of the originally selected image they form the indices list, which random.shuffle() then shuffles.
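A standalone sketch of this step (the dataset size and seed are made up; note that random.choices samples with replacement, so the three extra indices may repeat):

```python
import random

random.seed(0)  # only for reproducibility of this sketch
index = 5                       # the originally selected image
all_indices = list(range(100))  # hypothetical dataset indices
indices = [index] + random.choices(all_indices, k=3)  # 3 extra images
random.shuffle(indices)  # randomize which image lands in which quadrant
print(len(indices))  # 4
```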
```python
for i, index in enumerate(indices):
    # Load image together with its (resized) height and width
    img, _, (h, w) = load_image(self, index)
```
Loop over the four images, calling the load_image() function to load each one together with its height and width.
The next step is placing these four images on the mosaic canvas.
```python
    # place img in img4
    if i == 0:  # top left
        # Create the 2s x 2s base canvas, filled with the gray value 114
        img4 = np.full((s * 2, s * 2, img.shape[2]), 114, dtype=np.uint8)  # base image with 4 tiles
        # Destination region on the large image: xmin, ymin, xmax, ymax
        x1a, y1a, x2a, y2a = max(xc - w, 0), max(yc - h, 0), xc, yc
        # Source region on the small (original) image: xmin, ymin, xmax, ymax
        x1b, y1b, x2b, y2b = w - (x2a - x1a), h - (y2a - y1a), w, h
```
The first image goes in the top-left corner.
img4 is first initialized with np.full() as the large canvas, sized to hold four images.
Then the destination region on the large image is computed, along with the matching source region to crop from the original (small) image.
```python
    elif i == 1:  # top right
        x1a, y1a, x2a, y2a = xc, max(yc - h, 0), min(xc + w, s * 2), yc
        x1b, y1b, x2b, y2b = 0, h - (y2a - y1a), min(w, x2a - x1a), h
    elif i == 2:  # bottom left
        x1a, y1a, x2a, y2a = max(xc - w, 0), yc, xc, min(s * 2, yc + h)
        x1b, y1b, x2b, y2b = w - (x2a - x1a), 0, w, min(y2a - y1a, h)
    elif i == 3:  # bottom right
        x1a, y1a, x2a, y2a = xc, yc, min(xc + w, s * 2), min(s * 2, yc + h)
        x1b, y1b, x2b, y2b = 0, 0, min(w, x2a - x1a), min(y2a - y1a, h)
```
The remaining three images are handled analogously.
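A useful invariant of the four branches above is that the destination and source regions always have the same width and height, so the paste never fails. A self-contained check (the concrete sizes and center are made up):

```python
def mosaic_regions(i, xc, yc, w, h, s):
    """Destination (on the 2s x 2s canvas) and source (on the small image)
    regions for quadrant i, mirroring the four branches above."""
    if i == 0:  # top left
        x1a, y1a, x2a, y2a = max(xc - w, 0), max(yc - h, 0), xc, yc
        x1b, y1b, x2b, y2b = w - (x2a - x1a), h - (y2a - y1a), w, h
    elif i == 1:  # top right
        x1a, y1a, x2a, y2a = xc, max(yc - h, 0), min(xc + w, s * 2), yc
        x1b, y1b, x2b, y2b = 0, h - (y2a - y1a), min(w, x2a - x1a), h
    elif i == 2:  # bottom left
        x1a, y1a, x2a, y2a = max(xc - w, 0), yc, xc, min(s * 2, yc + h)
        x1b, y1b, x2b, y2b = w - (x2a - x1a), 0, w, min(y2a - y1a, h)
    else:  # bottom right
        x1a, y1a, x2a, y2a = xc, yc, min(xc + w, s * 2), min(s * 2, yc + h)
        x1b, y1b, x2b, y2b = 0, 0, min(w, x2a - x1a), min(y2a - y1a, h)
    return (x1a, y1a, x2a, y2a), (x1b, y1b, x2b, y2b)

s, w, h = 640, 640, 480  # assumed canvas half-size and image size
xc, yc = 500, 700        # some mosaic center
for i in range(4):
    (x1a, y1a, x2a, y2a), (x1b, y1b, x2b, y2b) = mosaic_regions(i, xc, yc, w, h, s)
    # destination and source regions have the same width and height
    assert x2a - x1a == x2b - x1b and y2a - y1a == y2b - y1b
```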
```python
    # Paste the selected region of the small image onto the large image
    img4[y1a:y2a, x1a:x2a] = img[y1b:y2b, x1b:x2b]  # img4[ymin:ymax, xmin:xmax]
```
The selected region of each small image is pasted onto the large image.
```python
    # Offsets from the small image to the large image, used later to shift labels
    padw = x1a - x1b
    padh = y1a - y1b
```
Calculate the offset from the small image to the large image; it is used to shift the label positions after the mosaic is assembled.
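For example, if a source pixel at (x1b, y1b) lands at (x1a, y1a) on the canvas, every label coordinate from that image must be shifted by the same offsets (all concrete numbers here are made up):

```python
# Hypothetical regions for one tile: destination starts at (140, 220) on the
# canvas, source crop starts at (0, 0) on the small image
x1a, y1a = 140, 220
x1b, y1b = 0, 0
padw, padh = x1a - x1b, y1a - y1b  # (140, 220)

# a pixel box (50, 60, 90, 100) on the small image moves to:
box = (50 + padw, 60 + padh, 90 + padw, 100 + padh)
print(box)  # (190, 280, 230, 320)
```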
```python
    # Labels
    labels, segments = self.labels[index].copy(), self.segments[index].copy()
    if labels.size:
        # Convert normalized xywh to pixel xyxy, shifted by the paste offsets
        labels[:, 1:] = xywhn2xyxy(labels[:, 1:], w, h, padw, padh)
        # Convert segments to pixel coordinates as well
        segments = [xyn2xy(x, w, h, padw, padh) for x in segments]
    labels4.append(labels)
    segments4.extend(segments)
```
Next, the labels are handled:
First read the labels of the current image, then convert them from normalized xywh format (fractional values) to pixel xyxy format.
The segments are likewise converted to pixel coordinates.
Finally, both are appended to the lists prepared earlier.
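xywhn2xyxy lives in YOLOv5's utils; a sketch consistent with how it is called here (normalized center-x, center-y, width, height in; pixel corner coordinates out, shifted by the paste offsets):

```python
import numpy as np

def xywhn2xyxy(x, w=640, h=640, padw=0, padh=0):
    # normalized [x_center, y_center, width, height] -> pixel [x1, y1, x2, y2]
    y = np.copy(x)
    y[:, 0] = w * (x[:, 0] - x[:, 2] / 2) + padw  # top-left x
    y[:, 1] = h * (x[:, 1] - x[:, 3] / 2) + padh  # top-left y
    y[:, 2] = w * (x[:, 0] + x[:, 2] / 2) + padw  # bottom-right x
    y[:, 3] = h * (x[:, 1] + x[:, 3] / 2) + padh  # bottom-right y
    return y

labels = np.array([[0.5, 0.5, 0.25, 0.5]])  # one box centered in the image
out = xywhn2xyxy(labels, w=640, h=480, padw=100, padh=200)
print(out)  # [[340. 320. 500. 560.]]
```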
```python
# Concat/clip labels
labels4 = np.concatenate(labels4, 0)  # concatenate the per-image labels into one array
for x in (labels4[:, 1:], *segments4):
    np.clip(x, 0, 2 * s, out=x)  # clip coordinates to [0, 2s] when using random_perspective()
# img4, labels4 = replicate(img4, labels4)  # replicate
```
First the per-image label arrays are concatenated into a single array for easier processing below, and all coordinates are clipped to the range 0 to 2 times the image size.
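np.clip with out= modifies the array in place, which is why the loop above has no assignment. A minimal demonstration (the coordinates are made up):

```python
import numpy as np

s = 320
coords = np.array([[-50., 10., 700., 400.]])  # coordinates outside [0, 2s]
np.clip(coords, 0, 2 * s, out=coords)  # in-place clip, as in the loop above
print(coords)  # [[  0.  10. 640. 400.]]
```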
```python
# Augment
# At this point img4 has shape [2 * img_size, 2 * img_size]
# random_perspective() randomly rotates, translates, scales and shears the
# mosaic image and resizes it back to the input size img_size
img4, labels4, segments4 = copy_paste(img4, labels4, segments4, p=self.hyp['copy_paste'])
img4, labels4 = random_perspective(img4, labels4, segments4,
                                   degrees=self.hyp['degrees'],
                                   translate=self.hyp['translate'],
                                   scale=self.hyp['scale'],
                                   shear=self.hyp['shear'],
                                   perspective=self.hyp['perspective'],
                                   border=self.mosaic_border)  # border to remove
```
After the mosaic step, the combined image has shape [2 * img_size, 2 * img_size].
The mosaic image is then randomly rotated, translated, scaled and sheared, and resized back to the input size img_size.
```python
return img4, labels4
```
Finally, the processed image and the corresponding labels are returned.
2. load_image function
The load_image function loads one image and resizes it according to the ratio between the configured input size and the image's original size.
First, fetch the image at the given index.
```python
def load_image(self, i):
    # loads 1 image from dataset index 'i', returns im, original hw, resized hw
    im = self.imgs[i]  # fetch the image at this index
```
Check whether the image is cached in RAM, i.e. whether it has already been loaded and resized (I'm not sure this understanding is correct; if it's wrong, please tell me in the comment area, thank you ~).
🎈 If it is not cached:
First look for a cached .npy file in the corresponding folder.
🌳 If one exists: load the image from it.
🌳 If not: read the image from its file path, and raise an error if the image at that path cannot be found.
Read the original height and width of the image and compute the resize ratio.
If this ratio is not equal to 1, resize the image accordingly.
Finally, return the image together with its original and resized height and width.
```python
    if im is None:  # not cached in RAM
        npy = self.img_npy[i]
        if npy and npy.exists():  # load from the .npy cache file
            im = np.load(npy)
        else:  # read image from disk
            path = self.img_files[i]
            im = cv2.imread(path)  # BGR
            assert im is not None, f'Image Not Found {path}'
        h0, w0 = im.shape[:2]  # original hw
        r = self.img_size / max(h0, w0)  # resize ratio
        if r != 1:  # if sizes are not equal
            im = cv2.resize(im, (int(w0 * r), int(h0 * r)),
                            interpolation=cv2.INTER_AREA if r < 1 and not self.augment else cv2.INTER_LINEAR)
        return im, (h0, w0), im.shape[:2]  # im, hw_original, hw_resized
```
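The resize preserves the aspect ratio: the long side is scaled to img_size and the short side by the same ratio. The target-size computation alone (the original dimensions are made up):

```python
img_size = 640
h0, w0 = 720, 1280  # original image height and width
r = img_size / max(h0, w0)  # ratio of target size to the long side
new_w, new_h = int(w0 * r), int(h0 * r)
print((new_w, new_h))  # (640, 360)
```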
🎈 If it is cached:
Directly return the image along with its original and resized height and width~
```python
    else:
        return self.imgs[i], self.img_hw0[i], self.img_hw[i]  # im, hw_original, hw_resized
```
3. random_perspective() function
Random transformation.
Calculation method: multiply the coordinate vectors by a transformation matrix.
First, get the height and width of the image including the border.
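This works because, in homogeneous coordinates (x, y, 1), a single 3x3 matrix can express translation, rotation, scale, shear and perspective uniformly. A pure translation, for instance:

```python
import numpy as np

T = np.eye(3)
T[0, 2] = 10  # shift 10 px in x
T[1, 2] = 5   # shift 5 px in y

p = np.array([3., 4., 1.])  # point (3, 4) in homogeneous coordinates
print(T @ p)  # [13.  9.  1.]
```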
```python
def random_perspective(im, targets=(), segments=(), degrees=10, translate=.1,
                       scale=.1, shear=10, perspective=0.0, border=(0, 0)):
    # torchvision.transforms.RandomAffine(degrees=(-10, 10), translate=(0.1, 0.1),
    #                                     scale=(0.9, 1.1), shear=(-10, 10))
    # targets = [cls, xyxy]
    # Image height and width, including the border
    height = im.shape[0] + border[0] * 2  # shape(h, w, c)
    width = im.shape[1] + border[1] * 2
```
Then build the centering matrix C, which translates the image center to the origin.
```python
    # Center
    C = np.eye(3)  # 3x3 identity matrix
    C[0, 2] = -im.shape[1] / 2  # x translation (pixels)
    C[1, 2] = -im.shape[0] / 2  # y translation (pixels)
```
Next, the matrices for the various transformations (perspective, rotation and scale, shear, translation) are prepared.
```python
    # Perspective
    P = np.eye(3)
    P[2, 0] = random.uniform(-perspective, perspective)  # x perspective (about y)
    P[2, 1] = random.uniform(-perspective, perspective)  # y perspective (about x)

    # Rotation and Scale
    R = np.eye(3)
    a = random.uniform(-degrees, degrees)  # random rotation angle
    # a += random.choice([-180, -90, 0, 90])  # add 90deg rotations to small rotations
    s = random.uniform(1 - scale, 1 + scale)  # random scale factor
    # s = 2 ** random.uniform(-scale, scale)
    R[:2] = cv2.getRotationMatrix2D(angle=a, center=(0, 0), scale=s)  # fill the first two rows of R

    # Shear
    S = np.eye(3)
    S[0, 1] = math.tan(random.uniform(-shear, shear) * math.pi / 180)  # x shear (deg)
    S[1, 0] = math.tan(random.uniform(-shear, shear) * math.pi / 180)  # y shear (deg)

    # Translation
    T = np.eye(3)
    T[0, 2] = random.uniform(0.5 - translate, 0.5 + translate) * width  # x translation (pixels)
    T[1, 2] = random.uniform(0.5 - translate, 0.5 + translate) * height  # y translation (pixels)
```
Then the individual transformations are combined into a single matrix.
```python
    # Combined rotation matrix
    M = T @ S @ R @ P @ C  # order of operations (right to left) is IMPORTANT
    if (border[0] != 0) or (border[1] != 0) or (M != np.eye(3)).any():  # image changed
        if perspective:
            # warpPerspective keeps straight lines straight, but parallel lines
            # may no longer be parallel after the transform
            im = cv2.warpPerspective(im, M, dsize=(width, height), borderValue=(114, 114, 114))
        else:  # affine
            # warpAffine implements rotation, translation and scaling;
            # parallel lines remain parallel after the transform
            im = cv2.warpAffine(im, M[:2], dsize=(width, height), borderValue=(114, 114, 114))
```
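Because matrix multiplication composes right to left, M = T @ S @ R @ P @ C centers the image first (C) and translates it last (T). A reduced example with only centering and translation (the sizes and offsets are made up):

```python
import numpy as np

w, h = 100, 80
C = np.eye(3); C[0, 2], C[1, 2] = -w / 2, -h / 2  # move image center to origin
T = np.eye(3); T[0, 2], T[1, 2] = 30, 20          # then translate

M = T @ C
p = np.array([w / 2, h / 2, 1.])  # the image center, homogeneous
# C maps the center to (0, 0); T then moves it to (30, 20)
print(M @ p)  # [30. 20.  1.]
```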
Then the label coordinates are transformed with the same matrix.
```python
    # Transform label coordinates
    n = len(targets)
    if n:
        use_segments = any(x.any() for x in segments)  # True if any segment is non-empty
        new = np.zeros((n, 4))  # one xyxy box per target
        if use_segments:  # warp segments
            segments = resample_segments(segments)  # upsample
            for i, segment in enumerate(segments):
                xy = np.ones((len(segment), 3))
                xy[:, :2] = segment  # first two columns hold the segment points
                # The column of ones makes the points homogeneous, so the last
                # row of M (the perspective terms set in P) also takes effect
                xy = xy @ M.T  # transform
                xy = xy[:, :2] / xy[:, 2:3] if perspective else xy[:, :2]  # perspective rescale or affine
                # clip
                new[i] = segment2box(xy, width, height)
        else:  # warp boxes
            xy = np.ones((n * 4, 3))
            xy[:, :2] = targets[:, [1, 2, 3, 4, 1, 4, 3, 2]].reshape(n * 4, 2)  # x1y1, x2y2, x1y2, x2y1
            xy = xy @ M.T  # transform
            xy = (xy[:, :2] / xy[:, 2:3] if perspective else xy[:, :2]).reshape(n, 8)  # perspective rescale or affine
            # create new boxes from the extreme corner coordinates
            x = xy[:, [0, 2, 4, 6]]
            y = xy[:, [1, 3, 5, 7]]
            new = np.concatenate((x.min(1), y.min(1), x.max(1), y.max(1))).reshape(4, n).T
        # clip the boxes to the image, removing parts cut off by the transforms
        new[:, [0, 2]] = new[:, [0, 2]].clip(0, width)
        new[:, [1, 3]] = new[:, [1, 3]].clip(0, height)
```
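The box branch transforms all four corners because an axis-aligned box is no longer axis-aligned after rotation; the new box is taken as the min/max of the warped corners. A self-contained check with a 90-degree rotation about the origin (the box values are made up):

```python
import numpy as np

# 90-degree rotation about the origin as a 3x3 homogeneous matrix
theta = np.pi / 2
M = np.array([[np.cos(theta), -np.sin(theta), 0],
              [np.sin(theta),  np.cos(theta), 0],
              [0, 0, 1]])

box = np.array([10., 20., 30., 60.])  # x1, y1, x2, y2
corners = np.ones((4, 3))
corners[:, :2] = box[[0, 1, 2, 3, 0, 3, 2, 1]].reshape(4, 2)  # x1y1, x2y2, x1y2, x2y1
xy = (corners @ M.T)[:, :2]  # warp all four corners
# new box = extremes of the warped corners
new = np.array([xy[:, 0].min(), xy[:, 1].min(), xy[:, 0].max(), xy[:, 1].max()])
print(np.round(new))  # [-60.  10. -20.  30.]
```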
Finally, the candidate boxes are filtered and the result is returned.
```python
        # filter candidates: drop boxes that became too small or distorted
        i = box_candidates(box1=targets[:, 1:5].T * s, box2=new.T,
                           area_thr=0.01 if use_segments else 0.10)
        targets = targets[i]
        targets[:, 1:5] = new[i]
    return im, targets
```
Feedback and corrections in the comment area are welcome, thank you~