### IoU

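IoU (Intersection over Union), also known as the overlap or intersection-over-union ratio, measures how much two bounding boxes overlap.

That is, the area of the intersection divided by the area of the union of the two boxes, as in the figure above. The code implementation is as follows: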

```python
import numpy as np

# Boxes are [xmin, ymin, xmax, ymax]. The "+ 1." terms treat coordinates as
# inclusive pixel indices (PASCAL VOC style); drop them for continuous coordinates.


# One prediction, one ground truth
def IoU(pred_box, gt_box):
    # Corners of the intersection rectangle
    ixmin = max(pred_box[0], gt_box[0])
    iymin = max(pred_box[1], gt_box[1])
    ixmax = min(pred_box[2], gt_box[2])
    iymax = min(pred_box[3], gt_box[3])
    # Clamp at 0 so non-overlapping boxes get zero intersection
    inter_w = np.maximum(ixmax - ixmin + 1., 0)
    inter_h = np.maximum(iymax - iymin + 1., 0)
    inters = inter_w * inter_h
    # Union = area(pred) + area(gt) - intersection
    uni = ((pred_box[2] - pred_box[0] + 1.) * (pred_box[3] - pred_box[1] + 1.) +
           (gt_box[2] - gt_box[0] + 1.) * (gt_box[3] - gt_box[1] + 1.) -
           inters)
    ious = inters / uni
    return ious


# Multiple predictions (N x 4 array), one ground truth:
# returns the highest IoU and the index of the prediction that achieves it
def maxIoU(pred_box, gt_box):
    # Element-wise max/min, broadcasting the single gt box over all predictions
    ixmin = np.maximum(pred_box[:, 0], gt_box[0])
    iymin = np.maximum(pred_box[:, 1], gt_box[1])
    ixmax = np.minimum(pred_box[:, 2], gt_box[2])
    iymax = np.minimum(pred_box[:, 3], gt_box[3])
    inters_w = np.maximum(ixmax - ixmin + 1., 0)
    inters_h = np.maximum(iymax - iymin + 1., 0)
    inters = inters_w * inters_h
    uni = ((pred_box[:, 2] - pred_box[:, 0] + 1.) * (pred_box[:, 3] - pred_box[:, 1] + 1.) +
           (gt_box[2] - gt_box[0] + 1.) * (gt_box[3] - gt_box[1] + 1.) -
           inters)
    ious = inters / uni
    iou = np.max(ious)
    iou_id = np.argmax(ious)
    return iou, iou_id


# Multiple predictions, multiple ground truths:
# for each gt box, returns [best IoU, index of the best prediction]
def box_IoU(pred_box, gt_boxes):
    result = []
    for gt_box in gt_boxes:
        iou, iou_id = maxIoU(pred_box, gt_box)  # same computation as above, per gt box
        result.append([iou, iou_id])
    return result
```
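The per-ground-truth loop in `box_IoU` can also be replaced by a single broadcast that builds the full N×M IoU matrix at once. The sketch below is one way to do it, assuming boxes are `[xmin, ymin, xmax, ymax]` NumPy arrays with the same inclusive `+1` convention; `pairwise_iou` is a hypothetical name, not part of any library.

```python
import numpy as np


def pairwise_iou(preds, gts):
    """IoU matrix of shape (N, M) between N predicted and M ground-truth boxes.

    Hypothetical helper: boxes are [xmin, ymin, xmax, ymax] with inclusive pixel
    coordinates, matching the +1 convention used by IoU/maxIoU/box_IoU.
    """
    preds = np.asarray(preds, dtype=float)
    gts = np.asarray(gts, dtype=float)
    # (N, 1) against (1, M) broadcasts to (N, M) intersection corners
    ixmin = np.maximum(preds[:, None, 0], gts[None, :, 0])
    iymin = np.maximum(preds[:, None, 1], gts[None, :, 1])
    ixmax = np.minimum(preds[:, None, 2], gts[None, :, 2])
    iymax = np.minimum(preds[:, None, 3], gts[None, :, 3])
    inter = np.maximum(ixmax - ixmin + 1., 0) * np.maximum(iymax - iymin + 1., 0)
    area_p = (preds[:, 2] - preds[:, 0] + 1.) * (preds[:, 3] - preds[:, 1] + 1.)
    area_g = (gts[:, 2] - gts[:, 0] + 1.) * (gts[:, 3] - gts[:, 1] + 1.)
    union = area_p[:, None] + area_g[None, :] - inter
    return inter / union


preds = np.array([[0, 0, 9, 9], [5, 5, 14, 14], [20, 20, 29, 29]])
gts = np.array([[0, 0, 9, 9], [5, 5, 14, 14]])
ious = pairwise_iou(preds, gts)  # shape (3, 2)
best_iou = ious.max(axis=0)      # best IoU for each ground truth
best_id = ious.argmax(axis=0)    # index of the best prediction for each ground truth
```

Column `j` holds the IoU of every prediction with ground truth `j`, so `max`/`argmax` along `axis=0` give the same pairs that `box_IoU` returns. Here the two 10×10 boxes offset by 5 pixels share a 5×5 region, giving IoU = 25 / (100 + 100 - 25) = 1/7. The trade-off is an N×M intermediate array in exchange for no Python loop.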

Some IoU-related loss functions are discussed in:

- Introduction to object detection regression loss functions: SmoothL1/IoU/GIoU/DIoU/CIoU Loss - Zhihu
- Collection | Summary of object detection regression loss functions

Problems that arise from data annotation, namely the probability distribution of the bounding box and the uncertainty of the boxes predicted by the model, are discussed in:

- Understanding the probability distribution of object detection bounding boxes - Zhihu

That article models the bounding box as a Gaussian distribution.

### NMS

NMS (non-maximum suppression). Many of the detections output by a detection model are redundant: the same object is predicted several times. NMS is used to suppress these duplicates, as follows:

- Each model output consists of a regressed box (bbox pre), a predicted class (cls pre), and a classification score (cls score). Group all outputs by cls pre.
- Within each class, sort the outputs by cls score and take out the highest-scoring one. Compute its IoU with every remaining output; every output whose IoU exceeds the threshold is discarded and takes no part in the following steps.
- From the outputs that survive the filtering, take the one with the next-highest cls score and repeat step 2; then take the next one, and so on, until all outputs of this class have been processed.
- Repeat steps 2 and 3 for every class to obtain the final detection result.
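The steps above can be sketched end to end as follows. This is a minimal illustration, not the author's code: `greedy_nms` and `multiclass_nms` are hypothetical names, boxes are assumed to be `[x1, y1, x2, y2]` NumPy arrays with the same `+1` pixel convention, and the surviving boxes are selected with a boolean mask instead of an index shift.

```python
import numpy as np


def greedy_nms(boxes, scores, thresh):
    """Steps 2-3 for one class: keep the top-scoring box, drop boxes whose
    IoU with it exceeds thresh, then repeat on what is left."""
    x1, y1, x2, y2 = boxes[:, 0], boxes[:, 1], boxes[:, 2], boxes[:, 3]
    areas = (x2 - x1 + 1) * (y2 - y1 + 1)
    order = scores.argsort()[::-1]  # indices by descending score
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(i)
        rest = order[1:]
        w = np.maximum(0.0, np.minimum(x2[i], x2[rest]) - np.maximum(x1[i], x1[rest]) + 1)
        h = np.maximum(0.0, np.minimum(y2[i], y2[rest]) - np.maximum(y1[i], y1[rest]) + 1)
        inter = w * h
        iou = inter / (areas[i] + areas[rest] - inter)
        order = rest[iou <= thresh]  # boolean mask over rest, so no +1 shift needed
    return np.array(keep, dtype=int)


def multiclass_nms(boxes, scores, labels, thresh=0.5):
    """Steps 1 and 4: split outputs by predicted class and run NMS per class.

    boxes: (N, 4); scores: (N,) cls score; labels: (N,) cls pre.
    Returns indices into the original arrays of the kept detections.
    """
    kept = []
    for c in np.unique(labels):
        idx = np.where(labels == c)[0]  # step 1: outputs predicted as class c
        kept.append(idx[greedy_nms(boxes[idx], scores[idx], thresh)])
    if not kept:
        return np.array([], dtype=int)
    return np.concatenate(kept)


boxes = np.array([[0, 0, 9, 9],
                  [1, 1, 10, 10],
                  [0, 0, 9, 9],
                  [50, 50, 59, 59]], dtype=float)
scores = np.array([0.9, 0.8, 0.7, 0.6])
labels = np.array([0, 0, 1, 0])
keep = multiclass_nms(boxes, scores, labels, thresh=0.5)  # kept indices: 0, 3, 2
```

Boxes 0 and 2 have identical coordinates but different predicted classes, so both survive; box 1 overlaps box 0 inside class 0 with IoU = 81/119 ≈ 0.68 and is suppressed. Running NMS separately per class means detections of different classes never suppress each other.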

Code for the filtering within a single class:

```python
import numpy as np


def py_cpu_nms(dets, thresh):
    """Pure Python NMS baseline.

    dets: (N, 5) array of [x1, y1, x2, y2, score] for a single class.
    thresh: IoU threshold; boxes overlapping a kept box by more than this are suppressed.
    Returns the indices of the kept boxes.
    """
    x1 = dets[:, 0]  # pred bbox top-left x
    y1 = dets[:, 1]  # pred bbox top-left y
    x2 = dets[:, 2]  # pred bbox bottom-right x
    y2 = dets[:, 3]  # pred bbox bottom-right y
    scores = dets[:, 4]  # pred bbox cls score

    areas = (x2 - x1 + 1) * (y2 - y1 + 1)  # pred bbox areas
    order = scores.argsort()[::-1]  # indices sorted by descending score (step 2)

    keep = []  # indices of the boxes retained after NMS
    while order.size > 0:
        i = order[0]  # highest-scoring remaining box
        keep.append(i)  # the top-1 box is always kept

        # IoU between the top-1 box and every other remaining box
        xx1 = np.maximum(x1[i], x1[order[1:]])
        yy1 = np.maximum(y1[i], y1[order[1:]])
        xx2 = np.minimum(x2[i], x2[order[1:]])
        yy2 = np.minimum(y2[i], y2[order[1:]])
        w = np.maximum(0.0, xx2 - xx1 + 1)
        h = np.maximum(0.0, yy2 - yy1 + 1)
        inter = w * h
        ovr = inter / (areas[i] + areas[order[1:]] - inter)

        # Boxes with IoU > thresh duplicate the top-1 box and are suppressed;
        # those with ovr <= thresh survive this round (step 3).
        inds = np.where(ovr <= thresh)[0]
        # ovr was computed against order[1:], so position k in ovr corresponds to
        # order[k + 1]; the +1 maps back to positions in order (loop of step 4).
        order = order[inds + 1]

    return keep  # final NMS result
```
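The `inds + 1` shift in `py_cpu_nms` is easy to get wrong. The toy trace below uses made-up indices and IoU values (not output from any real model) to show one round of the loop: `ovr` is computed against `order[1:]`, so its positions are offset by one from positions in `order`.

```python
import numpy as np

# Hypothetical state at the start of one NMS round: box indices sorted by score
order = np.array([3, 0, 2, 1])
# Made-up IoU of the top-1 box (index 3) with order[1:] = [0, 2, 1]
ovr = np.array([0.8, 0.1, 0.4])
thresh = 0.5

inds = np.where(ovr <= thresh)[0]  # positions within ovr, i.e. within order[1:]
print(inds)   # [1 2]
order = order[inds + 1]            # shift by one to index into order itself
print(order)  # [2 1]
```

Without the shift, `order[inds]` would be `[0, 2]`: box 0, whose IoU with the top box is 0.8, would wrongly survive and box 1 would be dropped.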

There are also several variants of NMS: