Neural networks - IoU & NMS

IoU

IoU (Intersection over Union), also called the overlap ratio, measures how much two bounding boxes overlap.
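In symbols, writing A and B for the two boxes and |·| for area (this notation is assumed here for illustration, it is not from the original post):

$$
\mathrm{IoU}(A, B) = \frac{|A \cap B|}{|A \cup B|} = \frac{|A \cap B|}{|A| + |B| - |A \cap B|}
$$

The value lies in [0, 1]: it is 0 for disjoint boxes and 1 for identical boxes.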

That is, the area of the intersection of the two boxes divided by the area of their union, as in the figure above. The code implementation is as follows:

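import numpy as np

# One predicted box vs. one ground-truth box.
# Boxes are [xmin, ymin, xmax, ymax] in inclusive pixel coordinates, hence the "+ 1.".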
def IoU(pred_box, gt_box):
    ixmin = max(pred_box[0], gt_box[0])
    iymin = max(pred_box[1], gt_box[1])
    ixmax = min(pred_box[2], gt_box[2])
    iymax = min(pred_box[3], gt_box[3])
    inter_w = np.maximum(ixmax - ixmin + 1., 0)
    inter_h = np.maximum(iymax - iymin + 1., 0)

    inters = inter_w * inter_h

    uni = ((pred_box[2] - pred_box[0] + 1.) * (pred_box[3] - pred_box[1] + 1.) +
           (gt_box[2] - gt_box[0] + 1.) * (gt_box[3] - gt_box[1] + 1.) - inters)

    ious = inters / uni

    return ious

# Multiple predicted boxes (N x 4 array) vs. one ground-truth box; returns the best IoU and its index
def maxIoU(pred_box, gt_box):
    ixmin = np.maximum(pred_box[:, 0], gt_box[0])
    iymin = np.maximum(pred_box[:, 1], gt_box[1])
    ixmax = np.minimum(pred_box[:, 2], gt_box[2])
    iymax = np.minimum(pred_box[:, 3], gt_box[3])
    inters_w = np.maximum(ixmax - ixmin + 1., 0)  # element-wise; clamps non-overlapping boxes to width 0
    inters_h = np.maximum(iymax - iymin + 1., 0)  # element-wise; clamps non-overlapping boxes to height 0

    inters = inters_w * inters_h

    uni = ((pred_box[:, 2] - pred_box[:, 0] + 1.) * (pred_box[:, 3] - pred_box[:, 1] + 1.) +
           (gt_box[2] - gt_box[0] + 1.) * (gt_box[3] - gt_box[1] + 1.) - inters)

    ious = inters / uni
    iou = np.max(ious)
    iou_id = np.argmax(ious)

    return iou, iou_id

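# Multiple predicted boxes vs. multiple ground-truth boxes; returns [best IoU, index of best prediction] per ground-truth box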
def box_IoU(pred_box, gt_boxes):
    result = []
    for gt_box in gt_boxes:
        temp = []
        ixmin = np.maximum(pred_box[:, 0], gt_box[0])
        iymin = np.maximum(pred_box[:, 1], gt_box[1])
        ixmax = np.minimum(pred_box[:, 2], gt_box[2])
        iymax = np.minimum(pred_box[:, 3], gt_box[3])
        inters_w = np.maximum(ixmax - ixmin + 1., 0)  # element-wise; clamps non-overlapping boxes to width 0
        inters_h = np.maximum(iymax - iymin + 1., 0)  # element-wise; clamps non-overlapping boxes to height 0

        inters = inters_w * inters_h

        uni = ((pred_box[:, 2] - pred_box[:, 0] + 1.) * (pred_box[:, 3] - pred_box[:, 1] + 1.) +
               (gt_box[2] - gt_box[0] + 1.) * (gt_box[3] - gt_box[1] + 1.) - inters)

        ious = inters / uni
        iou = np.max(ious)
        iou_id = np.argmax(ious)

        temp.append(iou)
        temp.append(iou_id)
        result.append(temp)
    return result
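
The loop over ground-truth boxes can also be replaced by NumPy broadcasting. The following is a minimal sketch, assuming the same [xmin, ymin, xmax, ymax] inclusive-pixel box format as the functions in this post; the name pairwise_iou and the sample boxes are made up for illustration. It returns the full IoU matrix, from which the best prediction per ground-truth box can be read off.

```python
import numpy as np

def pairwise_iou(pred_boxes, gt_boxes):
    """Return an IoU matrix of shape (num_pred, num_gt).

    Assumption: boxes are [xmin, ymin, xmax, ymax] with inclusive pixel
    coordinates, matching the "+ 1" convention used in this post.
    """
    pred = np.asarray(pred_boxes, dtype=float)[:, None, :]  # shape (P, 1, 4)
    gt = np.asarray(gt_boxes, dtype=float)[None, :, :]      # shape (1, G, 4)

    # Intersection rectangle for every (pred, gt) pair, clamped at zero.
    ixmin = np.maximum(pred[..., 0], gt[..., 0])
    iymin = np.maximum(pred[..., 1], gt[..., 1])
    ixmax = np.minimum(pred[..., 2], gt[..., 2])
    iymax = np.minimum(pred[..., 3], gt[..., 3])
    inters = (np.maximum(ixmax - ixmin + 1.0, 0.0) *
              np.maximum(iymax - iymin + 1.0, 0.0))          # shape (P, G)

    area_pred = (pred[..., 2] - pred[..., 0] + 1.0) * (pred[..., 3] - pred[..., 1] + 1.0)
    area_gt = (gt[..., 2] - gt[..., 0] + 1.0) * (gt[..., 3] - gt[..., 1] + 1.0)

    return inters / (area_pred + area_gt - inters)

# Hypothetical sample data: three predictions, two ground-truth boxes.
preds = np.array([[0, 0, 9, 9], [5, 5, 14, 14], [20, 20, 29, 29]])
gts = np.array([[0, 0, 9, 9], [5, 5, 14, 14]])

ious = pairwise_iou(preds, gts)
best_iou = ious.max(axis=0)      # best IoU for each ground-truth box
best_pred = ious.argmax(axis=0)  # index of the best prediction for each ground-truth box
```

Keeping the whole matrix, rather than only the maximum, is convenient when predictions also need to be matched to ground-truth boxes in the other direction.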

Some loss functions related to IoU:

Introduction to object detection regression loss functions: SmoothL1/IoU/GIoU/DIoU/CIoU Loss - Zhihu

Collection | Summary of object detection regression loss functions
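
As one concrete instance of the IoU-based losses named above, here is a minimal sketch of GIoU, assuming the same [xmin, ymin, xmax, ymax] inclusive-pixel box format as the code in this post. The function names giou and giou_loss are illustrative and are not taken from the linked articles. GIoU subtracts from IoU the fraction of the smallest enclosing box that the union does not cover, so two disjoint boxes still get a value that depends on how far apart they are.

```python
def giou(box_a, box_b):
    """Generalized IoU of two [xmin, ymin, xmax, ymax] boxes.

    Assumption: inclusive pixel coordinates (the "+ 1" convention).
    """
    ax1, ay1, ax2, ay2 = (float(v) for v in box_a)
    bx1, by1, bx2, by2 = (float(v) for v in box_b)

    # Plain IoU first.
    inter_w = max(min(ax2, bx2) - max(ax1, bx1) + 1.0, 0.0)
    inter_h = max(min(ay2, by2) - max(ay1, by1) + 1.0, 0.0)
    inter = inter_w * inter_h
    area_a = (ax2 - ax1 + 1.0) * (ay2 - ay1 + 1.0)
    area_b = (bx2 - bx1 + 1.0) * (by2 - by1 + 1.0)
    union = area_a + area_b - inter
    iou = inter / union

    # Area of the smallest axis-aligned box enclosing both inputs.
    enclose = ((max(ax2, bx2) - min(ax1, bx1) + 1.0) *
               (max(ay2, by2) - min(ay1, by1) + 1.0))

    return iou - (enclose - union) / enclose

def giou_loss(box_a, box_b):
    """Loss form: 0 for identical boxes, approaching 2 for distant boxes."""
    return 1.0 - giou(box_a, box_b)

same = giou([0, 0, 9, 9], [0, 0, 9, 9])          # identical boxes
disjoint = giou([0, 0, 9, 9], [20, 0, 29, 9])    # no overlap, IoU is 0
```

For the disjoint pair the plain IoU is 0 regardless of distance, while GIoU is negative and decreases as the boxes move further apart.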

Issues that stem from data annotation: the probability distribution of bounding boxes and the uncertainty of the boxes predicted by the model.

Understanding the probability distribution of object detection bounding boxes - Zhihu

The bounding box is modeled as a Gaussian distribution.

Wuhan University proposes NWD: a new paradigm for small-object detection that abandons IoU and yields large accuracy gains (reaching SOTA)

NMS

NMS (non-maximum suppression). Many of the detections output by a detection model are redundant: they are repeated predictions of the same object. NMS is used to suppress these duplicates. The procedure is as follows:

  1. Each model output consists of a regressed bounding box (bbox pre), a predicted class (cls pre), and a classification score (cls score). Group all outputs by cls pre.
  2. Within each class, sort the outputs by cls score and take out the one with the highest score. Compute its IoU with every remaining output and discard each output whose IoU exceeds the threshold; discarded outputs take no part in the following steps.
  3. From the outputs that remain after filtering, take the one with the next-highest cls score and repeat step 2. Continue until every output of the class has been either kept or discarded.
  4. Repeat steps 2 and 3 for every class to obtain the final detection result.

Code implementation of the per-class filtering:

def py_cpu_nms(dets, thresh):
    """Pure Python NMS baseline."""
    x1 = dets[:, 0]        # top-left x of each predicted box
    y1 = dets[:, 1]        # top-left y
    x2 = dets[:, 2]        # bottom-right x
    y2 = dets[:, 3]        # bottom-right y
    scores = dets[:, 4]    # classification score of each predicted box

    areas = (x2 - x1 + 1) * (y2 - y1 + 1)    # area of each predicted box
    order = scores.argsort()[::-1]           # indices sorted by descending score (step 2)

    keep = []    # indices of the boxes retained by NMS
    while order.size > 0:
        i = order[0]      # highest-scoring box still in play
        keep.append(i)    # the top-scoring box is always kept
        # Intersection of the top box with every remaining box in order
        xx1 = np.maximum(x1[i], x1[order[1:]])
        yy1 = np.maximum(y1[i], y1[order[1:]])
        xx2 = np.minimum(x2[i], x2[order[1:]])
        yy2 = np.minimum(y2[i], y2[order[1:]])

        w = np.maximum(0.0, xx2 - xx1 + 1)
        h = np.maximum(0.0, yy2 - yy1 + 1)
        inter = w * h
        ovr = inter / (areas[i] + areas[order[1:]] - inter)    # IoU of each remaining box with the top box

        # Boxes with IoU > thresh duplicate the top box and are suppressed;
        # only those with IoU <= thresh go on to the next round.
        inds = np.where(ovr <= thresh)[0]
        # inds indexes order[1:] (the top box was excluded), so add 1 to map back into order (step 3).
        order = order[inds + 1]

    return keep    # indices of the retained boxes, highest score first
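
The function above covers steps 2 and 3 for a single class. Steps 1 and 4 (grouping by class and repeating per class) can be sketched as follows. This is a minimal illustration under stated assumptions: the names nms_single_class and multiclass_nms, the separate boxes/scores/labels arrays, and the sample data are all hypothetical, and the single-class helper is a compact restatement of the same greedy procedure so that the example runs on its own.

```python
import numpy as np

def nms_single_class(boxes, scores, thresh):
    """Greedy NMS for one class; returns kept indices, highest score first."""
    x1, y1, x2, y2 = boxes[:, 0], boxes[:, 1], boxes[:, 2], boxes[:, 3]
    areas = (x2 - x1 + 1) * (y2 - y1 + 1)
    order = scores.argsort()[::-1]
    keep = []
    while order.size > 0:
        i, rest = order[0], order[1:]
        keep.append(int(i))
        w = np.maximum(0.0, np.minimum(x2[i], x2[rest]) - np.maximum(x1[i], x1[rest]) + 1)
        h = np.maximum(0.0, np.minimum(y2[i], y2[rest]) - np.maximum(y1[i], y1[rest]) + 1)
        inter = w * h
        iou = inter / (areas[i] + areas[rest] - inter)
        order = rest[iou <= thresh]  # drop boxes that overlap the top box too much
    return keep

def multiclass_nms(boxes, scores, labels, thresh=0.5):
    """Step 1: group by predicted class. Step 4: run NMS on each group."""
    boxes = np.asarray(boxes, dtype=float)
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels)
    keep = []
    for cls in np.unique(labels):
        idx = np.flatnonzero(labels == cls)          # outputs predicted as this class
        kept_local = nms_single_class(boxes[idx], scores[idx], thresh)
        keep.extend(idx[kept_local].tolist())        # map back to global indices
    return keep

# Hypothetical detections: boxes 0 and 1 overlap heavily and share class 0;
# box 2 has the same coordinates as box 0 but belongs to class 1.
boxes = np.array([[0, 0, 9, 9], [1, 1, 10, 10], [0, 0, 9, 9], [50, 50, 59, 59]], dtype=float)
scores = np.array([0.9, 0.8, 0.7, 0.6])
labels = np.array([0, 0, 1, 0])

kept = multiclass_nms(boxes, scores, labels, thresh=0.5)
```

Because suppression happens only within a class, box 2 survives even though it coincides with box 0, while box 1 is suppressed by the higher-scoring box 0 of the same class.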

And variants of NMS:

NMS for object detection - improving precision - Zhihu

NMS can also play tricks... - Zhihu

Keywords: Neural Networks, Computer Vision, Deep Learning

Added by webspinner on Mon, 03 Jan 2022 07:33:21 +0200