Object detection YOLO series: the fast-iterating YOLOv5

Author: Glenn Jocher
Published: 2020
Original paper: none published; YOLOv5 was released directly through GitHub (ultralytics/yolov5).

1. Overview

When it was first released, YOLOv5 was controversial: some questioned whether it even deserved the name YOLOv5. However, with its excellent performance and complete engineering support (e.g. porting to other platforms), YOLOv5 is still the most active model in the detection field (as of 2021). YOLOv5 is not only capable but also iterates very quickly: five major versions have been released since it came out, so when using YOLOv5 you need to pay attention to which release you are on. The network-structure differences between the releases are listed below.

  • YOLOv5 5.0

    • P5 structure is consistent with 4.0

    • P6 has 4 output layers (strides of 8, 16, 32 and 64 respectively)

  • YOLOv5 4.0

    • The nn.LeakyReLU(0.1) and nn.Hardswish() activations used in 3.0 are replaced with nn.SiLU() (see the sketch after this list)
  • YOLOv5 3.1

    • Mainly bug fixes
  • YOLOv5 3.0

    • Adopts the nn.Hardswish() activation function
  • YOLOv5 2.0

    • The structure remains unchanged, mainly bug fixes, but there are compatibility problems with 1.0
  • YOLOv5 1.0

    • Initial release
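
The activation change mentioned under 4.0 is easiest to picture with a minimal Conv-BN-activation block. This is only a sketch, not the actual YOLOv5 Conv module:

import torch.nn as nn

def conv_bn_act(c_in, c_out, k=3, s=1, use_silu=True):
    """Minimal Conv-BN-activation sketch; YOLOv5 4.0+ uses nn.SiLU(),
    while the 3.x releases used nn.Hardswish() / nn.LeakyReLU(0.1)."""
    act = nn.SiLU() if use_silu else nn.Hardswish()
    return nn.Sequential(
        nn.Conv2d(c_in, c_out, k, s, k // 2, bias=False),  # same-padding conv
        nn.BatchNorm2d(c_out),
        act,
    )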

In addition to the release versions, to make it easy to choose a model for different scenarios, YOLOv5 comes in four network sizes: s, m, l and x; the scale factors behind these sizes are sketched below.
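
For reference, the four sizes differ mainly in two scale factors in their model YAMLs: depth_multiple scales the number of module repeats and width_multiple scales the channel counts. The values below are quoted from memory and may differ slightly between releases, so treat this as a sketch rather than the authoritative configs:

# Approximate scale factors for the four model sizes (from the yolov5{s,m,l,x}.yaml files).
MODEL_SCALES = {
    "yolov5s": {"depth_multiple": 0.33, "width_multiple": 0.50},
    "yolov5m": {"depth_multiple": 0.67, "width_multiple": 0.75},
    "yolov5l": {"depth_multiple": 1.00, "width_multiple": 1.00},
    "yolov5x": {"depth_multiple": 1.33, "width_multiple": 1.25},
}

def scaled(base_channels, base_repeats, name="yolov5s"):
    """Return (channels, repeats) after applying a model's width/depth multipliers."""
    s = MODEL_SCALES[name]
    return round(base_channels * s["width_multiple"]), max(round(base_repeats * s["depth_multiple"]), 1)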

2. Network structure

For the network structure, a recommended reference is the article "In Simple Terms: A Complete Explanation of the Core Fundamentals of Yolov5 in the Yolo Series", which contains a large high-definition diagram and is very detailed. Overall, compared with v4, a Focus layer is added and the activation functions are adjusted; otherwise the overall network architecture is similar. A rough sketch of the Focus idea follows.
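
As a sketch of the Focus idea (not the exact YOLOv5 module): the input is sliced into four pixel-interleaved sub-images, which are concatenated along the channel axis (halving the spatial size and quadrupling the channels) before a normal convolution is applied:

import torch
import torch.nn as nn

class FocusSketch(nn.Module):
    """Sketch of the Focus slicing operation: (b, c, h, w) -> (b, c_out, h/2, w/2)."""
    def __init__(self, c_in, c_out, k=3):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(c_in * 4, c_out, k, 1, k // 2, bias=False),
            nn.BatchNorm2d(c_out),
            nn.SiLU(),
        )

    def forward(self, x):
        # Take every other pixel in four phase-shifted patterns, then stack them on the channel axis.
        return self.conv(torch.cat([x[..., ::2, ::2], x[..., 1::2, ::2],
                                    x[..., ::2, 1::2], x[..., 1::2, 1::2]], dim=1))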

3. Loss function

Take the 5.0 code as an example. In the loss, you can set the fl_gamma hyperparameter to a value greater than 0 to enable focal loss. In addition, the IoU loss also supports GIoU, DIoU and CIoU.

import torch
import torch.nn as nn

# build_targets, smooth_BCE, FocalLoss and bbox_iou are YOLOv5's own helper
# functions/classes, defined in the repository's utils modules.


def compute_loss(p, targets, model):  # predictions, targets, model
    device = targets.device
    lcls, lbox, lobj = torch.zeros(1, device=device), torch.zeros(1, device=device), torch.zeros(1, device=device)
    tcls, tbox, indices, anchors = build_targets(p, targets, model)  # targets
    h = model.hyp  # hyperparameters

    # Define criteria
    BCEcls = nn.BCEWithLogitsLoss(pos_weight=torch.tensor([h['cls_pw']], device=device))  # weight=model.class_weights)
    BCEobj = nn.BCEWithLogitsLoss(pos_weight=torch.tensor([h['obj_pw']], device=device))

    # Class label smoothing https://arxiv.org/pdf/1902.04103.pdf eqn 3
    cp, cn = smooth_BCE(eps=0.0)

    # Focal loss
    g = h['fl_gamma']  # focal loss gamma
    if g > 0:
        BCEcls, BCEobj = FocalLoss(BCEcls, g), FocalLoss(BCEobj, g)

    # Losses
    nt = 0  # number of targets
    no = len(p)  # number of outputs
    balance = [4.0, 1.0, 0.3, 0.1, 0.03]  # P3-P7
    for i, pi in enumerate(p):  # layer index, layer predictions
        b, a, gj, gi = indices[i]  # image, anchor, gridy, gridx
        tobj = torch.zeros_like(pi[..., 0], device=device)  # target obj

        n = b.shape[0]  # number of targets
        if n:
            nt += n  # cumulative targets
            ps = pi[b, a, gj, gi]  # prediction subset corresponding to targets

            # Regression
            pxy = ps[:, :2].sigmoid() * 2. - 0.5
            pwh = (ps[:, 2:4].sigmoid() * 2) ** 2 * anchors[i]
            pbox = torch.cat((pxy, pwh), 1)  # predicted box
            iou = bbox_iou(pbox.T, tbox[i], x1y1x2y2=False, CIoU=True)  # iou(prediction, target)
            lbox += (1.0 - iou).mean()  # iou loss

            # Objectness
            tobj[b, a, gj, gi] = (1.0 - model.gr) + model.gr * iou.detach().clamp(0).type(tobj.dtype)  # iou ratio

            # Classification
            if model.nc > 1:  # cls loss (only if multiple classes)
                t = torch.full_like(ps[:, 5:], cn, device=device)  # targets
                t[range(n), tcls[i]] = cp
                lcls += BCEcls(ps[:, 5:], t)  # BCE

            # Append targets to text file
            # with open('targets.txt', 'a') as file:
            #     [file.write('%11.5g ' * 4 % tuple(x) + '\n') for x in torch.cat((txy[i], twh[i]), 1)]

        lobj += BCEobj(pi[..., 4], tobj) * balance[i]  # obj loss

    s = 3 / no  # output count scaling
    lbox *= h['box'] * s
    lobj *= h['obj']
    lcls *= h['cls'] * s
    bs = tobj.shape[0]  # batch size

    loss = lbox + lobj + lcls
    return loss * bs, torch.cat((lbox, lobj, lcls, loss)).detach()
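
The FocalLoss wrapper used above is defined elsewhere in the repository. As a sketch of the idea (following the standard focal-loss formulation, not necessarily line-for-line the YOLOv5 code), it computes the element-wise BCE loss and rescales it by an alpha weight and a (1 - p_t)^gamma modulating factor:

import torch
import torch.nn as nn

class FocalLossSketch(nn.Module):
    """Sketch: wrap focal loss around an existing nn.BCEWithLogitsLoss criterion."""
    def __init__(self, loss_fcn, gamma=1.5, alpha=0.25):
        super().__init__()
        self.loss_fcn = loss_fcn
        self.gamma, self.alpha = gamma, alpha
        self.reduction = loss_fcn.reduction
        self.loss_fcn.reduction = 'none'  # keep per-element losses so they can be re-weighted

    def forward(self, pred, true):
        loss = self.loss_fcn(pred, true)            # element-wise BCE with logits
        p = torch.sigmoid(pred)                     # predicted probability
        p_t = true * p + (1 - true) * (1 - p)       # probability assigned to the true label
        alpha_t = true * self.alpha + (1 - true) * (1 - self.alpha)
        loss *= alpha_t * (1.0 - p_t) ** self.gamma  # down-weight easy examples
        if self.reduction == 'mean':
            return loss.mean()
        return loss.sum() if self.reduction == 'sum' else loss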

4. Post-processing

The post-processing code of YOLOv5 is a little laborious to read, so I have sorted out a flow chart of the post-processing steps for reference; the main steps are also sketched below.
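
For orientation only, here is a rough sketch of the main post-processing steps for a single image (not the repository's actual non_max_suppression function): filter by objectness, combine objectness and class confidence, convert boxes from xywh to xyxy, then run class-aware NMS.

import torch
from torchvision.ops import nms

def postprocess_sketch(pred, conf_thres=0.25, iou_thres=0.45, max_wh=4096):
    """pred: (n, 5 + num_classes) raw detections laid out as (x, y, w, h, objectness, class scores)."""
    pred = pred[pred[:, 4] > conf_thres]                        # 1. drop low-objectness boxes
    if not pred.shape[0]:
        return pred.new_zeros((0, 6))
    scores, cls = (pred[:, 5:] * pred[:, 4:5]).max(1)           # 2. conf = objectness * class prob
    xy, wh = pred[:, :2], pred[:, 2:4]
    boxes = torch.cat((xy - wh / 2, xy + wh / 2), 1)            # 3. xywh -> xyxy
    keep = nms(boxes + cls[:, None].float() * max_wh, scores, iou_thres)  # 4. class offset trick for per-class NMS
    return torch.cat((boxes, scores[:, None], cls[:, None].float()), 1)[keep]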

5. Performance

YOLOv5 may not have an accuracy advantage over v4, but it has clear advantages in speed (both training and inference, especially training). In addition, its engineering support is also a strength.
