Explore YOLO v3 source code - conclusion: prediction

Part 1 training:

https://mp.weixin.qq.com/s/T9LshbXoervdJDBuP564dQ

Part 2 model:

https://mp.weixin.qq.com/s/N79S9Qf1OgKsQ0VU5QvuHg

Part 3 network:

https://mp.weixin.qq.com/s/hC4P7iRGv5JSvvPe-ri_8g

Part 4 ground truth:

https://mp.weixin.qq.com/s/5Sj7QadfVvx-5W9Cr4d3Yw

Part 5 Loss:

https://mp.weixin.qq.com/s/4L9E4WGSh0hzlD303036bQ

1. Detection function

Use the trained YOLO v3 model to detect objects in a picture, where:

  • Create an instance yolo of the YOLO class;
  • Use Image.open() to load the image;
  • Call yolo.detect_image() to detect objects in the image;
  • Close yolo's session;
  • Show the detected image r_image;

Implementation:

from PIL import Image

from yolo import YOLO  # assuming the YOLO class lives in yolo.py


def detect_img_for_test():
    yolo = YOLO()
    img_path = './dataset/img.jpg'
    image = Image.open(img_path)
    r_image = yolo.detect_image(image)
    yolo.close_session()
    r_image.show()

Output: the image with the detection boxes drawn (figure omitted).

2. YOLO parameters

Initialization parameters of the YOLO class:

  • anchors_path: configuration file of the anchor boxes, 9 width-height pairs;
  • model_path: the trained model, which also supports retraining;
  • classes_path: category file, matching the model file;
  • score: confidence threshold; candidate boxes below the threshold are deleted;
  • iou: IoU threshold for candidate boxes; overlapping boxes of the same category above the threshold are deleted;
  • class_names: category list, read from classes_path;
  • anchors: anchor box list, read from anchors_path;
  • model_image_size: the image size the model detects at; all input images are padded to this size;
  • colors: a set of random colors generated in the HSV color space, one per category in class_names;
  • boxes, scores, classes: the core outputs of detection, generated by the function generate(), which wraps the model's output.

Implementation:

self.anchors_path = 'configs/yolo_anchors.txt'  # Anchors
self.model_path = 'model_data/yolo_weights.h5'  # Model file
self.classes_path = 'configs/coco_classes.txt'  # Category file

self.score = 0.20
self.iou = 0.20
self.class_names = self._get_class()  # Get category
self.anchors = self._get_anchors()  # Get anchor
self.sess = K.get_session()
self.model_image_size = (416, 416)  # fixed size or (None, None), hw
self.colors = self.__get_colors(self.class_names)
self.boxes, self.scores, self.classes = self.generate()

In __get_colors():

  • Evenly divide the H value of HSV over (0~1), with S and V fixed at 1, to generate a list of HSV tuples;
  • Call hsv_to_rgb to convert the HSV color space values to the RGB color space;
  • Multiply the (0~1) RGB values by 255 to get full color values, (0~255);
  • Randomly shuffle the color list;

Implementation:

@staticmethod
def __get_colors(names):
    # Different boxes, different colors
    hsv_tuples = [(float(x) / len(names), 1., 1.)
                  for x in range(len(names))]  # Different colors
    colors = list(map(lambda x: colorsys.hsv_to_rgb(*x), hsv_tuples))
    colors = list(map(lambda x: (int(x[0] * 255), int(x[1] * 255), int(x[2] * 255)), colors))  # RGB
    np.random.seed(10101)
    np.random.shuffle(colors)
    np.random.seed(None)

    return colors

HSV partitioning is chosen over RGB because offsetting the hue in HSV shifts colors more distinctly, so the boxes drawn for different categories are easier to tell apart.
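For example, a quick sketch with a hypothetical three-class list: three evenly spaced hues already land on well-separated colors.

import colorsys

names = ['person', 'car', 'dog']  # hypothetical class list
hsv_tuples = [(float(i) / len(names), 1., 1.) for i in range(len(names))]
rgb = [tuple(int(c * 255) for c in colorsys.hsv_to_rgb(*t)) for t in hsv_tuples]
print(rgb)  # [(255, 0, 0), (0, 255, 0), (0, 0, 255)] - pure red, green, blue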

3. Output package

boxes, scores and classes further encapsulate the model output and are generated by the function generate(), where:

  • boxes: the four coordinates of the box (top, left, bottom, right);
  • scores: the category confidence of the box, the product of the box confidence and the class confidence;
  • classes: the category of the box;

In the function generate(), set the parameters:

  • num_anchors: the total number of anchor boxes, usually 9;
  • num_classes: the total number of classes; for example, COCO has 80;
  • yolo_model: the model created by yolo_body(); call load_weights() to load the trained parameters;

Implementation:

num_anchors = len(self.anchors)  # Number of anchors
num_classes = len(self.class_names)  # Number of categories

self.yolo_model = yolo_body(Input(shape=(416, 416, 3)), 3, num_classes)
self.yolo_model.load_weights(model_path)  # Load model parameters

Next, set input_image_shape as a placeholder, i.e. a parameter variable in TF. In yolo_eval():

  • the output of yolo_model is further encapsulated;
  • anchors, the anchor box list;
  • the total number of categories, len(class_names);
  • the input image size, input_image_shape, i.e. (416, 416);
  • score_threshold, the overall confidence threshold score of a box;
  • iou_threshold, the IoU threshold iou for boxes of the same category;
  • the return values are the box coordinates boxes, the category confidences scores, and the box categories classes;

Implementation:

self.input_image_shape = K.placeholder(shape=(2,))
boxes, scores, classes = yolo_eval(
    self.yolo_model.output, self.anchors, len(self.class_names),
    self.input_image_shape, score_threshold=self.score, iou_threshold=self.iou)
return boxes, scores, classes

The output score values are all greater than score_threshold; boxes below it have already been deleted inside yolo_eval().

4. YOLO evaluation

In the function yolo_eval(), the prediction logic is fully encapsulated. The inputs are:

  • yolo_outputs: the output of the YOLO model, a list of three scales, i.e. 13-26-52. The last dimension holds the predicted values, 255 = 3x(5+80): 3 is the number of anchors per layer, 5 is the four box values xywh plus the confidence that the box contains an object, and 80 is the number of COCO categories;
  • anchors: the values of the 9 anchor boxes;
  • num_classes: the number of categories; COCO has 80;
  • image_shape: a TF parameter of placeholder type, default (416, 416);
  • max_boxes: the maximum number of detection boxes per image, 20;
  • score_threshold: the confidence threshold of a box; boxes below the threshold are deleted. Lower it to get more boxes, raise it to get fewer;
  • iou_threshold: the IoU threshold for boxes of the same category; overlapping boxes above the threshold are deleted. Raise it when objects overlap heavily, lower it when they rarely overlap;

The format of yolo_outputs is as follows:

[(?, 13, 13, 255), (?, 26, 26, 255), (?, 52, 52, 255)]

The anchors list is as follows:

[(10,13), (16,30), (33,23), (30,61), (62,45), (59,119), (116,90), (156,198), (373,326)]

Implementation:

boxes, scores, classes = yolo_eval(
    self.yolo_model.output, self.anchors, len(self.class_names),
    self.input_image_shape, score_threshold=self.score, iou_threshold=self.iou)

def yolo_eval(yolo_outputs, anchors, num_classes, image_shape,
              max_boxes=20, score_threshold=.6, iou_threshold=.5):

Next, process parameters:

  • num_layers: the number of output feature map layers, 3;
  • anchor_mask: divides the anchors into three layers; the first layer (13x13) uses [6, 7, 8], the second (26x26) uses [3, 4, 5], and the third (52x52) uses [0, 1, 2];
  • input_shape: the size of the input image, i.e. the size of the 0th feature map multiplied by 32, 13x32=416, which is determined by the Darknet network structure.

Implementation:

num_layers = len(yolo_outputs)
anchor_mask = [[6, 7, 8], [3, 4, 5], [0, 1, 2]] if num_layers == 3 else [[3, 4, 5], [1, 2, 3]]  # default setting
input_shape = K.shape(yolo_outputs[0])[1:3] * 32

The larger the feature map (13 -> 52), the smaller the objects it detects and the smaller the anchors it needs, so the anchor list is assigned in reverse order; the sketch below shows the selection.
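A quick sketch of how anchor_mask picks anchors per layer, using the anchor values listed above:

import numpy as np

# The 9 anchors from yolo_anchors.txt, smallest to largest (width, height).
anchors = np.array([(10, 13), (16, 30), (33, 23), (30, 61), (62, 45),
                    (59, 119), (116, 90), (156, 198), (373, 326)])
anchor_mask = [[6, 7, 8], [3, 4, 5], [0, 1, 2]]

print(anchors[anchor_mask[0]])  # 13x13 map -> largest anchors: (116,90), (156,198), (373,326)
print(anchors[anchor_mask[2]])  # 52x52 map -> smallest anchors: (10,13), (16,30), (33,23)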

Next, for each layer l of yolo_outputs, call yolo_boxes_and_scores() to extract the boxes _boxes and the confidences _box_scores, append the box data of the three layers to the lists boxes and box_scores, and then concatenate and flatten them. The output is all the boxes and their confidences.

The formats of the output boxes and box_scores are as follows:

boxes: (?, 4)  # ? is the number of boxes
box_scores: (?, 80)

Implementation:

boxes = []
box_scores = []
for l in range(num_layers):
    _boxes, _box_scores = yolo_boxes_and_scores(
        yolo_outputs[l], anchors[anchor_mask[l]], num_classes, input_shape, image_shape)
    boxes.append(_boxes)
    box_scores.append(_box_scores)
boxes = K.concatenate(boxes, axis=0)
box_scores = K.concatenate(box_scores, axis=0)

concatenate flattens the data of the multiple layers into a single list; because the boxes have already been restored to real coordinates, the different scales no longer differ. In total, the three scales contribute 13x13x3 + 26x26x3 + 52x52x3 = 507 + 2028 + 8112 = 10647 candidate boxes per image.

In the function yolo_boxes_and_scores():

  • the output of yolo_head(): box_xy is the center coordinate of the box, a (0~1) relative position; box_wh is the width and height of the box, (0~1) relative values; box_confidence is the confidence that the box contains an object; box_class_probs are the category confidences;
  • yolo_correct_boxes() converts the (0~1) relative values of box_xy and box_wh into real coordinates; the output boxes are (y_min, x_min, y_max, x_max) values;
  • reshape flattens the values of the different grids into a single list of boxes, i.e. (?, 13, 13, 3, 4) -> (?, 4);
  • box_scores is the product of the box confidence and the category confidences, then flattened by reshape, (?, 80);
  • return the boxes boxes and the box confidences box_scores.

Implementation:

def yolo_boxes_and_scores(feats, anchors, num_classes, input_shape, image_shape):
    '''Process Conv layer output'''
    box_xy, box_wh, box_confidence, box_class_probs = yolo_head(
        feats, anchors, num_classes, input_shape)
    boxes = yolo_correct_boxes(box_xy, box_wh, input_shape, image_shape)
    boxes = K.reshape(boxes, [-1, 4])
    box_scores = box_confidence * box_class_probs
    box_scores = K.reshape(box_scores, [-1, num_classes])
    return boxes, box_scores

Next:

  • mask: filter out the boxes below the confidence threshold, keeping only boxes whose score is at least score_threshold;
  • max_boxes_tensor: the maximum number of detection boxes per image, max_boxes, which is 20;

Implementation:

mask = box_scores >= score_threshold
max_boxes_tensor = K.constant(max_boxes, dtype='int32')

Next:

  • for each category c, use mask to select the boxes class_boxes and the confidences class_box_scores;
  • apply non-maximum suppression (NMS) to obtain the indices nms_index of the boxes to keep;
  • gather the kept boxes class_boxes and confidences class_box_scores by index, then generate the category tensor classes;
  • combine the data of all categories to produce the final detection results and return them.

Implementation:

boxes_ = []
scores_ = []
classes_ = []
for c in range(num_classes):
    class_boxes = tf.boolean_mask(boxes, mask[:, c])
    class_box_scores = tf.boolean_mask(box_scores[:, c], mask[:, c])
    nms_index = tf.image.non_max_suppression(
        class_boxes, class_box_scores, max_boxes_tensor, iou_threshold=iou_threshold)
    class_boxes = K.gather(class_boxes, nms_index)
    class_box_scores = K.gather(class_box_scores, nms_index)
    classes = K.ones_like(class_box_scores, 'int32') * c
    boxes_.append(class_boxes)
    scores_.append(class_box_scores)
    classes_.append(classes)
boxes_ = K.concatenate(boxes_, axis=0)
scores_ = K.concatenate(scores_, axis=0)
classes_ = K.concatenate(classes_, axis=0)

Output format:

boxes_: (?, 4)
scores_: (?,)
classes_: (?,)

5. Detection method

Step 1, image processing:

  • Resize the image proportionally to the detection size, which must be a multiple of 32, and pad the borders (see the letterbox_image() sketch after the code);
  • Add a batch dimension to the picture to match the input format;

Implementation:

if self.model_image_size != (None, None):  # 416x416, 416 = 32*13; must be a multiple of 32, since the smallest scale downsamples by 32
    assert self.model_image_size[0] % 32 == 0, 'Multiples of 32 required'
    assert self.model_image_size[1] % 32 == 0, 'Multiples of 32 required'
    boxed_image = letterbox_image(image, tuple(reversed(self.model_image_size)))  # Fill image
else:
    new_image_size = (image.width - (image.width % 32), image.height - (image.height % 32))
    boxed_image = letterbox_image(image, new_image_size)
image_data = np.array(boxed_image, dtype='float32')
print('detector size {}'.format(image_data.shape))
image_data /= 255.  # Scale to 0~1
image_data = np.expand_dims(image_data, 0)  # Add a batch dimension
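The proportional resize and padding are done by letterbox_image() from the project's utilities; a minimal sketch of what that helper does (resize keeping the aspect ratio, then center on a gray canvas), assuming a PIL image in and out:

from PIL import Image

def letterbox_image(image, size):
    """Resize image proportionally and pad the borders with gray."""
    iw, ih = image.size
    w, h = size
    scale = min(float(w) / iw, float(h) / ih)  # keep the aspect ratio
    nw, nh = int(iw * scale), int(ih * scale)
    resized = image.resize((nw, nh), Image.BICUBIC)
    canvas = Image.new('RGB', size, (128, 128, 128))  # gray padding
    canvas.paste(resized, ((w - nw) // 2, (h - nh) // 2))  # center the image
    return canvas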

Step 2, feed the data: the image and the image size; K.learning_phase() set to 0 selects inference mode;

out_boxes, out_scores, out_classes = self.sess.run(
    [self.boxes, self.scores, self.classes],
    feed_dict={
        self.yolo_model.input: image_data,
        self.input_image_shape: [image.size[1], image.size[0]],
        K.learning_phase(): 0
    })

Step 3, draw the boxes: the border width is set automatically, then the border and the category text are drawn with the Pillow drawing library.

font = ImageFont.truetype(font='font/FiraMono-Medium.otf',
                          size=np.floor(3e-2 * image.size[1] + 0.5).astype('int32'))  # font size scales with image height
thickness = (image.size[0] + image.size[1]) // 512  # border line thickness
for i, c in reversed(list(enumerate(out_classes))):
    predicted_class = self.class_names[c]  # category
    box = out_boxes[i]  # box
    score = out_scores[i]  # confidence score

    label = '{} {:.2f}'.format(predicted_class, score)  # label
    draw = ImageDraw.Draw(image)  # Drawing
    label_size = draw.textsize(label, font)  # label text size

    top, left, bottom, right = box
    top = max(0, np.floor(top + 0.5).astype('int32'))
    left = max(0, np.floor(left + 0.5).astype('int32'))
    bottom = min(image.size[1], np.floor(bottom + 0.5).astype('int32'))
    right = min(image.size[0], np.floor(right + 0.5).astype('int32'))
    print(label, (left, top), (right, bottom))  # box

    if top - label_size[1] >= 0:  # draw the label above the box if it fits
        text_origin = np.array([left, top - label_size[1]])
    else:
        text_origin = np.array([left, top + 1])

    # My kingdom for a good redistributable image drawing library.
    for t in range(thickness):  # draw the box border, one pixel per layer
        draw.rectangle(
            [left + t, top + t, right - t, bottom - t],
            outline=self.colors[c])
    draw.rectangle(  # Text background
        [tuple(text_origin), tuple(text_origin + label_size)],
        fill=self.colors[c])
    draw.text(text_origin, label, fill=(0, 0, 0), font=font)  # draw the label text
    del draw

Supplement

1. concatenate

concatenate joins tensors of the same rank along a given axis.

Implementation:

from keras import backend as K

sess = K.get_session()

a = K.constant([[2, 4], [1, 2]])
b = K.constant([[3, 2], [5, 6]])
c = [a, b]
c = K.concatenate(c, axis=0)

print(sess.run(c))
"""
[[2. 4.]
 [1. 2.]
 [3. 2.]
 [5. 6.]]
"""

2. gather

gather selects list elements by index.

Implementation:

from keras import backend as K

sess = K.get_session()

a = K.constant([[2, 4], [1, 2], [5, 6]])
b = K.gather(a, [1, 2])

print(sess.run(b))
"""
[[1. 2.]
 [5. 6.]]
"""
