100 cases of in-depth learning | day 53: train your own data set with YOLOv5 (super detailed full version)

Hello, I'm classmate K!

Let's move on to the last article 100 deep learning cases | day 51 - target detection algorithm (YOLOv5) (Introduction) After configuring the environment required by YOLOv5, today we try to train our data with YOLOv5. (remember to run through the introductory chapter before starting this tutorial to ensure that other environments are normal)

There's a picture and a truth. Let's take a look at my operation results yesterday

[YOLOv5 source address]

1, Prepare your own data

My directory structure is like this

  • home directory
    • paper_data (create a folder and put the data here)
      • Annotations (place our. xml file)
      • images (place picture file)
      • ImageSets
        • Main (four files train.txt, val.txt, test.txt and trainval.txt will be automatically generated in this folder to store the names of training set, verification set and test set pictures)

You will see the following directory structure:

The Annotations folder is an xml file. My files are as follows:

My images file is in. png format. The official format is. jpg, but it's not a big problem. Just change the code later (which will be explained later)

2, Run split_train_val.py file

There is a Main subfolder under the ImageSets folder, which stores four files: train.txt, val.txt, test.txt and trainval.txt through split_train_val.py file.

split_ train_ The location of val.py file is as follows:

split_ train_ The content of val.py is as follows:

# coding:utf-8

import os
import random
import argparse

parser = argparse.ArgumentParser()
#The address of the xml file is modified according to its own data. xml is generally stored under Annotations
parser.add_argument('--xml_path', default='Annotations', type=str, help='input xml label path')
#For the division of data sets, select ImageSets/Main under your own data for the address
parser.add_argument('--txt_path', default='ImageSets/Main', type=str, help='output txt label path')
opt = parser.parse_args()

trainval_percent = 1.0
train_percent = 0.9
xmlfilepath = opt.xml_path
txtsavepath = opt.txt_path
total_xml = os.listdir(xmlfilepath)
if not os.path.exists(txtsavepath):

num = len(total_xml)
list_index = range(num)
tv = int(num * trainval_percent)
tr = int(tv * train_percent)
trainval = random.sample(list_index, tv)
train = random.sample(trainval, tr)

file_trainval = open(txtsavepath + '/trainval.txt', 'w')
file_test = open(txtsavepath + '/test.txt', 'w')
file_train = open(txtsavepath + '/train.txt', 'w')
file_val = open(txtsavepath + '/val.txt', 'w')

for i in list_index:
    name = total_xml[i][:-4] + '\n'
    if i in trainval:
        if i in train:


Run split_ train_ After the val.py file, you will get four files: train.txt, val.txt, test.txt and trainval.txt. The results are as follows:

3, Generate train.txt, test.txt and val.txt files

Let's first look at the location of the file we want to generate

To get started, what we need now is voc_label.py file, its location is as follows:

voc_ The contents of the label.py file are as follows:

# -*- coding: utf-8 -*-
import xml.etree.ElementTree as ET
import os
from os import getcwd

sets = ['train', 'val', 'test']
classes = ["unripe citrus"]   # Change to your own category
abs_path = os.getcwd()

def convert(size, box):
    dw = 1. / (size[0])
    dh = 1. / (size[1])
    x = (box[0] + box[1]) / 2.0 - 1
    y = (box[2] + box[3]) / 2.0 - 1
    w = box[1] - box[0]
    h = box[3] - box[2]
    x = x * dw
    w = w * dw
    y = y * dh
    h = h * dh
    return x, y, w, h

def convert_annotation(image_id):
    in_file = open('./Annotations/%s.xml' % (image_id), encoding='UTF-8')
    out_file = open('./labels/%s.txt' % (image_id), 'w')
    tree = ET.parse(in_file)
    root = tree.getroot()
    size = root.find('size')
    w = int(size.find('width').text)
    h = int(size.find('height').text)
    for obj in root.iter('object'):
        difficult = obj.find('difficult').text
        cls = obj.find('name').text
        if cls not in classes or int(difficult) == 1:
        cls_id = classes.index(cls)
        xmlbox = obj.find('bndbox')
        b = (float(xmlbox.find('xmin').text), float(xmlbox.find('xmax').text), float(xmlbox.find('ymin').text),
        b1, b2, b3, b4 = b
        # Mark out of range correction
        if b2 > w:
            b2 = w
        if b4 > h:
            b4 = h
        b = (b1, b2, b3, b4)
        bb = convert((w, h), b)
        out_file.write(str(cls_id) + " " + " ".join([str(a) for a in bb]) + '\n')

wd = getcwd()
for image_set in sets:
    if not os.path.exists('./labels/'):
    image_ids = open('./ImageSets/Main/%s.txt' % (image_set)).read().strip().split()
    list_file = open('./%s.txt' % (image_set), 'w')
    for image_id in image_ids:
        list_file.write(abs_path + '/images/%s.png\n' % (image_id)) # Pay attention to your picture format. If it is. jpg, remember to modify it

Run voc_label.py file, you will get train.txt, test.txt and val.txt in the screenshot above. The contents of the file are as follows:

4, Create ab.yaml file

I took the file name at will. It can be changed

The location of the ab.yaml file is as follows:

My ab.yaml file reads as follows:

#path: ../datasets/coco  # dataset root dir
train: ./paper_data/train.txt  # train images (relative to 'path') 118287 images
val: ./paper_data/val.txt  # train images (relative to 'path') 5000 images
#test: test-dev2017.txt

nc: 1  # number of classes

names: ['unripe citrus'] # Change to your own category

5, A priori frame is obtained by clustering

First, we need to prepare the kmeans.py file, which is located in the main directory. Its contents are as follows:

import numpy as np

def iou(box, clusters):
    Calculates the Intersection over Union (IoU) between a box and k clusters.
    :param box: tuple or array, shifted to the origin (i. e. width and height)
    :param clusters: numpy array of shape (k, 2) where k is the number of clusters
    :return: numpy array of shape (k, 0) where k is the number of clusters
    x = np.minimum(clusters[:, 0], box[0])
    y = np.minimum(clusters[:, 1], box[1])
    if np.count_nonzero(x == 0) > 0 or np.count_nonzero(y == 0) > 0:
        raise ValueError("Box has no area")                 # If this error is reported, you can change this line to pass

    intersection = x * y
    box_area = box[0] * box[1]
    cluster_area = clusters[:, 0] * clusters[:, 1]

    iou_ = intersection / (box_area + cluster_area - intersection)

    return iou_

def avg_iou(boxes, clusters):
    Calculates the average Intersection over Union (IoU) between a numpy array of boxes and k clusters.
    :param boxes: numpy array of shape (r, 2), where r is the number of rows
    :param clusters: numpy array of shape (k, 2) where k is the number of clusters
    :return: average IoU as a single float
    return np.mean([np.max(iou(boxes[i], clusters)) for i in range(boxes.shape[0])])

def translate_boxes(boxes):
    Translates all the boxes to the origin.
    :param boxes: numpy array of shape (r, 4)
    :return: numpy array of shape (r, 2)
    new_boxes = boxes.copy()
    for row in range(new_boxes.shape[0]):
        new_boxes[row][2] = np.abs(new_boxes[row][2] - new_boxes[row][0])
        new_boxes[row][3] = np.abs(new_boxes[row][3] - new_boxes[row][1])
    return np.delete(new_boxes, [0, 1], axis=1)

def kmeans(boxes, k, dist=np.median):
    Calculates k-means clustering with the Intersection over Union (IoU) metric.
    :param boxes: numpy array of shape (r, 2), where r is the number of rows
    :param k: number of clusters
    :param dist: distance function
    :return: numpy array of shape (k, 2)
    rows = boxes.shape[0]

    distances = np.empty((rows, k))
    last_clusters = np.zeros((rows,))


    # the Forgy method will fail if the whole array contains the same rows
    clusters = boxes[np.random.choice(rows, k, replace=False)]

    while True:
        for row in range(rows):
            distances[row] = 1 - iou(boxes[row], clusters)

        nearest_clusters = np.argmin(distances, axis=1)

        if (last_clusters == nearest_clusters).all():

        for cluster in range(k):
            clusters[cluster] = dist(boxes[nearest_clusters == cluster], axis=0)

        last_clusters = nearest_clusters

    return clusters

if __name__ == '__main__':
    a = np.array([[1, 2, 3, 4], [5, 7, 6, 8]])

Then, you need to prepare another one named claim_ Anchors.py, which is also located in the home directory. Its contents are as follows:

# -*- coding: utf-8 -*-
# Find a priori box according to label file

import os
import numpy as np
import xml.etree.cElementTree as et
from kmeans import kmeans, avg_iou

FILE_ROOT = "./paper_data/"     # Root path
ANNOTATION_ROOT = "Annotations"  # Dataset label folder path


ANCHORS_TXT_PATH = "./data/anchors.txt"

CLUSTERS = 1  # The number of categories should be modified
CLASS_NAMES = ['unripe citrus'] #Modify to your own category

def load_data(anno_dir, class_names):
    xml_names = os.listdir(anno_dir)
    boxes = []
    for xml_name in xml_names:

        xml_pth = os.path.join(anno_dir, xml_name)
        tree = et.parse(xml_pth)

        width = float(tree.findtext("./size/width"))
        height = float(tree.findtext("./size/height"))

        for obj in tree.findall("./object"):
            cls_name = obj.findtext("name")
            if cls_name in class_names:
                xmin = float(obj.findtext("bndbox/xmin")) / width
                ymin = float(obj.findtext("bndbox/ymin")) / height
                xmax = float(obj.findtext("bndbox/xmax")) / width
                ymax = float(obj.findtext("bndbox/ymax")) / height

                box = [xmax - xmin, ymax - ymin]
    return np.array(boxes)

if __name__ == '__main__':

    anchors_txt = open(ANCHORS_TXT_PATH, "w")

    train_boxes = load_data(ANNOTATION_PATH, CLASS_NAMES)
    count = 1
    best_accuracy = 0
    best_anchors = []
    best_ratios = []

    for i in range(10):      ##### It can be modified, not too large, otherwise it will take a long time
        anchors_tmp = []
        clusters = kmeans(train_boxes, k=CLUSTERS)
        idx = clusters[:, 0].argsort()
        clusters = clusters[idx]
        # print(clusters)

        for j in range(CLUSTERS):
            anchor = [round(clusters[j][0] * 640, 2), round(clusters[j][1] * 640, 2)]

        temp_accuracy = avg_iou(train_boxes, clusters) * 100

        ratios = np.around(clusters[:, 0] / clusters[:, 1], decimals=2).tolist()
        print(20 * "*" + " {} ".format(count) + 20 * "*")

        count += 1

        if temp_accuracy > best_accuracy:
            best_accuracy = temp_accuracy
            best_anchors = anchors_tmp
            best_ratios = ratios

    anchors_txt.write("Best Accuracy = " + str(round(best_accuracy, 2)) + '%' + "\r\n")
    anchors_txt.write("Best Anchors = " + str(best_anchors) + "\r\n")
    anchors_txt.write("Best Ratios = " + str(best_ratios))

Run claim_ Anchors.py file, we will get anchors.txt file

6, Start training the model with your own dataset

Enter command:

python train.py --img 900 --batch 2 --epoch 100 --data data/ab.yaml --cfg models/yolov5s.yaml --weights weights/yolov5s.pt --device '0'

We can directly train our own data set. My final running results are as follows:

If you still have problems that can't be solved, you can add my wechat (mtyjkh#) or leave a message directly below. After seeing it, you will reply to you as soon as possible.

Keywords: Computer Vision Deep Learning Object Detection yolov5

Added by cauri on Thu, 04 Nov 2021 05:10:50 +0200