The position of the YOLO series in the field of object detection goes without saying. A PyTorch implementation for training YOLOv5 is available on GitHub, and this post walks through training a YOLOv5 model on custom data. Reference code address:
https://github.com/ultralytics/yolov5/tags
Note that we select the v1.0 release under Tags for training. The code differs between versions and so does the reproduction process; the v1.0 training workflow is very similar to YOLOv3's.
1, Prepare your own data
The data is labeled with the labelImg tool, which saves all annotations as xml files. About 8k images covering 8 categories are used for this training run (sample screenshots omitted).
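For reference, labelImg writes PASCAL VOC style xml. A minimal, hypothetical annotation for a single object looks roughly like this; the size, name, difficult and bndbox fields are exactly what the conversion script in section 2.2 reads:

<annotation>
    <size>
        <width>1920</width>
        <height>1080</height>
        <depth>3</depth>
    </size>
    <object>
        <name>person</name>
        <difficult>0</difficult>
        <bndbox>
            <xmin>100</xmin>
            <ymin>200</ymin>
            <xmax>300</xmax>
            <ymax>400</ymax>
        </bndbox>
    </object>
</annotation>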
2, Training process
2.1 Splitting the dataset
First, create a new folder under the yolov5-1.0 directory and put your own image data and annotation data in it, as shown below.
Here, images holds the original pictures and Annotations holds the annotation files; the serial numbers in the two folders correspond one to one.
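Assuming the data folder is named zjyn_data (the name used by all the scripts below), the layout is:

yolov5-1.0/
└── zjyn_data/
    ├── images/         # original pictures, e.g. 0001.jpg
    └── Annotations/    # labelImg xml files, e.g. 0001.xml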
Then create a new split_train_val.py file that divides the data into training and validation sets and saves the resulting file lists under ImageSets as train.txt, val.txt, trainval.txt and test.txt:
# coding:utf-8
import os
import random
import argparse

parser = argparse.ArgumentParser()
# Path of the xml annotation files; modify for your own data (xml is usually stored under Annotations)
parser.add_argument('--xml_path', default='zjyn_data/Annotations', type=str, help='input xml label path')
# Output path for the dataset split files (ImageSets under your own data directory)
parser.add_argument('--txt_path', default='zjyn_data/ImageSets', type=str, help='output txt label path')
opt = parser.parse_args()

trainval_percent = 1.0  # fraction of data used for train+val (the rest goes to test)
train_percent = 0.9     # fraction of train+val used for training
xmlfilepath = opt.xml_path
txtsavepath = opt.txt_path
total_xml = os.listdir(xmlfilepath)
if not os.path.exists(txtsavepath):
    os.makedirs(txtsavepath)

num = len(total_xml)
list_index = range(num)
tv = int(num * trainval_percent)
tr = int(tv * train_percent)
trainval = random.sample(list_index, tv)
train = random.sample(trainval, tr)

file_trainval = open(txtsavepath + '/trainval.txt', 'w')
file_test = open(txtsavepath + '/test.txt', 'w')
file_train = open(txtsavepath + '/train.txt', 'w')
file_val = open(txtsavepath + '/val.txt', 'w')

for i in list_index:
    name = total_xml[i][:-4] + '\n'  # strip the .xml extension
    if i in trainval:
        file_trainval.write(name)
        if i in train:
            file_train.write(name)
        else:
            file_val.write(name)
    else:
        file_test.write(name)

file_trainval.close()
file_train.close()
file_val.close()
file_test.close()
This script splits the data into a training set and a validation set at a ratio of 9:1 and writes the corresponding txt files. (With trainval_percent = 1.0, nothing is held out and test.txt ends up empty.)
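Since the paths are argparse arguments, the script can also be run with them given explicitly (the defaults already match the zjyn_data layout above):

python split_train_val.py --xml_path zjyn_data/Annotations --txt_path zjyn_data/ImageSets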
2.2 Converting xml annotations to txt
Create a new xml_2_txt.py script as follows:
# -*- coding: utf-8 -*-
import xml.etree.ElementTree as ET
import os
from os import getcwd

sets = ['train', 'val', 'test']
# Change to your own categories; order defines the class ids
classes = ["person", "sport_ball", "bar", "ruler", "cursor", "blanket", "marker_post", "mark_barrels"]
abs_path = os.getcwd()
print(abs_path)


def convert(size, box):
    # Convert a VOC box (xmin, xmax, ymin, ymax) to a normalized YOLO box (x_center, y_center, w, h)
    dw = 1. / (size[0])
    dh = 1. / (size[1])
    x = (box[0] + box[1]) / 2.0 - 1
    y = (box[2] + box[3]) / 2.0 - 1
    w = box[1] - box[0]
    h = box[3] - box[2]
    x = x * dw
    w = w * dw
    y = y * dh
    h = h * dh
    return x, y, w, h


def convert_annotation(image_id):
    in_file = open('zjyn_data/Annotations/%s.xml' % (image_id), encoding='UTF-8')
    out_file = open('zjyn_data/labels/%s.txt' % (image_id), 'w')
    tree = ET.parse(in_file)
    root = tree.getroot()
    size = root.find('size')
    w = int(size.find('width').text)
    h = int(size.find('height').text)
    for obj in root.iter('object'):
        # difficult = obj.find('difficult').text
        difficult = obj.find('Difficult')
        if difficult is None:
            difficult = obj.find('difficult')
        difficult = difficult.text
        cls = obj.find('name').text
        if cls not in classes or int(difficult) == 1:
            continue
        cls_id = classes.index(cls)
        xmlbox = obj.find('bndbox')
        b = (float(xmlbox.find('xmin').text), float(xmlbox.find('xmax').text),
             float(xmlbox.find('ymin').text), float(xmlbox.find('ymax').text))
        b1, b2, b3, b4 = b
        # Clip boxes that run past the image border
        if b2 > w:
            b2 = w
        if b4 > h:
            b4 = h
        b = (b1, b2, b3, b4)
        bb = convert((w, h), b)
        out_file.write(str(cls_id) + " " + " ".join([str(a) for a in bb]) + '\n')


wd = getcwd()
print(wd)
for image_set in sets:
    if not os.path.exists('zjyn_data/labels/'):
        os.makedirs('zjyn_data/labels/')
    image_ids = open('zjyn_data/ImageSets/%s.txt' % (image_set)).read().strip().split()
    list_file = open('zjyn_data/%s.txt' % (image_set), 'w')
    for image_id in image_ids:
        list_file.write(abs_path + '/zjyn_data/images/%s.jpg\n' % (image_id))
        convert_annotation(image_id)
    list_file.close()
A few points to note:
The classes list should be rewritten with your own categories, and its order must stay consistent with the category list in the data/*.yaml file.
Based on the previously generated train.txt and val.txt, the script finds the corresponding annotation for each image and saves it as a txt label file. Each line stores one object as class_id x_center y_center width height, all normalized to [0, 1]:
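A hypothetical example: an image containing one person (class 0) and one bar (class 2) would produce a label file zjyn_data/labels/0001.txt like:

0 0.481250 0.633333 0.206250 0.459259
2 0.762500 0.281481 0.112500 0.148148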
At the same time, train.txt and val.txt containing the absolute paths of the corresponding images are generated under the zjyn_data folder.
2.3 Start training
train.py in the project root is the training script.
The hyperparameters are set in the main function; only a few places need attention.
batch_size should be set according to your own graphics card; generally, the larger the better. On a 2080Ti, a batch_size of 32 nearly fills the GPU memory, and anything larger raises an out-of-memory error.
--cfg: this parameter is the network structure file of the model.
The four files (yolov5s.yaml, yolov5m.yaml, yolov5l.yaml, yolov5x.yaml) are the architecture definition files of the model, representing different network sizes; yolov5s is the smallest, and the others are deepened and widened on that basis.
Remember to make the corresponding modification in whichever one you use. For example, with yolov5s you need to edit models/yolov5s.yaml:
# change nc to your own number of classes
nc: 8
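After the change, the top of models/yolov5s.yaml should look roughly like this (depth_multiple and width_multiple are the stock yolov5s values and stay untouched):

# parameters
nc: 8  # number of classes, changed from the default 80
depth_multiple: 0.33  # model depth multiple
width_multiple: 0.50  # layer channel multiple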
--data: this parameter is the path of the dataset yaml file, for example:
# train and val datasets (image directory or *.txt file with image paths)
train: /home/elvis/project2021/yolov5-1.0-copy/zjyn_data/train.txt  # 6648 images
val: /home/elvis/project2021/yolov5-1.0-copy/zjyn_data/val.txt  # 738 images, train+val = 7387
test: /home/elvis/project2021/yolov5-1.0-copy/zjyn_data/test.txt

# number of classes
nc: 8

# class names
names: ["person", "sport_ball", "bar", "ruler", "cursor", "blanket", "marker_post", "mark_barrels"]
names corresponds to your own category list; its order must match the classes list in xml_2_txt.py.
The img_size parameter sets the input size of the network. In theory, the larger the input, the better the detection of small targets, and the better the overall accuracy (at the cost of speed and memory).
--weights: pretrained model file, optionally used for initialization. With a pretrained model the loss converges faster.
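Putting it together, a typical launch might look like the following; the flag names follow the v1.0 train.py argument parser and the data yaml name here is assumed, so verify both against your local copy:

python train.py --cfg models/yolov5s.yaml --data data/zjyn_data.yaml --weights yolov5s.pt --batch-size 32 --img-size 640 --epochs 300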
3, Training results
If your data samples are imbalanced (for example, my data contains many person instances but few of the other categories), the anchors selected by the code's automatic clustering of the data can give a very low mAP after training. So I cancelled the anchor clustering and used the default anchors instead, which worked much better: just comment out the check anchors step in train.py.
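For reference, in v1.0 the step to comment out is a call along these lines (the exact signature may differ between versions, so search train.py for check_anchors):

# check_anchors(dataset, model=model, thr=hyp['anchor_t'], imgsz=imgsz)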
The final model was trained on the 8k images for 300 epochs, which took 7 hours on the 2080Ti, and reached mAP = 0.65.