Train yolov4 tiny on Ubuntu 20.04

Train yolov4 tiny on Ubuntu 20.04

1, Data download

1.yolov4

Official download: https://github.com/AlexeyAB/darknet
Network disk download link: https://pan.baidu.com/s/1HYiCANZZ4NPYFvMJ-cenFA
Extraction code: 2rh0

2.yolov4-tiny.weights

Official download: https://github.com/AlexeyAB/darknet/releases/download/darknet_yolo_v4_pre/yolov4-tiny.weights
Network disk download link: https://pan.baidu.com/s/1Rui1GNNyXQeEvz-dNdmbQA
Extraction code: jem4

3.yolov4-tiny.conv.29

Official download address: https://github.com/AlexeyAB/darknet/releases/download/darknet_yolo_v4_pre/yolov4-tiny.conv.29
Network disk download link: https://pan.baidu.com/s/15jyob_g98I951bbrBRDyIA
Extraction code: 1ey6

2, Training

1. Configure Makefile file

(1) Modify Makefile file

Modify as follows:

(2) Compile

In the yolov4 directory, open the terminal for compilation

make	#The computer CPU is single core
 perhaps
make -j16	#My computer is a multi-core CPU, so I added the parameter - j16 here. Of course, you can try - 4 -j8 according to your computer configuration. Of course, the larger the value after J, the faster the compilation

After running, you can see the generated darknet executable file in the yolov4 directory

2. Dataset training file configuration

(1) Storing data sets

Create the following folder structure under yolov4 directory

---VOCdevkit
	---VOC2007
		---Annotations		#xml file for own dataset
		---ImageSets		#Create a new folder to store subsequent txt files
			---Main
		---JPEGImages		#Own data set of picture files, I am jpg format

(2) train, trainval, test and val files are generated in Main

To generate train, trainval, test and val files in the newly created Main folder, first create a python file under the folder voc207 (the file name has no special requirements). The specific code is as follows:

import os
import random

trainval_percent = 0.9  #You can modify it yourself
train_percent = 0.9     #You can modify it yourself
xmlfilepath = 'Annotations'
txtsavepath = 'ImageSets\Main'
total_xml = os.listdir(xmlfilepath)

num=len(total_xml)
list=range(num)
tv=int(num*trainval_percent)
tr=int(tv*train_percent)
trainval= random.sample(list,tv)
train=random.sample(trainval,tr)

ftrainval = open('ImageSets/Main/trainval.txt', 'w')
ftest = open('ImageSets/Main/test.txt', 'w')
ftrain = open('ImageSets/Main/train.txt', 'w')
fval = open('ImageSets/Main/val.txt', 'w')

for i  in list:
    name=total_xml[i][:-4]+'\n'
    if i in trainval:
        ftrainval.write(name)
        if i in train:
            ftrain.write(name)
        else:
            fval.write(name)
    else:
        ftest.write(name)

ftrainval.close()
ftrain.close()
fval.close()
ftest.close()

Then open the terminal under the VOC2007 folder and run the file:

python3 label.py	#label.py is my custom name

Under ImageSets/Main /, you can see the generated four train.txt, trainval.txt, test.txt and val.txt files.

(3) Generate txt files for training in yolov4 directory

Create VOC in yolov4 directory_ Label.py file (or you can find it under scripts, but you need to modify it). The specific code is as follows:

import xml.etree.ElementTree as ET
import pickle
import os
from os import listdir, getcwd
from os.path import join

sets=[('2007', 'train'), ('2007', 'val'), ('2007', 'test')]	#Remove the relevant contents of 2012

classes = ["person"]		#Change to the name of your own class


def convert(size, box):
    dw = 1./(size[0])
    dh = 1./(size[1])
    x = (box[0] + box[1])/2.0 - 1
    y = (box[2] + box[3])/2.0 - 1
    w = box[1] - box[0]
    h = box[3] - box[2]
    x = x*dw
    w = w*dw
    y = y*dh
    h = h*dh
    return (x,y,w,h)

def convert_annotation(year, image_id):
    in_file = open('VOCdevkit/VOC%s/Annotations/%s.xml'%(year, image_id))
    out_file = open('VOCdevkit/VOC%s/labels/%s.txt'%(year, image_id), 'w')
    tree=ET.parse(in_file)
    root = tree.getroot()
    size = root.find('size')
    w = int(size.find('width').text)
    h = int(size.find('height').text)

    for obj in root.iter('object'):
        difficult = obj.find('difficult').text
        cls = obj.find('name').text
        if cls not in classes or int(difficult)==1:
            continue
        cls_id = classes.index(cls)
        xmlbox = obj.find('bndbox')
        b = (float(xmlbox.find('xmin').text), float(xmlbox.find('xmax').text), float(xmlbox.find('ymin').text), float(xmlbox.find('ymax').text))
        bb = convert((w,h), b)
        out_file.write(str(cls_id) + " " + " ".join([str(a) for a in bb]) + '\n')

wd = getcwd()

for year, image_set in sets:
    if not os.path.exists('VOCdevkit/VOC%s/labels/'%(year)):
        os.makedirs('VOCdevkit/VOC%s/labels/'%(year))
    image_ids = open('VOCdevkit/VOC%s/ImageSets/Main/%s.txt'%(year, image_set)).read().strip().split()
    list_file = open('%s_%s.txt'%(year, image_set), 'w')
    for image_id in image_ids:
        list_file.write('%s/VOCdevkit/VOC%s/JPEGImages/%s.jpg\n'%(wd, year, image_id))
        convert_annotation(year, image_id)
    list_file.close()

os.system("cat 2007_train.txt 2007_val.txt > train.txt")	#Remove the relevant contents of 2012
os.system("cat 2007_train.txt 2007_val.txt 2007_test.txt > train.all.txt")	#Remove the relevant contents of 2012

Run the file:

python3 voc_label.py

In the yolov4 directory, you can see that 2007 has been generated_ test.txt,2007_train.txt and 2007_val.txt file.

3. Modify cfg file

Find yolov4-tiny.cfg file in CFG folder and modify it as follows:

[net]
# Testing	#Test mode, on during test
#batch=1	
#subdivisions=1	
# Training	#Training mode, on during training, notes during testing
batch=256	# The quantity of each batch. According to the configuration settings, if the memory is small, reduce batch and subdivisions. The larger the batch and subdivisions, the better the effect
subdivisions=16	
width=416
height=416
channels=3	# The length and width of the input image width height channels are set to a multiple of 32, because the down sampling parameter is 32, the minimum is 320 * 320 and the maximum is 608 * 608
momentum=0.9	# Momentum parameter, influence gradient descent velocity
decay=0.0005	# The weight attenuates the regular term to prevent over fitting
angle=0	# rotate
saturation = 1.5	# Saturation amplification
exposure = 1.5	# Exposure
hue=.1	# tone

learning_rate=0.00261	# Learning rate, weight update speed
burn_in=1000	# The number of iterations is less than burn_in, learning rate update; Greater than burn_in, update with policy
max_batches = 500200	# Training reaches max_batches stops. The specific value is classes*2000, but the minimum value cannot be less than the number of pictures in your dataset and the minimum value cannot be less than 6000
policy=steps	# Learning rate adjustment policy: constant, steps, exp, policy, step, SIG, random
steps=400000,450000	# The step value is max_ 80% and 90% of batches
scales=.1,.1	# Proportion of change in learning rate

[convolutional]
batch_normalize=1
filters=32
size=3
stride=2
pad=1
activation=leaky

[convolutional]
batch_normalize=1
filters=64
size=3
stride=2
pad=1
activation=leaky

[convolutional]
batch_normalize=1
filters=64
size=3
stride=1
pad=1
activation=leaky

[route]
layers=-1
groups=2
group_id=1

[convolutional]
batch_normalize=1
filters=32
size=3
stride=1
pad=1
activation=leaky

[convolutional]
batch_normalize=1
filters=32
size=3
stride=1
pad=1
activation=leaky

[route]
layers = -1,-2

[convolutional]
batch_normalize=1
filters=64
size=1
stride=1
pad=1
activation=leaky

[route]
layers = -6,-1

[maxpool]
size=2
stride=2

[convolutional]
batch_normalize=1
filters=128
size=3
stride=1
pad=1
activation=leaky

[route]
layers=-1
groups=2
group_id=1

[convolutional]
batch_normalize=1
filters=64
size=3
stride=1
pad=1
activation=leaky

[convolutional]
batch_normalize=1
filters=64
size=3
stride=1
pad=1
activation=leaky

[route]
layers = -1,-2

[convolutional]
batch_normalize=1
filters=128
size=1
stride=1
pad=1
activation=leaky

[route]
layers = -6,-1

[maxpool]
size=2
stride=2

[convolutional]
batch_normalize=1
filters=256
size=3
stride=1
pad=1
activation=leaky

[route]
layers=-1
groups=2
group_id=1

[convolutional]
batch_normalize=1
filters=128
size=3
stride=1
pad=1
activation=leaky

[convolutional]
batch_normalize=1
filters=128
size=3
stride=1
pad=1
activation=leaky

[route]
layers = -1,-2

[convolutional]
batch_normalize=1
filters=256
size=1
stride=1
pad=1
activation=leaky

[route]
layers = -6,-1

[maxpool]
size=2
stride=2

[convolutional]
batch_normalize=1
filters=512
size=3
stride=1
pad=1
activation=leaky

##################################

[convolutional]
batch_normalize=1
filters=256
size=1
stride=1
pad=1
activation=leaky

[convolutional]
batch_normalize=1
filters=512
size=3
stride=1
pad=1
activation=leaky

[convolutional]
size=1
stride=1
pad=1
filters=24	# 3*(classes+5)
activation=linear

[yolo]
mask = 3,4,5
anchors = 10,14,  23,27,  37,58,  81,82,  135,169,  344,319
classes=1	# The number of categories is modified to its own category number. I have only one category here, so it is 1
num=6
jitter=.3
scale_x_y = 1.05
cls_normalizer=1.0
iou_normalizer=0.07
iou_loss=ciou
ignore_thresh = .7
truth_thresh = 1
random=0
resize=1.5
nms_kind=greedynms
beta_nms=0.6

[route]
layers = -4

[convolutional]
batch_normalize=1
filters=128
size=1
stride=1
pad=1
activation=leaky

[upsample]
stride=2

[route]
layers = -1, 23

[convolutional]
batch_normalize=1
filters=256
size=3
stride=1
pad=1
activation=leaky

[convolutional]
size=1
stride=1
pad=1
**filters=24**	# 3*(5+classes)
activation=linear

[yolo]
mask = 1,2,3
anchors = 10,14,  23,27,  37,58,  81,82,  135,169,  344,319	#The initial width and height of the prediction frame, the first is w, the second is h, and the total number is num*2
classes=1	# Number of categories, modified to your own category number
num=6	# Number of boundingboxes predicted by each grid
jitter=.3	# # Use data jitter to generate more data to suppress overfitting. In YOLOv2, cross, filp, and the angle of the net layer are used. flip is random. Cross is the parameter of jitter. jitter=.2 in tiny-yolo-voc.cfg is to perform cross in 0 ~ 0.2
scale_x_y = 1.05
cls_normalizer=1.0
iou_normalizer=0.07
iou_loss=ciou
ignore_thresh = .7	# # The parameter that determines whether the IOU error needs to be calculated. If it is greater than thresh, the IOU error will not be caught in the cost function
truth_thresh = 1
random=0	#  If it is 1, the image size of each iteration is randomly from 320 to 608, and the step size is 32. If it is 0, the size of each training is consistent with the input size
resize=1.5
nms_kind=greedynms
beta_nms=0.6

4. Modify voc.names

Modify voc.names under data and fill in according to your actual category name. I have only one person category here. Note that it must be consistent with the category in your label file!

person

5. Modify voc.data

Modify voc.data under cfg. Note: if the following files or folders cannot be found during subsequent training, you can try to use the root path.

classes= 1	# Number of categories
train  = /home/yourselfpath/darknet-master/2007_train.txt	# Training set path
valid  = /home/yourselfpath/darknet-master/2007_test.txt	# Test set path
names = data/voc.names	
backup = backup/	# Model save path

6. Training

Open the terminal input under yolov4: (pay attention to check whether the files in the following instructions are placed in the corresponding positions! The following yolov4.conv.29 has provided the download method at the beginning, remember to put them under yolov4)

./darknet detector train data/voc.data cfg/yolov4-tiny.cfg yolov4.conv.29 -map

Multi GPU training:

./darknet detector train data/voc.data cfg/yolov4-tiny.cfg yolov4.conv.29 -map -gpus 0,1

7. Test

Select the weight file with the most parameters to test

(1) Picture test

./darknet detector test data/voc.data cfg/yolov4-tiny.cfg backup/yolov4-tiny-custom_last.weights data/person.jpg

(2) Video test

./darknet detector demo data/voc.data cfg/yolov4-tiny.cfg backup/yolov4-tiny-custom_last.weights data/1.mp4

(3) Camera test

./darknet detector demo data/voc.data cfg/yolov4-tiny.cfg backup/yolov4-tiny-custom_last.weights -c 0 #If there are multiple cameras, they can be replaced by switching -c the following parameters

3, Reference

1.https://github.com/AlexeyAB/darknet

2.https://blog.csdn.net/u010881576/article/details/107053328

Keywords: Ubuntu Deep Learning Object Detection YOLOv4

Added by soulroll on Thu, 28 Oct 2021 11:03:31 +0300