Training an SSD PyTorch model on your own dataset

1. Download SSD pytorch code

SSD pytorch code link:

git clone
  1. Run the command above to download the code locally (if the download is too slow, you can first import the repository into Gitee and then git clone from the Gitee address)

2. Prepare data sets

  1. If you do not have a dataset, you can download the VOC and COCO datasets with the scripts that come with the code (./data/scripts directory)

  1. If you have your own dataset, place it in the ./data directory. For a VOC-format dataset, create a new VOCdevkit folder, as shown in the figure below. You can refer to:
  2. Place all annotation files in Annotations, all pictures in JPEGImages, and train.txt/val.txt/test.txt in ImageSets/Main (each file contains only picture names, such as 0000110002, without the .jpg or .png suffix). These files can be generated with a script:
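
The split files described above can be produced in a few lines. This is only a minimal sketch, not the original script: the function name `write_split_files`, the 80/20 split and the fixed seed are my own assumptions, and it writes only train.txt and val.txt.

```python
import random
from pathlib import Path


def write_split_files(voc_root, val_ratio=0.2, seed=0):
    """Write train.txt and val.txt with image names (no file extension)."""
    root = Path(voc_root)
    # Use the annotation files as the source of names: one XML per image.
    names = sorted(p.stem for p in (root / "Annotations").glob("*.xml"))
    random.Random(seed).shuffle(names)

    n_val = int(len(names) * val_ratio)
    splits = {"val": sorted(names[:n_val]), "train": sorted(names[n_val:])}

    out_dir = root / "ImageSets" / "Main"
    out_dir.mkdir(parents=True, exist_ok=True)
    for split, split_names in splits.items():
        # One name per line, without the .jpg/.png suffix.
        (out_dir / f"{split}.txt").write_text("\n".join(split_names) + "\n")
    return splits
```

For example, `write_split_files("data/VOCdevkit/VOC2020")` would create the two files under ImageSets/Main of that folder; change the path to your own dataset root.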

3. Modify the code according to your own data set

  1. I use a dataset in VOC format, so the following modifications take the VOC format as an example


  1. Locate the file,

  2. Open it and modify num_classes in the VOC configuration according to your own situation: number of classes + 1 (the background counts as a class),

  3. I have only one class here, so the value is 2

  4. For the first debugging run, it is best to reduce max_iter; otherwise the number of iterations is too large and training takes a long time. The other parameters are hyperparameters and can be left unmodified for now
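
To make the two values concrete, here is a small sketch. The dictionary name `voc_cfg` and the list `my_classes` are hypothetical, and only the two settings discussed above are shown; the real configuration contains more keys, which stay unchanged.

```python
# Hypothetical subset of the VOC configuration: only the two settings
# discussed above appear here, and the values are examples.
my_classes = ["dargon fruit"]            # one foreground class

voc_cfg = {
    "num_classes": len(my_classes) + 1,  # +1 for the background class
    "max_iter": 1000,                    # small value for a first debugging run
}
print(voc_cfg)
```

Computing num_classes from the label list keeps the two in sync if more classes are added later.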


  1. Modify the class list according to your own labels. I have only one class here, so the list contains only 'dargon fruit' (Note: if there is only one class, you need to wrap it in [])
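
The reason for the [] is a Python detail: parentheses around a single string do not create a tuple, so the "class list" would be iterated character by character. A short sketch follows; the helper `class_to_index` is my own illustration of the usual name-to-index mapping, not code copied from the repository.

```python
single_wrong = ('dargon fruit')   # just a str: the parentheses do nothing
single_right = ['dargon fruit']   # a list with exactly one class name


def class_to_index(classes):
    """Map each class name to its index."""
    return dict(zip(classes, range(len(classes))))


wrong_map = class_to_index(single_wrong)   # keys are single characters
right_map = class_to_index(single_right)

print('dargon fruit' in wrong_map)   # False: the label cannot be looked up
print(right_map)                     # {'dargon fruit': 0}
```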

  1. Modify image_sets according to your own dataset. I only have train and val here

Download the pretrained model VGG16_reducedfc.pth
Link: password: nw6t

  1. Modify batch_size according to your own GPU memory. It is advisable to start with a small value. I use a 1660 Ti with 6 GB of GPU memory

  1. Lower the interval at which the training model is saved. Since max_iter was set to 1000 earlier, I save every 500 iterations here; set it according to your own situation
  2. By the way, the name of the saved model can also be modified: change COCO to VOC. I have not modified it here
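
The effect of the save interval can be checked without running any training. The sketch below assumes the training loop saves whenever the iteration is a non-zero multiple of the interval; the function name, the weights/ folder and the ssd300_VOC_ prefix are illustrative assumptions.

```python
def checkpoint_names(max_iter, save_interval, prefix="ssd300_VOC_"):
    """List the weight files a loop saving every `save_interval` steps would write."""
    names = []
    for iteration in range(max_iter):
        # Skip iteration 0 and save only on multiples of the interval.
        if iteration != 0 and iteration % save_interval == 0:
            names.append("weights/" + prefix + repr(iteration) + ".pth")
    return names


print(checkpoint_names(max_iter=1000, save_interval=500))
```

With an interval larger than max_iter the list is empty, which is why the interval has to be lowered together with max_iter.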

Add the path of the trained model to the evaluation script to verify the model. I trained ssd300_VOC_500.pth here
Replace the following line

args = parser.parse_args()

Change to

args, unknown = parser.parse_known_args()
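
The difference between the two calls is that parse_known_args() returns the unrecognized arguments instead of aborting, which helps when the script is started from an environment that appends its own arguments (a notebook kernel, for instance). A minimal sketch; the argument list passed in is made up for illustration.

```python
import argparse

parser = argparse.ArgumentParser(description="evaluation arguments (illustrative)")
parser.add_argument("--trained_model", default="weights/ssd300_VOC_500.pth")

# Simulated command line: one known option plus extras that the parser
# does not define.
argv = ["--trained_model", "weights/ssd300_VOC_500.pth", "-f", "kernel.json"]

# parse_args(argv) would exit with "unrecognized arguments";
# parse_known_args(argv) returns the extras as a second value instead.
args, unknown = parser.parse_known_args(argv)
print(args.trained_model)
print(unknown)
```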


  1. Modify num_classes, the same as above
  2. After the modification, run the training. After training, I ran the evaluation to verify the trained model. The AP was only 63%, probably because my dataset is too small

Running the evaluation only shows the AP value. To test your own images, run demo.ipynb in Jupyter Notebook

Modify the corresponding part of the code as follows, and make sure the image path is correct

image = cv2.imread('.../data/example3.jpg', cv2.IMREAD_COLOR) # uncomment if dataset not downloaded
from matplotlib import pyplot as plt
from data import VOCDetection, VOC_ROOT, VOCAnnotationTransform

# here we specify year (07 or 12) and dataset ('test', 'val', 'train')

#testset = VOCDetection('./data/example1.jpg', [('2020', 'val')], None, VOCAnnotationTransform())
#img_id = 13
#image = testset.pull_image(img_id)
rgb_image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)

Possible problems:

bug1: the dimensions do not match
The error occurs at loc_loss += loss_l.data[0]

  1. Change .data[0] to .item(); the value in the print statement that follows is likewise changed to loss.item()
  2. Recommended reference:

bug2: training stops automatically

  1. Modify the part that loads the training data, as shown in the figure above
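
The figure is not reproduced here. A commonly used fix, assuming the stop is caused by StopIteration once the batch iterator has gone through a small dataset, is to re-create the iterator and continue. The sketch below uses a plain generator named `batches` as a stand-in for the data loader; in the real training loop the iterator would come from iter(data_loader).

```python
def batches(data, batch_size):
    """Stand-in for a data loader: yields consecutive slices of `data`."""
    for start in range(0, len(data), batch_size):
        yield data[start:start + batch_size]


data = list(range(10))
batch_iterator = batches(data, batch_size=4)   # 3 batches per pass
seen = []

for iteration in range(7):                     # more iterations than batches
    try:
        batch = next(batch_iterator)
    except StopIteration:
        # The dataset is exhausted: start a new pass instead of stopping.
        batch_iterator = batches(data, batch_size=4)
        batch = next(batch_iterator)
    seen.append(batch)

print(len(seen))
```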

bug3: the PyTorch version may cause problems
Solution: search for the error message (for example on Baidu) and modify the code accordingly

bug4: running eval.py may trigger pytest
Solution: rename the test_net function in eval.py so that the keyword test no longer appears in the name. After I renamed it to set_net, it ran successfully

bug5: the loss becomes nan during training
Solution: reduce the learning rate

bug6: insufficient GPU memory (runtime error)
Solution: reduce batch_size

bug7: too many indices for array
IndexError: too many indices for array
Cause: some annotation files contain no object data, which leads to the array index error

If there is a lot of data, you can use the following script to find out which annotation file has the problem (remember to change the path to your own annotations)

import argparse
import sys
import cv2
import os

import os.path          as osp
import numpy            as np

if sys.version_info[0] == 2:
    import xml.etree.cElementTree as ET
else:
    import xml.etree.ElementTree  as ET

parser    = argparse.ArgumentParser(
            description='Single Shot MultiBox Detector Training With Pytorch')
train_set = parser.add_mutually_exclusive_group()

parser.add_argument('--root', default='data/VOCdevkit/VOC2020' , help='Dataset root directory path')

args = parser.parse_args()

CLASSES = [(  # always index 0
    'dargon fruit')]

annopath = osp.join('%s', 'Annotations', '%s.{}'.format("xml"))
imgpath  = osp.join('%s', 'JPEGImages',  '%s.{}'.format("jpg"))

def vocChecker(image_id, width, height, keep_difficult = False):
    target   = ET.parse(annopath % image_id).getroot()
    res      = []

    for obj in target.iter('object'):

        difficult = int(obj.find('difficult').text) == 1

        if not keep_difficult and difficult:
            continue

        name = obj.find('name').text.lower().strip()
        bbox = obj.find('bndbox')

        pts    = ['xmin', 'ymin', 'xmax', 'ymax']
        bndbox = []

        for i, pt in enumerate(pts):

            cur_pt = int(bbox.find(pt).text) - 1
            # scale height or width
            cur_pt = float(cur_pt) / width if i % 2 == 0 else float(cur_pt) / height
            bndbox.append(cur_pt)

        label_idx =  dict(zip(CLASSES, range(len(CLASSES))))[name]
        bndbox.append(label_idx)
        res += [bndbox]  # [xmin, ymin, xmax, ymax, label_ind]
        # img_id = target.find('filename').text[:-4]
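    try :
        # Indexing column 4 raises IndexError when res is empty (no valid objects)
        print(np.array(res)[:, 4])
    except IndexError:
        print("\nINDEX ERROR HERE !\n")
    return res  # [[xmin, ymin, xmax, ymax, label_ind], ... ]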

if __name__ == '__main__' :

    i = 0

    for name in sorted(os.listdir(osp.join(args.root,'Annotations'))):
    # as we have only one annotations file per image
        i += 1

        img    = cv2.imread(imgpath  % (args.root,name.split('.')[0]))
        height, width, channels = img.shape
        print("path : {}".format(annopath % (args.root,name.split('.')[0])))
        res = vocChecker((args.root, name.split('.')[0]), width, height)
    print("Total of annotations : {}".format(i))
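
If you only want the list of suspicious files, a lighter check that does not need OpenCV is to look for XML files without any object element. This is my own minimal sketch (the function name `find_empty_annotations` is hypothetical), not a replacement for the script above.

```python
import os
import xml.etree.ElementTree as ET


def find_empty_annotations(anno_dir):
    """Return the XML files in `anno_dir` that contain no <object> element."""
    empty = []
    for file_name in sorted(os.listdir(anno_dir)):
        if not file_name.endswith(".xml"):
            continue
        root = ET.parse(os.path.join(anno_dir, file_name)).getroot()
        # An annotation without any <object> yields an empty target list.
        if root.find("object") is None:
            empty.append(file_name)
    return empty
```

For example, `find_empty_annotations("data/VOCdevkit/VOC2020/Annotations")` returns the file names to inspect. Note that this sketch does not catch files whose objects are all marked difficult, which are also skipped when keep_difficult is False.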

For learning purposes only; not for commercial use

Keywords: neural networks Pytorch Computer Vision Mathematical Modeling ssd

Added by jagguy on Thu, 18 Nov 2021 05:27:27 +0200