Object detection | Common dataset annotation formats and conversion code

My blog https://blog.justlovesmile.top/

Object detection is an important research direction in computer vision; it aims to detect instances of specific visual object classes in digital images. As one of the fundamental problems of computer vision, object detection is the basis and prerequisite of many other computer vision tasks, such as image caption generation, instance segmentation, and object tracking. When working on such problems we often need our own scripts or annotation tools to generate datasets, and the resulting formats are diverse. Therefore, to ease training, most object detection frameworks support several common dataset annotation formats by default, including COCO, Pascal VOC, and YOLO. This article introduces these formats together with the Python conversion scripts I wrote (they generally need to be adapted to your actual situation).

1. COCO

1.1 COCO dataset format

The COCO (Common Objects in Context) dataset is a large-scale dataset suitable for object detection, image segmentation, and image captioning tasks. Its annotation format is one of the most commonly used. At present the COCO2017 release is widely used; its official website is COCO - Common Objects in Context (cocodataset.org).

The COCO dataset mainly contains images (jpg or png, etc.) and annotation files (json). Its directory layout is as follows (/ denotes a folder):

-coco/
    |-train2017/
    	|-1.jpg
    	|-2.jpg
    |-val2017/
    	|-3.jpg
    	|-4.jpg
    |-test2017/
    	|-5.jpg
    	|-6.jpg
    |-annotations/
    	|-instances_train2017.json
    	|-instances_val2017.json
    	|-*.json

The train2017 and val2017 folders store the training-set and validation-set images, while the test2017 folder stores the test set, which usually contains images only; its labels, if any, are distributed separately.

The files in the annotations folder are the annotation files. If you only have xml files, you usually need to convert them to json format, structured as follows (for more details, see the official website):

{
	"info": info, 
	"images": [image], //list
	"annotations": [annotation], //list
	"categories": [category], //list
	"licenses": [license], //list
}

info describes the dataset as a whole, including year, version, description, etc. If you only want to complete the training task, it is not very important, as shown below:

//It's not that important for training
info{
	"year": int, 
	"version": str, 
	"description": str, 
	"contributor": str, 
	"url": str, 
	"date_created": datetime,
}

image holds the basic information of an image, including its id, width, height, and file name. The id must correspond to the image_id fields in the annotations section, as shown below:

image{
	"id": int, //necessary
	"width": int, //necessary
	"height": int, //necessary
	"file_name": str, //necessary
	"license": int,
	"flickr_url": str,
	"coco_url": str,
	"date_captured": datetime, 
}

annotation is the most important part: each entry holds the annotation information for one object, including the annotation id, the image id, the category id, and so on, as shown below:

annotation{
	"id": int, //Label id
	"image_id": int, //Image id
	"category_id": int, //Category id
	"segmentation": RLE or [polygon], //Image segmentation annotation
	"area": float, //Area
	"bbox": [x,y,width,height], //Coordinates of the upper left corner of the target box and width and height
	"iscrowd": 0 or 1, //Is it dense
}
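
Note that bbox stores the top-left corner plus width and height rather than two corner points, and for box-only annotations area is simply width × height. The minimal helpers below (hypothetical names of my own, not part of any library) make the conversion to and from corner coordinates explicit:

# Hypothetical helpers for converting between COCO's [x, y, width, height]
# boxes and corner coordinates; not part of the COCO API.
def xywh_to_xyxy(bbox):
    x, y, w, h = bbox
    return [x, y, x + w, y + h]

def xyxy_to_xywh(xmin, ymin, xmax, ymax):
    return [xmin, ymin, xmax - xmin, ymax - ymin]

print(xywh_to_xyxy([246, 61, 128, 316]))  # [246, 61, 374, 377]
print(128 * 316)                          # 40448, matching the "area" field in the example below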

category holds the category information, including the parent category (supercategory), the category id, and the category name, as shown below:

category{
	"id": int, //Category serial number
	"name": str, //Category name
	"supercategory": str, //Parent category
}

license holds the dataset's license information, including the id, license name, and URL, as shown below:

//Not important for training
license{
	"id": int, 
	"name": str, 
	"url": str,
}

Next, let's look at a simple example:

{
	"info": {...},
	"images": [
		{"id": 1, "file_name": "1.jpg", "height": 334, "width": 500},
		{"id": 2, "file_name": "2.jpg", "height": 445, "width": 556}
	],
	"annotations": [
		{"id": 1, "area": 40448, "iscrowd": 0, "image_id": 1, "bbox": [246, 61, 128, 316], "category_id": 3, "segmentation": []},
		{"id": 2, "area": 40448, "iscrowd": 0, "image_id": 1, "bbox": [246, 61, 128, 316], "category_id": 2, "segmentation": []},
		{"id": 3, "area": 40448, "iscrowd": 0, "image_id": 2, "bbox": [246, 61, 128, 316], "category_id": 1, "segmentation": []}
	],
	"categories": [
		{"supercategory": "none", "id": 1, "name": "liner"},
		{"supercategory": "none", "id": 2, "name": "containership"},
		{"supercategory": "none", "id": 3, "name": "bulkcarrier"}
	],
	"licenses": [{...}]
}
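
Because the whole annotation file is a single JSON document, a quick sanity check needs nothing beyond the standard library. The sketch below (the file path is an assumption) verifies that every annotation references an existing image and category:

import json

with open("instances_train2017.json") as f:  # path assumed
    coco = json.load(f)

image_ids = {img["id"] for img in coco["images"]}
category_ids = {cat["id"] for cat in coco["categories"]}
for ann in coco["annotations"]:
    assert ann["image_id"] in image_ids, f"bad image_id in annotation {ann['id']}"
    assert ann["category_id"] in category_ids, f"bad category_id in annotation {ann['id']}"
print(len(coco["images"]), "images,", len(coco["annotations"]), "annotations")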

1.2 COCO conversion script

The Python conversion script is as follows; the images and xml annotation files need to be prepared in advance:

# -*- coding: utf-8 -*-
# @Author    : justlovesmile
# @Date      : 2021/9/8 15:36
import os, random, json
import shutil as sh
from tqdm.auto import tqdm
import xml.etree.ElementTree as xmlET

def mkdir(path):
    if not os.path.exists(path):
        os.makedirs(path)
        return True
    else:
        print(f"The path ({path}) already exists.")
        return False

def readxml(file):
    tree = xmlET.parse(file)
    #Picture size field
    size = tree.find('size')
    width = int(size.find('width').text)
    height = int(size.find('height').text)
    #Target field
    objs = tree.findall('object')
    bndbox = []
    for obj in objs:
        label = obj.find("name").text
        bnd = obj.find("bndbox")
        xmin = int(bnd.find("xmin").text)
        ymin = int(bnd.find("ymin").text)
        xmax = int(bnd.find("xmax").text)
        ymax = int(bnd.find("ymax").text)
        bbox = [xmin, ymin, xmax, ymax, label]
        bndbox.append(bbox)
    return [[width, height], bndbox]

def tococo(xml_root, image_root, output_root,classes={},errorId=[],train_percent=0.9):
    # assert
    assert train_percent<=1 and len(classes)>0
    # define the root path
    train_root = os.path.join(output_root, "train2017")
    val_root = os.path.join(output_root, "val2017")
    ann_root = os.path.join(output_root, "annotations")
    # initialize train and val dict
    train_content = {
        "images": [],  # {"file_name": "09780.jpg", "height": 334, "width": 500, "id": 9780}
        "annotations": [],# {"area": 40448, "iscrowd": 0, "image_id": 1, "bbox": [246, 61, 128, 316], "category_id": 5, "id": 1, "segmentation": []}
        "categories": []  # {"supercategory": "none", "id": 1, "name": "liner"}
    }
    val_content = {
        "images": [],  # {"file_name": "09780.jpg", "height": 334, "width": 500, "id": 9780}
        "annotations": [],# {"area": 40448, "iscrowd": 0, "image_id": 1, "bbox": [246, 61, 128, 316], "category_id": 5, "id": 1, "segmentation": []}
        "categories": []  # {"supercategory": "none", "id": 1, "name": "liner"}
    }
    train_json = 'instances_train2017.json'
    val_json = 'instances_val2017.json'
    # divide the trainset and valset
    images = os.listdir(image_root)
    total_num = len(images)
    train_num = int(total_num * train_percent)
    train_file = sorted(random.sample(images, train_num))
    if mkdir(output_root):
        if mkdir(train_root) and mkdir(val_root) and mkdir(ann_root):
            idx1, idx2, dx1, dx2 = 0, 0, 0, 0
            for file in tqdm(images):
                name=os.path.splitext(os.path.basename(file))[0]
                if name not in errorId:
                    res = readxml(os.path.join(xml_root, name + '.xml'))
                    if file in train_file:
                        idx1 += 1
                        sh.copy(os.path.join(image_root, file), train_root)
                        train_content['images'].append(
                            {"file_name": file, "width": res[0][0], "height": res[0][1], "id": idx1})
                        for b in res[1]:
                            dx1 += 1
                            x = b[0]
                            y = b[1]
                            w = b[2] - b[0]
                            h = b[3] - b[1]
                            train_content['annotations'].append(
                                {"area": w * h, "iscrowd": 0, "image_id": idx1, "bbox": [x, y, w, h],
                                 "category_id": classes[b[4]], "id": dx1, "segmentation": []})
                    else:
                        idx2 += 1
                        sh.copy(os.path.join(image_root, file), val_root)
                        val_content['images'].append(
                            {"file_name": file, "width": res[0][0], "height": res[0][1], "id": idx2})
                        for b in res[1]:
                            dx2 += 1
                            x = b[0]
                            y = b[1]
                            w = b[2] - b[0]
                            h = b[3] - b[1]
                            val_content['annotations'].append(
                                {"area": w * h, "iscrowd": 0, "image_id": idx2, "bbox": [x, y, w, h],
                                 "category_id": classes[b[4]], "id": dx2, "segmentation": []})
            for i, j in classes.items():
                train_content['categories'].append({"supercategory": "none", "id": j, "name": i})
                val_content['categories'].append({"supercategory": "none", "id": j, "name": i})
            with open(os.path.join(ann_root, train_json), 'w') as f:
                json.dump(train_content, f)
            with open(os.path.join(ann_root, val_json), 'w') as f:
                json.dump(val_content, f)
    print("Number of Train Images:", len(os.listdir(train_root)))
    print("Number of Val Images:", len(os.listdir(val_root)))
    
    
def test():
    box_root = "E:/MyProject/Dataset/hwtest/annotations" #xml folder
    image_root = "E:/MyProject/Dataset/hwtest/images" #image folder
    output_root = "E:/MyProject/Dataset/coco" #Output folder
    classes = {"liner": 0,"bulk carrier": 1,"warship": 2,"sailboat": 3,"canoe": 4,"container ship": 5,"fishing boat": 6} #Category dictionary
    errorId = [] #Dirty data id
    train_percent = 0.9 #Proportion of training set and verification set
    tococo(box_root, image_root, output_root,classes=classes,errorId=errorId,train_percent=train_percent)

if __name__ == "__main__":
    test()
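
If pycocotools is installed, loading the generated file with the official COCO API is a quick way to confirm the output is well-formed. A minimal sketch, assuming the output path used in test() above:

from pycocotools.coco import COCO  # pip install pycocotools

coco = COCO("E:/MyProject/Dataset/coco/annotations/instances_train2017.json")
img_ids = coco.getImgIds()
print("images:", len(img_ids))
ann_ids = coco.getAnnIds(imgIds=img_ids[:1])
print("annotations of the first image:", coco.loadAnns(ann_ids))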

2. VOC

2.1 VOC dataset format

The VOC (Visual Object Classes) dataset comes from the PASCAL VOC challenge, whose tasks include Object Classification, Object Detection, Object Segmentation, Human Layout, and Action Classification. Its official website is The PASCAL Visual Object Classes Homepage (ox.ac.uk). The main releases are VOC2007 and VOC2012.

The VOC dataset mainly includes images (jpg or png, etc.) and annotation files (xml). Its directory layout is as follows (/ denotes a folder):

-VOC/
	|-JPEGImages/
		|-1.jpg
		|-2.jpg
	|-Annotations/
		|-1.xml
		|-2.xml
	|-ImageSets/
		|-Layout/
			|-*.txt
		|-Main/
			|-train.txt
			|-val.txt
			|-trainval.txt
			|-test.txt
		|-Segmentation/
			|-*.txt
		|-Action/
			|-*.txt
	|-SegmentationClass/
	|-SegmentationObject/

For object detection tasks, the most common and necessary folders are JPEGImages, Annotations, and ImageSets/Main; the txt files under Main simply list, one id per line, the image ids belonging to each split, as the reading sketch below shows.
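
A minimal sketch (paths assumed) of how a split file is typically consumed, resolving each id to its image and annotation:

import os

voc_root = "E:/MyProject/Dataset/voc"  # assumed path
with open(os.path.join(voc_root, "ImageSets", "Main", "train.txt")) as f:
    ids = [line.strip() for line in f if line.strip()]

samples = [(os.path.join(voc_root, "JPEGImages", i + ".jpg"),
            os.path.join(voc_root, "Annotations", i + ".xml")) for i in ids]
print(len(samples), "training samples")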

The images are stored in JPEGImages, and the xml annotation files in Annotations. The content of an annotation file looks like this:

<annotation>
	<folder>VOC</folder>            # Image folder
	<filename>000032.jpg</filename> # Image file name
	<source>                        # Image source
		<database>The VOC Database</database>
		<annotation>PASCAL VOC</annotation>
		<image>flickr</image>
	</source>
	<size>                          # Image size information
		<width>500</width>    # Image width
		<height>281</height>  # Image height
		<depth>3</depth>      # Number of image channels
	</size>
	<segmented>0</segmented>  # Whether the image has segmentation annotations; 0 means no, which doesn't matter for detection
	<object>                  # Information about one annotated object
		<name>aeroplane</name>    # Object class name
		<pose>Frontal</pose>      # Shooting angle; Unspecified if unknown
		<truncated>0</truncated>  # Whether the object is truncated; 0 means complete, not truncated
		<difficult>0</difficult>  # Whether the object is difficult to recognize; 0 means not difficult
		<bndbox>            # Bounding box information
			<xmin>104</xmin>  # Upper left corner x
			<ymin>78</ymin>   # Upper left corner y
			<xmax>375</xmax>  # Lower right corner x
			<ymax>183</ymax>  # Lower right corner y
		</bndbox>
	</object>
	<object>
        # information about the other objects, omitted here
	</object>
</annotation>

2.2 VOC conversion script

The following script only applies when the images and xml files already exist; it copies them into place and writes the split files. A script that generates the VOC xml from COCO-format annotations still needs to be written separately (a sketch is given after this script):

# -*- coding: utf-8 -*-
# @Author    : justlovesmile
# @Date      : 2021/9/8 21:01
import os,random
from tqdm.auto import tqdm
import shutil as sh

def mkdir(path):
    if not os.path.exists(path):
        os.mkdir(path)
        return True
    else:
        print(f"The path ({path}) already exists.")
        return False

def tovoc(xmlroot,imgroot,saveroot,errorId=[],classes={},tvp=1.0,trp=0.9):
    '''
    Parameters:
        xmlroot: folder containing the xml annotation files
        imgroot: folder containing the images
        saveroot: output root directory
        errorId: ids of dirty samples to exclude from the split files
        classes: category-name-to-id dictionary
        tvp: proportion of trainval (the rest becomes the test set)
        trp: proportion of train within trainval
    Function:
        Load the data and save it in VOC format:
        VOC/
          Annotations/
            - **.xml
          JPEGImages/
            - **.jpg
          ImageSets/
            Main/
              - train.txt
              - test.txt
              - val.txt
              - trainval.txt
    '''
    # assert
    assert len(classes)>0
    # init path
    VOC = saveroot
    ann_path = os.path.join(VOC, 'Annotations')
    img_path = os.path.join(VOC,'JPEGImages')
    set_path = os.path.join(VOC,'ImageSets')
    txt_path = os.path.join(set_path,'Main')
    # mkdirs 
    if mkdir(VOC):
        if mkdir(ann_path) and mkdir(img_path) and mkdir(set_path):
            mkdir(txt_path)

    images = os.listdir(imgroot)
    list_index = range(len(images))
    #test and trainval set
    trainval_percent = tvp
    train_percent = trp
    val_percent = 1 - train_percent if train_percent<1 else 0.1
    total_num = len(images)
    trainval_num = int(total_num*trainval_percent)
    train_num = int(trainval_num*train_percent)
    val_num = int(trainval_num*val_percent) if train_percent<1 else 0

    trainval = random.sample(list_index,trainval_num)
    train = random.sample(trainval,train_num)  # sample train from trainval, not from all images
    val = random.sample(trainval,val_num)      # used only when train_percent == 1
    
    for i in tqdm(list_index):
        imgfile = images[i]
        img_id = os.path.splitext(os.path.basename(imgfile))[0]
        xmlfile = img_id+".xml"
        sh.copy(os.path.join(imgroot,imgfile),os.path.join(img_path,imgfile))
        sh.copy(os.path.join(xmlroot,xmlfile),os.path.join(ann_path,xmlfile))
        if img_id not in errorId:
            if i in trainval:
                with open(os.path.join(txt_path,'trainval.txt'),'a') as f:
                    f.write(img_id+'\n')
                if i in train:
                    with open(os.path.join(txt_path,'train.txt'),'a') as f:
                        f.write(img_id+'\n')
                else:
                    with open(os.path.join(txt_path,'val.txt'),'a') as f:
                        f.write(img_id+'\n')
                if train_percent==1 and i in val:
                    with open(os.path.join(txt_path,'val.txt'),'a') as f:
                        f.write(img_id+'\n')          
            else:
                with open(os.path.join(txt_path,'test.txt'),'a') as f:
                    f.write(img_id+'\n')
    
    # end
    print("Dataset to VOC format finished!")

def test():
    box_root = "E:/MyProject/Dataset/hwtest/annotations"
    image_root = "E:/MyProject/Dataset/hwtest/images"
    output_root = "E:/MyProject/Dataset/voc"
    classes = {"liner": 0,"bulk carrier": 1,"warship": 2,"sailboat": 3,"canoe": 4,"container ship": 5,"fishing boat": 6}
    errorId = []
    train_percent = 0.9
    tovoc(box_root,image_root,output_root,errorId,classes,trp=train_percent)

if __name__ == "__main__":
    test()
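
The script above only copies xml files that already exist. When converting from COCO (or from any list of boxes) to VOC, the xml itself has to be generated; a minimal sketch with ElementTree, writing only the fields the scripts in this article actually read, could look like this:

import xml.etree.ElementTree as ET

def make_voc_xml(filename, width, height, boxes, savepath):
    # boxes: list of [xmin, ymin, xmax, ymax, label], as returned by readxml() above
    root = ET.Element("annotation")
    ET.SubElement(root, "filename").text = filename
    size = ET.SubElement(root, "size")
    ET.SubElement(size, "width").text = str(width)
    ET.SubElement(size, "height").text = str(height)
    ET.SubElement(size, "depth").text = "3"
    for xmin, ymin, xmax, ymax, label in boxes:
        obj = ET.SubElement(root, "object")
        ET.SubElement(obj, "name").text = label
        ET.SubElement(obj, "difficult").text = "0"
        bnd = ET.SubElement(obj, "bndbox")
        ET.SubElement(bnd, "xmin").text = str(xmin)
        ET.SubElement(bnd, "ymin").text = str(ymin)
        ET.SubElement(bnd, "xmax").text = str(xmax)
        ET.SubElement(bnd, "ymax").text = str(ymax)
    ET.ElementTree(root).write(savepath, encoding="utf-8")

make_voc_xml("000032.jpg", 500, 281, [[104, 78, 375, 183, "aeroplane"]], "000032.xml")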

3. YOLO

3.1 YOLO dataset format

The YOLO dataset format is mainly used to train YOLO models. There is no fixed requirement on the directory layout, because the data paths can be set in the model's configuration file. The one thing to note is that YOLO annotations store the bounding box position in normalized form (normalized here means divided by the image width and height), one object per line, as shown below:

{class id} {normalized box center x} {normalized box center y} {normalized box width w} {normalized box height h}
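
For example, the aeroplane box from the VOC annotation above, with corners (104, 78) and (375, 183) in a 500×281 image, is normalized as follows (class id 0 is an assumption here):

# Worked example of the YOLO normalization for one box.
w_img, h_img = 500, 281
xmin, ymin, xmax, ymax = 104, 78, 375, 183

cx = (xmin + xmax) / 2 / w_img   # 0.479
cy = (ymin + ymax) / 2 / h_img   # ~0.464413
bw = (xmax - xmin) / w_img       # 0.542
bh = (ymax - ymin) / h_img       # ~0.373665

print(f"0 {cx:.6f} {cy:.6f} {bw:.6f} {bh:.6f}")  # one line of the .txt label file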

3.2 YOLO conversion script

The Python conversion script is as follows:

# -*- coding: utf-8 -*-
# @Author    : justlovesmile
# @Date      : 2021/9/8 20:28
import os
import random
from tqdm.auto import tqdm
import shutil as sh
try:
    import xml.etree.cElementTree as et
except ImportError:
    import xml.etree.ElementTree as et

def mkdir(path):
    if not os.path.exists(path):
        os.makedirs(path)
        return True
    else:
        print(f"The path ({path}) already exists.")
        return False  

def xml2yolo(xmlpath,savepath,classes={}):
    namemap = classes
    #try:
    #    with open('classes_yolo.json','r') as f:
    #        namemap=json.load(f)
    #except:
    #    pass
    rt = et.parse(xmlpath).getroot()
    w = int(rt.find("size").find("width").text)
    h = int(rt.find("size").find("height").text)
    with open(savepath, "w") as f:
        for obj in rt.findall("object"):
            name = obj.find("name").text
            xmin = int(obj.find("bndbox").find("xmin").text)
            ymin = int(obj.find("bndbox").find("ymin").text)
            xmax = int(obj.find("bndbox").find("xmax").text)
            ymax = int(obj.find("bndbox").find("ymax").text)
            f.write(
                f"{namemap[name]} {(xmin+xmax)/w/2.} {(ymin+ymax)/h/2.} {(xmax-xmin)/w} {(ymax-ymin)/h}"
                + "\n"
            )

def trainval(xmlroot,imgroot,saveroot,errorId=[],classes={},tvp=1.0,trp=0.9):
    # assert
    assert tvp<=1.0 and trp <=1.0 and len(classes)>0
    # create dirs
    imglabel = ['images','labels']
    trainvaltest = ['train','val','test']
    mkdir(saveroot)
    for r in imglabel:
        mkdir(os.path.join(saveroot,r))
        for s in trainvaltest:
            mkdir(os.path.join(saveroot,r,s))
    #train / val
    trainval_percent = tvp
    train_percent = trp
    val_percent = 1 - train_percent if train_percent<1.0 else 0.15
    
    total_img = os.listdir(imgroot)
    num = len(total_img)
    list_index = range(num)
    tv = int(num * trainval_percent)
    tr = int(tv * train_percent)
    va = int(tv * val_percent)
    trainval = random.sample(list_index, tv) # trainset and valset
    train = random.sample(trainval, tr) # trainset
    val = random.sample(trainval, va) #valset, use it only when train_percent = 1 

    print(f"trainval_percent:{trainval_percent},train_percent:{train_percent},val_percent:{val_percent}")
    for i in tqdm(list_index):
        name = total_img[i]
        op = os.path.join(imgroot,name)
        file_id = os.path.splitext(os.path.basename(name))[0]
        if file_id not in errorId:
            xmlp = os.path.join(xmlroot,file_id+'.xml')
            if i in trainval:
                # trainset and valset
                if i in train:
                    sp = os.path.join(saveroot,"images","train",name)
                    xml2yolo(xmlp,os.path.join(saveroot,"labels","train",file_id+'.txt'),classes)
                    sh.copy(op,sp)
                else:
                    sp = os.path.join(saveroot,"images","val",name)
                    xml2yolo(xmlp,os.path.join(saveroot,"labels","val",file_id+'.txt'),classes)
                    sh.copy(op,sp)
                if (train_percent==1.0 and i in val):
                    sp = os.path.join(saveroot,"images","val",name)
                    xml2yolo(xmlp,os.path.join(saveroot,"labels","val",file_id+'.txt'),classes)
                    sh.copy(op,sp)
            else:
                # testset
                sp = os.path.join(saveroot,"images","test",name)
                xml2yolo(xmlp,os.path.join(saveroot,"labels","test",file_id+'.txt'),classes)
                sh.copy(op,sp)

def maketxt(dir,saveroot,filename):
    savetxt = os.path.join(saveroot,filename)
    with open(savetxt,'w') as f:
        for i in tqdm(os.listdir(dir)):
            f.write(os.path.join(dir,i)+'\n')
                           
def toyolo(xmlroot,imgroot,saveroot,errorId=[],classes={},tvp=1,train_percent=0.9):
    # toyolo main function
    trainval(xmlroot,imgroot,saveroot,errorId,classes,tvp,train_percent)
    maketxt(os.path.join(saveroot,"images","train"),saveroot,"train.txt")
    maketxt(os.path.join(saveroot,"images","val"),saveroot,"val.txt")
    maketxt(os.path.join(saveroot,"images","test"),saveroot,"test.txt")
    print("Dataset to yolo format success.")

def test():
    box_root = "E:/MyProject/Dataset/hwtest/annotations"
    image_root = "E:/MyProject/Dataset/hwtest/images"
    output_root = "E:/MyProject/Dataset/yolo"
    classes = {"liner": 0,"bulk carrier": 1,"warship": 2,"sailboat": 3,"canoe": 4,"container ship": 5,"fishing boat": 6}
    errorId = []
    train_percent = 0.9
    toyolo(box_root,image_root,output_root,errorId,classes,train_percent=train_percent)

if __name__ == "__main__":
    test()

After running this script, the following will be generated in the output folder:

-yolo/
	|-images/
		|-train/
			|-1.jpg
			|-2.jpg
		|-test/
			|-3.jpg
			|-4.jpg
		|-val/
			|-5.jpg
			|-6.jpg
	|-labels/
		|-train/
			|-1.txt
			|-2.txt
		|-test/
			|-3.txt
			|-4.txt
		|-val/
			|-5.txt
			|-6.txt
	|-train.txt
	|-test.txt
	|-val.txt
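
To actually train on such a directory, frameworks in the YOLO family usually take a small dataset config file; the sketch below (a YOLOv5-style config is assumed, with the paths and class dictionary from test() above) writes one:

# Sketch: write a YOLOv5-style dataset config from the classes dict used above.
classes = {"liner": 0, "bulk carrier": 1, "warship": 2, "sailboat": 3,
           "canoe": 4, "container ship": 5, "fishing boat": 6}
names = [n for n, _ in sorted(classes.items(), key=lambda kv: kv[1])]

with open("E:/MyProject/Dataset/yolo/data.yaml", "w") as f:  # path assumed
    f.write("train: E:/MyProject/Dataset/yolo/images/train\n")
    f.write("val: E:/MyProject/Dataset/yolo/images/val\n")
    f.write(f"nc: {len(names)}\n")
    f.write(f"names: {names}\n")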
