Converting between VOC datasets and COCO datasets

Conversion between VOC dataset (xml format) and COCO dataset (json format)

Let's first look at the directory structure of the VOC and COCO datasets.
Taking the VOC2012 dataset as an example, it contains five folders:

The Annotations folder holds the XML file for each image. For example, "2007_000027.xml" stores the annotation for the image 2007_000027.jpg. Open it in a text editor and you will see data in XML format.
The ImageSets folder holds the txt files for the official training and validation splits. We mainly use train.txt and val.txt under "ImageSets/Main/". train.txt lists the image names of the official training set, and val.txt lists the image names of the validation set.
The other folder to note is JPEGImages, which stores the original pictures matching those image names. The remaining two folders are not needed here.
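As a minimal sketch of how the split files are typically used, the helper below reads a split file under ImageSets/Main and pairs every image name with its annotation and picture path. The function name voc_split_paths and the voc_root argument are illustrative assumptions, not part of any official VOC tooling.

```python
import os

def voc_split_paths(voc_root, split="train"):
    """Return (xml_path, jpg_path) pairs for the images named in a split file."""
    split_file = os.path.join(voc_root, "ImageSets", "Main", split + ".txt")
    pairs = []
    with open(split_file, "r") as f:
        for line in f:
            name = line.strip()
            if not name:
                continue  # skip blank lines
            xml_path = os.path.join(voc_root, "Annotations", name + ".xml")
            jpg_path = os.path.join(voc_root, "JPEGImages", name + ".jpg")
            pairs.append((xml_path, jpg_path))
    return pairs
```

Calling voc_split_paths with the VOC2012 root and "val" would return the same kind of list for the validation set.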

Next, let's take a look at the information in the xml file of the voc dataset.

    <folder>Folder directory</folder>
    <filename>Picture name.jpg</filename>

You can see that an xml file contains the following information:

  • folder: folder
  • filename: file name
  • path: path
  • source: source
  • size: picture size
  • segmented: whether the image has segmentation annotations. This article only covers object detection (bounding boxes)
  • object: an xml file can have multiple objects. Each object represents one box and is composed of the following information:
      • name: the category of the object in the annotated box, such as apple
      • bndbox: the coordinates of the upper left corner and the lower right corner
      • truncated: whether the object is truncated
      • difficult: whether the object is difficult to detect
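To make the field list concrete, here is a small sketch that parses one annotation with the standard library and collects the boxes. The XML string is written by hand for illustration, and read_voc_objects is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Hand-written annotation used only for illustration.
SAMPLE_XML = """
<annotation>
    <folder>VOC2012</folder>
    <filename>2007_000027.jpg</filename>
    <size>
        <width>486</width>
        <height>500</height>
        <depth>3</depth>
    </size>
    <segmented>0</segmented>
    <object>
        <name>person</name>
        <truncated>0</truncated>
        <difficult>0</difficult>
        <bndbox>
            <xmin>174</xmin>
            <ymin>101</ymin>
            <xmax>349</xmax>
            <ymax>351</ymax>
        </bndbox>
    </object>
</annotation>
"""

def read_voc_objects(xml_text):
    """Extract the file name, image size and every object box from a VOC annotation."""
    root = ET.fromstring(xml_text.strip())
    size = root.find("size")
    result = {
        "filename": root.findtext("filename"),
        "width": int(size.findtext("width")),
        "height": int(size.findtext("height")),
        "objects": [],
    }
    for obj in root.findall("object"):
        box = obj.find("bndbox")
        result["objects"].append({
            "name": obj.findtext("name"),
            "difficult": int(obj.findtext("difficult", default="0")),
            # (xmin, ymin, xmax, ymax): upper left and lower right corners
            "bbox": tuple(int(box.findtext(tag)) for tag in ("xmin", "ymin", "xmax", "ymax")),
        })
    return result

info = read_voc_objects(SAMPLE_XML)
print(info["filename"], info["objects"])
```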

Unlike VOC, where each image has its own xml file, COCO writes all images and their box information into a single json file. Generally, the entire COCO directory is as follows:

|______annotations # Store label information
|        |__train.json
|        |__val.json
|        |__test.json
|______trainset # Store training set images
|______valset   # Store validation set images
|______testset  # Store test set images

A standard json file contains the following information:

{
    "info" : info,
    "licenses" : [license],
    "images" : [image],
    "annotations" : [annotation],
    "categories" : [category]
}

From the overall json structure above, we can see that the value of the info key is a dictionary. The values of the other four keys (licenses, images, annotations and categories) are all lists, and each element stored in those lists is again a dictionary.
Calling len() on the images, annotations and categories lists gives the following:

(1) Number of list elements in the images field = number of pictures in the training set (or test set);
(2) Number of list elements in the annotations field = number of bounding boxes in the training set (or test set);
(3) Number of list elements in the categories field = number of categories.
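The three counts can be checked with a few lines of Python. This is only a sketch: the dictionary tiny is hand-written and stands in for the result of json.load on a real annotation file, and coco_counts is an illustrative name.

```python
def coco_counts(coco_dict):
    """Count the entries of the three main lists in a COCO-style dictionary."""
    return {key: len(coco_dict[key]) for key in ("images", "annotations", "categories")}

# Tiny hand-written stand-in for a loaded annotation file.
tiny = {
    "images": [{"id": 0, "file_name": "0.jpg", "width": 512, "height": 512}],
    "annotations": [
        {"id": 0, "image_id": 0, "category_id": 2, "bbox": [200.0, 416.0, 64.0, 64.0]},
        {"id": 1, "image_id": 0, "category_id": 1, "bbox": [10.0, 20.0, 30.0, 40.0]},
    ],
    "categories": [{"id": 1, "name": "rectangle"}, {"id": 2, "name": "circle"}],
}

print(coco_counts(tiny))  # one image, two boxes, two categories
```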

Next, let's look at the contents of each key:


"info": {
"year" : int,                # year
"version" : str,             # version
"description" : str,         # detailed description of the dataset
"contributor" : str,         # author
"url" : str,                 # dataset link
"date_created" : datetime,   # creation date
}


"images": [
{"id": 0,                                                # int, image id, starting from 0
 "file_name": "0.jpg",                                   # str, file name
 "width": 512,                                           # int, image width
 "height": 512,                                          # int, image height
 "date_captured": "2020-04-14 01:45:07.508146",          # datetime, capture date
 "license": 1,                                           # int, id of the license the image follows
 "coco_url": "",                                         # str, COCO picture link url
 "flickr_url": ""                                        # str, Flickr picture link url
}
]


"licenses": [
{"id": 1,                                            # int, license id; "license": 1 in images refers to this entry
 "name": null,                                       # str, license name
 "url": null                                         # str, license link
}
]


"annotations": [
{"id": 0,                                   # int, id of each annotated object
 "image_id": 0,                             # int, id of the picture where the object is located
 "category_id": 2,                          # int, category id of the annotated object
 "iscrowd": 0,                              # 0 or 1, whether the annotation marks a crowd of objects. The default value is 0
 "area": 4095.9999999999986,                # float, area of the annotated object (64 * 64 = 4096)
 "bbox": [200.0, 416.0, 64.0, 64.0],        # [x, y, width, height] of the bounding box
 "segmentation": [[200.0, 416.0, 264.0, 416.0, 264.0, 480.0, 200.0, 480.0]]  # polygon outline of the object
}
]

In "bbox", [x, y, width, height], x and y are the coordinates of the upper left corner of the box.

In "segmentation", [x1, y1, x2, y2, x3, y3, x4, y4] lists the four corner points of this rectangular example clockwise, starting from the upper left corner: [upper left x, upper left y, upper right x, upper right y, lower right x, lower right y, lower left x, lower left y].
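The relationship between the two fields can be sketched in a few lines. The helpers below (illustrative names, not part of any COCO library) derive the rectangular polygon and the area from a [x, y, width, height] box, using the numbers from the annotation example above.

```python
def bbox_to_rect_segmentation(bbox):
    """Return the four corners of a box, clockwise from the upper left, as one polygon."""
    x, y, w, h = bbox
    return [[x, y, x + w, y, x + w, y + h, x, y + h]]

def bbox_area(bbox):
    """Area of a [x, y, width, height] box."""
    return bbox[2] * bbox[3]

box = [200.0, 416.0, 64.0, 64.0]
print(bbox_to_rect_segmentation(box))
print(bbox_area(box))
```

For real segmentation masks the polygon follows the object outline and its area is smaller than the box area; the rectangle here is only the simplest case.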

"categories": [
{"id": 1,                                 # int, category id
 "name": "rectangle",                     # str, category name
 "supercategory": "None"                  # str, parent category, e.g. trucks and cars both belong to the motor vehicle class
},
{"id": 2,
 "name": "circle",
 "supercategory": "None"
}
]

1, Convert the xml files of a VOC dataset into the json format of a COCO dataset


Before starting the conversion, save the names of all the xml files you want to convert in the xml_list.txt list. If it is a VOC dataset you made yourself, take care not to misspell the class names when entering the labels.

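# create_xml_list.py
import os
xml_list = os.listdir('C:/Users/user/Desktop/train')
# 'w' overwrites any earlier list, so rerunning the script does not duplicate entries
with open('C:/Users/user/Desktop/xml_list.txt','w') as f:
    for i in xml_list:
        if i[-3:]=='xml':
            f.write(i + '\n')  # one xml file name per line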

Run python voc2coco.py followed by three arguments: the path to xml_list.txt, the directory where the xml files are actually stored, and the path of the output json file. This converts the xml files into a single json file.

# voc2coco.py

# No extra packages are required: xml.etree.ElementTree is part of the standard library

import sys
import os
import json
import xml.etree.ElementTree as ET

# By default, categories are collected from the xml files as they are encountered
PRE_DEFINE_CATEGORIES = {}
# If necessary, pre-define category and its id
#  PRE_DEFINE_CATEGORIES = {"aeroplane": 1, "bicycle": 2, "bird": 3, "boat": 4,
                         #  "bottle":5, "bus": 6, "car": 7, "cat": 8, "chair": 9,
                         #  "cow": 10, "diningtable": 11, "dog": 12, "horse": 13,
                         #  "motorbike": 14, "person": 15, "pottedplant": 16,
                         #  "sheep": 17, "sofa": 18, "train": 19, "tvmonitor": 20}

def get(root, name):
    vars = root.findall(name)
    return vars

def get_and_check(root, name, length):
    vars = root.findall(name)
    if len(vars) == 0:
        raise NotImplementedError('Can not find %s in %s.'%(name, root.tag))
    if length > 0 and len(vars) != length:
        raise NotImplementedError('The size of %s is supposed to be %d, but is %d.'%(name, length, len(vars)))
    if length == 1:
        vars = vars[0]
    return vars

def get_filename_as_int(filename):
    try:
        filename = os.path.splitext(filename)[0]
        return int(filename)
    except ValueError:
        raise NotImplementedError('Filename %s is supposed to be an integer.'%(filename))

def convert(xml_list, xml_dir, json_file):
    list_fp = open(xml_list, 'r')
    json_dict = {"images":[], "type": "instances", "annotations": [],
                 "categories": []}
    categories = PRE_DEFINE_CATEGORIES
    bnd_id = 1  # id of the first bounding box; incremented for each annotation
    for line in list_fp:
        line = line.strip()
        print("Processing %s"%(line))
        xml_f = os.path.join(xml_dir, line)
        tree = ET.parse(xml_f)
        root = tree.getroot()
        path = get(root, 'path')
        if len(path) == 1:
            filename = os.path.basename(path[0].text)
        elif len(path) == 0:
            filename = get_and_check(root, 'filename', 1).text
        else:
            raise NotImplementedError('%d paths found in %s'%(len(path), line))
        ## The filename must be a number
        image_id = get_filename_as_int(filename)
        size = get_and_check(root, 'size', 1)
        width = int(get_and_check(size, 'width', 1).text)
        height = int(get_and_check(size, 'height', 1).text)
        image = {'file_name': filename, 'height': height, 'width': width,
                 'id': image_id}
        json_dict['images'].append(image)
        ## Currently we do not support segmentation
        #  segmented = get_and_check(root, 'segmented', 1).text
        #  assert segmented == '0'
        for obj in get(root, 'object'):
            category = get_and_check(obj, 'name', 1).text
            if category not in categories:
                new_id = len(categories)
                categories[category] = new_id
            category_id = categories[category]
            bndbox = get_and_check(obj, 'bndbox', 1)
            xmin = int(get_and_check(bndbox, 'xmin', 1).text) - 1
            ymin = int(get_and_check(bndbox, 'ymin', 1).text) - 1
            xmax = int(get_and_check(bndbox, 'xmax', 1).text)
            ymax = int(get_and_check(bndbox, 'ymax', 1).text)
            assert(xmax > xmin)
            assert(ymax > ymin)
            o_width = abs(xmax - xmin)
            o_height = abs(ymax - ymin)
            ann = {'area': o_width*o_height, 'iscrowd': 0, 'image_id':
                   image_id, 'bbox':[xmin, ymin, o_width, o_height],
                   'category_id': category_id, 'id': bnd_id, 'ignore': 0,
                   'segmentation': []}
            json_dict['annotations'].append(ann)
            bnd_id = bnd_id + 1

    for cate, cid in categories.items():
        cat = {'supercategory': 'none', 'id': cid, 'name': cate}
        json_dict['categories'].append(cat)
    json_fp = open(json_file, 'w')
    json_str = json.dumps(json_dict)
    json_fp.write(json_str)
    json_fp.close()
    list_fp.close()

if __name__ == '__main__':
    if len(sys.argv) != 4:
        print('3 arguments are needed.')
        print('Usage: %s XML_LIST.txt XML_DIR OUTPUT_JSON.json'%(sys.argv[0]))
        sys.exit(1)

    convert(sys.argv[1], sys.argv[2], sys.argv[3])
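After a conversion it can be worth checking that the json is internally consistent. The function below is a minimal sketch of such a check, assuming the dictionary layout described earlier. check_coco is a hypothetical helper and not part of the conversion script.

```python
def check_coco(coco_dict):
    """Return a list of problems found in a COCO-style dictionary (empty if none)."""
    problems = []
    image_ids = {img["id"] for img in coco_dict["images"]}
    category_ids = {cat["id"] for cat in coco_dict["categories"]}
    seen_ann_ids = set()
    for ann in coco_dict["annotations"]:
        if ann["id"] in seen_ann_ids:
            problems.append("duplicate annotation id %s" % ann["id"])
        seen_ann_ids.add(ann["id"])
        if ann["image_id"] not in image_ids:
            problems.append("annotation %s refers to unknown image %s" % (ann["id"], ann["image_id"]))
        if ann["category_id"] not in category_ids:
            problems.append("annotation %s refers to unknown category %s" % (ann["id"], ann["category_id"]))
        x, y, w, h = ann["bbox"]
        if w <= 0 or h <= 0:
            problems.append("annotation %s has a non-positive box size" % ann["id"])
    return problems
```

A typical use would be to load the generated file with json.load and print whatever check_coco returns before training on it.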

2, Convert a json file in COCO format into xml files in VOC format

If you want to convert a json file in COCO format into xml files in VOC format, set anno and xml_dir to the json file path and the directory where the converted xml files should be saved, then run the following code to complete the conversion.

# coco2voc.py

# pip install pycocotools pandas tqdm
import os
import time
import json
import pandas as pd
from tqdm import tqdm
from pycocotools.coco import COCO
# json file path and directory for storing the xml files
anno = 'C:/Users/user/Desktop/val/instances_val2017.json'
xml_dir = 'C:/Users/user/Desktop/val/xml/'

coco = COCO(anno)  # read file
cats = coco.loadCats(coco.getCatIds())  # loadCats is the interface provided by pycocotools to obtain categories
# Create anno dir
os.makedirs(xml_dir, exist_ok=True)
dttm = time.strftime("%Y%m%d%H%M%S", time.localtime())

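def trans_id(category_id):
    # Map a COCO category id to its position in the category list
    names = []
    namesid = []
    for i in range(0, len(cats)):
        names.append(cats[i]['name'])
        namesid.append(cats[i]['id'])
    index = namesid.index(category_id)
    return index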

def convert(anno,xml_dir): 

    with open(anno, 'r') as load_f:
        f = json.load(load_f)
    imgs = f['images']  # the images list of the json file; its length is the number of images
    cat = f['categories']
    df_cate = pd.DataFrame(f['categories'])                     # Categories in json
    df_cate_sort = df_cate.sort_values(["id"], ascending=True)  # Sort by category id
    categories = list(df_cate_sort['name'])                     # Get all category names
    print('categories = ', categories)
    df_anno = pd.DataFrame(f['annotations'])                    # annotation in json
    for i in tqdm(range(len(imgs))):  # loop over all images; tqdm adds a progress bar to long loops
        xml_content = []
        file_name = imgs[i]['file_name']    # information of the current image
        height = imgs[i]['height']
        img_id = imgs[i]['id']
        width = imgs[i]['width']
        version =['"1.0"','"utf-8"'] 
        # Add attributes to xml file
        xml_content.append("<?xml version=" + version[0] +" "+ "encoding="+ version[1] + "?>")
        xml_content.append("<annotation>")
        xml_content.append("    <filename>" + file_name + "</filename>")
        xml_content.append("    <size>")
        xml_content.append("        <width>" + str(width) + "</width>")
        xml_content.append("        <height>" + str(height) + "</height>")
        xml_content.append("        <depth>"+ "3" + "</depth>")
        xml_content.append("    </size>")
        # Find the annotations of this image through img_id
        annos = df_anno[df_anno["image_id"].isin([img_id])]  # e.g. shape (2, 8) means the image has two boxes
        for index, row in annos.iterrows():  # All annotation information of one image
            bbox = row["bbox"]
            category_id = row["category_id"]
            cate_name = categories[trans_id(category_id)]
            # add new object
            xml_content.append("    <object>")
            xml_content.append("        <name>" + cate_name + "</name>")
            xml_content.append("        <truncated>0</truncated>")
            xml_content.append("        <difficult>0</difficult>")
            xml_content.append("        <bndbox>")
            xml_content.append("            <xmin>" + str(int(bbox[0])) + "</xmin>")
            xml_content.append("            <ymin>" + str(int(bbox[1])) + "</ymin>")
            xml_content.append("            <xmax>" + str(int(bbox[0] + bbox[2])) + "</xmax>")
            xml_content.append("            <ymax>" + str(int(bbox[1] + bbox[3])) + "</ymax>")
            xml_content.append("        </bndbox>")
            xml_content.append("    </object>")
        xml_content.append("</annotation>")
        x = xml_content
        xml_content = [x[i] for i in range(0, len(x)) if x[i] != "\n"]
        ### save the list to a file
        # Swap the image extension for .xml (splitting on 'j' breaks on names that contain a 'j')
        xml_path = os.path.join(xml_dir, os.path.splitext(file_name)[0] + '.xml')
        with open(xml_path, 'w+', encoding="utf8") as f:
            f.write('\n'.join(xml_content))
        xml_content[:] = []

if __name__ == '__main__':
    convert(anno, xml_dir)
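The core of both directions is the box conversion between COCO's [x, y, width, height] and VOC's corner coordinates. A minimal sketch of that step in isolation follows. The function names are illustrative, and the sketch ignores the 1-pixel offset that voc2coco.py subtracts from xmin and ymin.

```python
def coco_to_voc_box(bbox):
    """[x, y, width, height] -> (xmin, ymin, xmax, ymax), truncated to integers."""
    x, y, w, h = bbox
    return int(x), int(y), int(x + w), int(y + h)

def voc_to_coco_box(xmin, ymin, xmax, ymax):
    """(xmin, ymin, xmax, ymax) -> [x, y, width, height]."""
    return [xmin, ymin, xmax - xmin, ymax - ymin]

print(coco_to_voc_box([200.0, 416.0, 64.0, 64.0]))
print(voc_to_coco_box(200, 416, 264, 480))
```

Because voc2coco.py shifts xmin and ymin by one pixel and coco2voc.py does not shift them back, a round trip through both scripts is not exactly lossless. Whether that matters depends on how the downstream tool interprets pixel coordinates.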

3, Convert txt files to the XML format of Pascal VOC

For example, the directory of the Billboard dataset downloaded from Open Images V5 is as follows:

|______images # Store training set images
|        |__train
|        |__val
|______labels # Store label information
|        |__train
|        |__val

Each image has a corresponding txt file that holds the coordinate information of its targets. The leading 0 on a line is the class index; here there is only one category, billboard.
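As a rough sketch of how one label line can be read, the helper below assumes the layout that the conversion script in this section relies on: a class index followed by xmin, ymin, xmax and ymax. If your txt files store normalised centre coordinates instead, the values would need to be rescaled first. CLASS_NAMES and parse_label_line are illustrative names.

```python
CLASS_NAMES = ["billboard"]  # assumed class list: index 0 -> billboard

def parse_label_line(line, sep=None):
    """Split one label line into (class_name, (xmin, ymin, xmax, ymax)).

    sep=None splits on any whitespace; pass "," for comma-separated files.
    """
    parts = [p.strip() for p in line.strip().split(sep)]
    class_name = CLASS_NAMES[int(parts[0])]
    xmin, ymin, xmax, ymax = (float(v) for v in parts[1:5])
    return class_name, (xmin, ymin, xmax, ymax)

print(parse_label_line("0 10 20 110 220"))
```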

The code for converting txt file into XML format of Pascal VOC is as follows:

#! /usr/bin/python
# -*- coding:UTF-8 -*-
import os, sys
import glob
from PIL import Image
# Image storage location
src_img_dir = "F:/Billboard/dataset/images/val"
# Storage location of the ground truth txt files of the images
src_txt_dir = "F:/Billboard/dataset/labels/val"
# Output location of the generated xml files
src_xml_dir = "F:/Billboard/dataset/xml/val"
# Class names indexed by the class number in the txt files (0 -> billboard)
name = ['billboard']
img_Lists = glob.glob(src_img_dir + '/*.jpg')
img_basenames = [] # e.g. 100.jpg
for item in img_Lists:
    img_basenames.append(os.path.basename(item))
img_names = [] # e.g. 100
for item in img_basenames:
    temp1, temp2 = os.path.splitext(item)
    img_names.append(temp1)
for img in img_names:
    im = Image.open((src_img_dir + '/' + img + '.jpg'))
    width, height = im.size
    # open the corresponding txt file
    gt = open(src_txt_dir + '/' + img + '.txt').read().splitlines()
    #gt = open(src_txt_dir + '/gt_' + img + '.txt').read().splitlines()
    # write in xml file
    #os.mknod(src_xml_dir + '/' + img + '.xml')
    xml_file = open((src_xml_dir + '/' + img + '.xml'), 'w')
    xml_file.write('<annotation>\n')
    xml_file.write('    <folder>VOC2007</folder>\n')
    # the images are read as .jpg above, so record the same extension here
    xml_file.write('    <filename>' + str(img) + '.jpg' + '</filename>\n')
    xml_file.write('    <size>\n')
    xml_file.write('        <width>' + str(width) + '</width>\n')
    xml_file.write('        <height>' + str(height) + '</height>\n')
    xml_file.write('        <depth>3</depth>\n')
    xml_file.write('    </size>\n')
    # write the region of image on xml file
    for img_each_label in gt:
        spt = img_each_label.split(' ') # If the txt is separated by commas, change this to spt = img_each_label.split(',')
        xml_file.write('    <object>\n')
        xml_file.write('        <name>' + str(name[int(spt[0])]) + '</name>\n')
        xml_file.write('        <pose>Unspecified</pose>\n')
        xml_file.write('        <truncated>0</truncated>\n')
        xml_file.write('        <difficult>0</difficult>\n')
        xml_file.write('        <bndbox>\n')
        xml_file.write('            <xmin>' + str(spt[1]) + '</xmin>\n')
        xml_file.write('            <ymin>' + str(spt[2]) + '</ymin>\n')
        xml_file.write('            <xmax>' + str(spt[3]) + '</xmax>\n')
        xml_file.write('            <ymax>' + str(spt[4]) + '</ymax>\n')
        xml_file.write('        </bndbox>\n')
        xml_file.write('    </object>\n')
    xml_file.write('</annotation>')
    xml_file.close()
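Writing tags by string concatenation works, but it is easy to leave an element unclosed or to emit an unescaped character. One possible alternative, sketched here with the standard library's ElementTree, builds the same structure as a tree and serialises it in one step. build_voc_xml is a hypothetical helper, not part of the script above.

```python
import xml.etree.ElementTree as ET

def build_voc_xml(filename, width, height, objects, depth=3):
    """Build a VOC-style annotation string.

    objects is a list of (class_name, (xmin, ymin, xmax, ymax)) tuples.
    """
    root = ET.Element("annotation")
    ET.SubElement(root, "filename").text = filename
    size = ET.SubElement(root, "size")
    ET.SubElement(size, "width").text = str(width)
    ET.SubElement(size, "height").text = str(height)
    ET.SubElement(size, "depth").text = str(depth)
    for class_name, box in objects:
        obj = ET.SubElement(root, "object")
        ET.SubElement(obj, "name").text = class_name
        ET.SubElement(obj, "pose").text = "Unspecified"
        ET.SubElement(obj, "truncated").text = "0"
        ET.SubElement(obj, "difficult").text = "0"
        bndbox = ET.SubElement(obj, "bndbox")
        for tag, value in zip(("xmin", "ymin", "xmax", "ymax"), box):
            ET.SubElement(bndbox, tag).text = str(value)
    # Special characters such as & are escaped automatically on serialisation
    return ET.tostring(root, encoding="unicode")

print(build_voc_xml("100.jpg", 640, 480, [("billboard", (10, 20, 110, 220))]))
```

The output is a single line without indentation. That is still valid XML, and any VOC reader that uses an XML parser should accept it.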

So far, we can handle the data conversions commonly needed in object detection. Whatever dataset we get, whether VOC, COCO or one of the various txt formats, we can use the methods above to convert it into the format we need. Making your own dataset is also straightforward, but space is limited, so I'll cover it in the next article.

Keywords: Python xml Object Detection annotations

Added by gorgo666 on Thu, 17 Feb 2022 16:10:03 +0200