TensorFlow Notes: Traffic Sign Recognition
This post contains my learning notes from completing the Traffic Sign Recognition experiment at BlackWalnut Labs.
The main body walks through the experiment as presented by BlackWalnut Labs, with a few of my own observations added along the way, so treat this post as adapted from BlackWalnut.
Note: everything below runs in the environment provided by BlackWalnut Labs. If you want to run it in your own environment, refer to the TensorFlow Object Detection project.
Contents
- TensorFlow Notes: Traffic Sign Recognition
- 1. What Is Traffic Sign Recognition?
- 2. What Will We Create?
- 3. What Do You Need?
- 4. Experimental Process
- 4.1 Collect Traffic Sign Pictures
- 4.2 Tag Images with LabelImg
- 4.3 Convert Images and Labels to TFRecord
- 4.4 Select a Model
- 4.5 Train the Model
- 4.6 Prediction Experiments
- 5. Appendix
1. What Is Traffic Sign Recognition?
A traffic sign recognition system is one of the components of ADAS (Advanced Driver Assistance Systems); it enables a car to automatically recognize the traffic signs ahead and alert the driver.
Google provides an object detection project on GitHub that can be used to train various object recognition models. These models need no extra preprocessing: they identify traffic signs directly in an image and return each sign's type and its location in the picture.
An object recognition model combines two techniques: image segmentation first isolates each candidate object, and image classification then decides whether each segmented object belongs to one of the classes in the training set.
2. What Will We Create?
In this experiment we will use the TensorFlow Object Detection project and traffic sign photos taken by ourselves to train an object recognition model. Along the way we will practice the following skills:
- Collect pictures of traffic signs
- Use LabelImg to tag images
- Convert images and labels to TFRecord
- Select and deploy the ssd_mobilenet_v1 model
- Train the model
- Run predictions on test pictures
3. What Do You Need?
- A BlackWalnut Labs. AI Cloud access account
- Windows 10 (optional, for running LabelImg)
4. Experimental Process
4.1 Collect Traffic Sign Pictures
An object detection data set differs from a classifier data set: each image must contain the whole scene around the objects to be recognized. For example, if the sign to be recognized is "pedestrians ahead", the training image does not have to be a cropped close-up; it only needs to contain a "pedestrians ahead" sign somewhere in the scene, and several traffic signs may appear in the same picture.
Using object recognition in real scenes normally requires hundreds or even thousands of images as a training set. In this Codelab, however, about 100 images per traffic sign class are enough to get useful results, because the pictures are collected by hand and the scenes are fixed.
Create a new directory named Codelab as the project directory, and under it create data/TrafficSign to hold the data set. The Codelab provides a pre-collected training set in the Tools/datasets/image directory; it contains 200 pictures, each with two reasonably visible traffic signs. Switch to the Terminal, enter the Codelab directory, and copy the image files into Codelab/data/TrafficSign.
cp -r ../Tools/datasets/image/* data/TrafficSign
Of course, you can also use your own data set, as long as the pictures are clear and meet the following requirements:
(1) The photo resolution is 480×480.
(2) Each picture contains at least one clearly visible traffic sign.
(3) Each traffic sign class appears at least 100 times across the data set.
4.2 Tag Images with LabelImg
To train an object recognition model we need not only the images themselves but also the location and class of every object in each image; in other words, each picture needs a corresponding XML file. To mark object locations accurately and conveniently, you can use the LabelImg tool on Windows 10. Click here to download the relevant release of the tool.
The Windows_v1.6 release is used here. After unpacking the download, run labelImg.exe directly. The interface is in English but easy to follow; the basic functions are explained below:
Option | Function |
---|---|
Open | Open a single picture |
Open Dir | Open a directory; the software reads all pictures in it |
Change Save Dir | Change the directory where annotations are saved; automatic saving is supported |
Next Image | Go to the next picture |
Prev Image | Go to the previous picture |
Save | Save the annotations for the current picture |
Create RectBox | Create a label; object labels are drawn as rectangular boxes |
Duplicate RectBox | Copy a label |
Delete RectBox | Delete a label |
How to use it:
Drag the mouse to create a label box; the program then pops up a dialog asking for the label name. Once a label is created it appears in the list in the upper right corner, and when you save, an XML label file is generated automatically. One suggestion about label files: give each label file the same prefix (base name) as its image file.
Annotating a data set is a lot of work, so to save time the Codelab provides ready-made label files, each sharing its prefix with the corresponding picture, in the Tools/datasets/label directory. The annotated data set can be used together with traffic sign photos you take yourself. Switch to the Terminal, enter the Codelab directory, and copy the label files into Codelab/data/TrafficSign.
cp -r ../Tools/datasets/label/* data/TrafficSign
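Optionally, you can sanity-check the data set at this point: every .jpg should have an .xml label with the same prefix, and it is useful to see how often each sign class appears. The following is a minimal sketch for that purpose (not part of the original experiment); it assumes you run it from the Codelab directory and uses only the Python standard library.

import os
import xml.dom.minidom
from collections import Counter

data_dir = 'data/TrafficSign'  # same layout as used in this Codelab
class_counts = Counter()
for file_name in os.listdir(data_dir):
    if not file_name.endswith('.jpg'):
        continue
    xml_path = os.path.join(data_dir, file_name[:-4] + '.xml')
    if not os.path.exists(xml_path):
        print('Missing label file for', file_name)
        continue
    dom = xml.dom.minidom.parse(xml_path)
    for obj in dom.documentElement.getElementsByTagName('object'):
        class_counts[obj.getElementsByTagName('name')[0].childNodes[0].data] += 1

for class_name, count in class_counts.most_common():
    print(class_name, count)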
Under Codelab/data, create a new file named label.pbtxt that lists the classes of the data set actually used. Its contents are as follows:
item { id: 1 name: 'Car' }
item { id: 2 name: '10T' }
item { id: 3 name: 'BanRight' }
item { id: 4 name: 'Pedestrian' }
item { id: 5 name: 'TurnLeft' }
item { id: 6 name: 'Speaker' }
item { id: 7 name: 'Crosswalk' }
item { id: 8 name: 'TurnAround' }
item { id: 9 name: 'GoStraight' }
item { id: 10 name: 'GoStraightOrRight' }
item { id: 11 name: 'GoStraightOrLeft' }
item { id: 12 name: 'BanLeft' }
item { id: 13 name: 'TurnRight' }
item { id: 14 name: 'BanSpeaker' }
item { id: 15 name: 'NoParking' }
item { id: 16 name: 'BanTurnAround' }
item { id: 17 name: 'BanCar' }
item { id: 18 name: 'BanStraightAndLeft' }
item { id: 19 name: 'SlowDown' }
item { id: 20 name: 'BanLeftAndRight' }
item { id: 21 name: 'Limit40' }
item { id: 22 name: 'BanStraightAndRight' }
4.3 Convert Images and Labels to TFRecord
The images and labels cannot be fed to the trainer directly; Google has standardized the training set format as TFRecord. Create a new Python 3 file named Image2TFRecord.py in the Codelab directory. Before writing the generator itself, add the locally installed dependency directories to the module search path.
import sys
sys.path.append('/home/jovyan/Appendix/tensorflow_models/research')
sys.path.append('/home/jovyan/Appendix/tensorflow_models/research/slim')
After adding the local dependency directories, import the required packages.
import numpy as np
import tensorflow as tf
import os
import xml.dom.minidom
from object_detection.utils import dataset_util
In the Object Detection project, Google provides a sample function for generating a TFRecord example. The function handles only one picture at a time, but we can build on it to process the whole data set in batches.
def create_cat_tf_example(encoded_cat_image_data):
    """Creates a tf.Example proto from a sample cat image.

    Args:
        encoded_cat_image_data: The jpg encoded data of the cat image.

    Returns:
        example: The created tf.Example.
    """
    height = 1032.0
    width = 1200.0
    filename = 'example_cat.jpg'
    image_format = b'jpg'

    xmins = [322.0 / 1200.0]
    xmaxs = [1062.0 / 1200.0]
    ymins = [174.0 / 1032.0]
    ymaxs = [761.0 / 1032.0]
    classes_text = ['Cat']
    classes = [1]

    tf_example = tf.train.Example(features=tf.train.Features(feature={
        'image/height': dataset_util.int64_feature(height),
        'image/width': dataset_util.int64_feature(width),
        'image/filename': dataset_util.bytes_feature(filename),
        'image/source_id': dataset_util.bytes_feature(filename),
        'image/encoded': dataset_util.bytes_feature(encoded_cat_image_data),
        'image/format': dataset_util.bytes_feature(image_format),
        'image/object/bbox/xmin': dataset_util.float_list_feature(xmins),
        'image/object/bbox/xmax': dataset_util.float_list_feature(xmaxs),
        'image/object/bbox/ymin': dataset_util.float_list_feature(ymins),
        'image/object/bbox/ymax': dataset_util.float_list_feature(ymaxs),
        'image/object/class/text': dataset_util.bytes_list_feature(classes_text),
        'image/object/class/label': dataset_util.int64_list_feature(classes),
    }))
    return tf_example
In this function, only height, width, filename, image_format, xmins, xmaxs, ymins, ymaxs, classes_text and classes change from one picture to the next; if we obtain them dynamically from the input parameters, the function can be used for batch processing.
Parameter | Meaning |
---|---|
height | Height of the picture being processed |
width | Width of the picture being processed |
filename | Name of the picture being processed, used to distinguish different pictures |
image_format | Format of the picture, usually jpeg or png |
xmins | X coordinates (normalized by image width) of the left edges of the object label boxes; a list with one value per labeled object |
xmaxs | X coordinates (normalized by image width) of the right edges of the object label boxes; a list with one value per labeled object |
ymins | Y coordinates (normalized by image height) of the top edges of the object label boxes; a list with one value per labeled object |
ymaxs | Y coordinates (normalized by image height) of the bottom edges of the object label boxes; a list with one value per labeled object |
classes_text | Names of the objects in the picture; a list with one value per labeled object |
classes | Class IDs corresponding to the object names, numbered from 1 across the whole training data set; a list with one value per labeled object |
Modify the conversion function so that these parameters come from the function's input arguments, passed in as dictionaries.
def create_tf_example(image, labelInfo, imageInfo):
    height = imageInfo['height']  # Image height
    width = imageInfo['width']  # Image width
    filename = labelInfo['name']  # Filename of the image. Empty if image is not from file
    encoded_image_data = image  # Encoded image bytes
    image_format = b'jpeg'  # b'jpeg' or b'png'

    xmins = labelInfo['xmins']  # List of normalized left x coordinates in bounding box (1 per box)
    xmaxs = labelInfo['xmaxs']  # List of normalized right x coordinates in bounding box (1 per box)
    ymins = labelInfo['ymins']  # List of normalized top y coordinates in bounding box (1 per box)
    ymaxs = labelInfo['ymaxs']  # List of normalized bottom y coordinates in bounding box (1 per box)
    classes_text = labelInfo['classes_text']  # List of string class names of bounding box (1 per box)
    classes = labelInfo['classes']  # List of integer class ids of bounding box (1 per box)

    tf_example = tf.train.Example(features=tf.train.Features(feature={
        'image/height': dataset_util.int64_feature(height),
        'image/width': dataset_util.int64_feature(width),
        'image/filename': dataset_util.bytes_feature(filename),
        'image/source_id': dataset_util.bytes_feature(filename),
        'image/encoded': dataset_util.bytes_feature(encoded_image_data),
        'image/format': dataset_util.bytes_feature(image_format),
        'image/object/bbox/xmin': dataset_util.float_list_feature(xmins),
        'image/object/bbox/xmax': dataset_util.float_list_feature(xmaxs),
        'image/object/bbox/ymin': dataset_util.float_list_feature(ymins),
        'image/object/bbox/ymax': dataset_util.float_list_feature(ymaxs),
        'image/object/class/text': dataset_util.bytes_list_feature(classes_text),
        'image/object/class/label': dataset_util.int64_list_feature(classes),
    }))
    return tf_example
Before calling the function, set the global image parameters, i.e. the pixel width and height of the images.
imageInfo = {
    'height': 480,
    'width': 480
}
Each call to the conversion function returns one image's data as a tf.Example. Convert every image in turn, serialize each result, and append them one after another to a single output file; that file is the TFRecord. TensorFlow provides a writer that saves it locally, used as follows:
trainFileWriter = tf.python_io.TFRecordWriter('data/train.record')

trainDirString = 'data/TrafficSign'
trainDir = os.listdir(trainDirString)
for fileName in trainDir:
    if(fileName.endswith('.jpg')):
        # ... the label-reading code will be inserted here later
        with tf.gfile.GFile(img_path, 'rb') as fid:
            encoded_jpg = fid.read()
        tf_example = create_tf_example(encoded_jpg, labelInfo, imageInfo)
        trainFileWriter.write(tf_example.SerializeToString())

trainFileWriter.close()
Use Python's XML library to read each label file and fill the information into the labelInfo dictionary; this is the code that goes in the placeholder inside the loop above.
DOMTree = xml.dom.minidom.parse(trainDirString + '/' + fileName[0:len(fileName)-4] + '.xml')
collection = DOMTree.documentElement
objects = collection.getElementsByTagName("object")

labelInfo = {
    'name': bytes(fileName, encoding="utf8"),
    'xmins': [],
    'xmaxs': [],
    'ymins': [],
    'ymaxs': [],
    'classes_text': [],
    'classes': []
}

textDict = {
    'Car': 1, '10T': 2, 'BanRight': 3, 'Pedestrian': 4, 'TurnLeft': 5,
    'Speaker': 6, 'Crosswalk': 7, 'TurnAround': 8, 'GoStraight': 9,
    'GoStraightOrRight': 10, 'GoStraightOrLeft': 11, 'BanLeft': 12,
    'TurnRight': 13, 'BanSpeaker': 14, 'NoParking': 15, 'BanTurnAround': 16,
    'BanCar': 17, 'BanStraightAndLeft': 18, 'SlowDown': 19,
    'BanLeftAndRight': 20, 'Limit40': 21, 'BanStraightAndRight': 22
}

for object in objects:
    labelInfo['xmins'].append(float(object.getElementsByTagName('xmin')[0].childNodes[0].data) / imageInfo['width'])
    labelInfo['xmaxs'].append(float(object.getElementsByTagName('xmax')[0].childNodes[0].data) / imageInfo['width'])
    labelInfo['ymins'].append(float(object.getElementsByTagName('ymin')[0].childNodes[0].data) / imageInfo['height'])
    labelInfo['ymaxs'].append(float(object.getElementsByTagName('ymax')[0].childNodes[0].data) / imageInfo['height'])
    labelInfo['classes_text'].append(bytes(object.getElementsByTagName('name')[0].childNodes[0].data, encoding="utf8"))
    labelInfo['classes'].append(textDict[object.getElementsByTagName('name')[0].childNodes[0].data])

print(labelInfo)
img_path = trainDirString + '/' + fileName
If you use your own data set, assign each object class its own number and modify the textDict part accordingly. Its structure is as follows:
{
    'Object name 1': 1,  # object ID, integer, numbered from 1
    'Object name 2': 2,  # object ID, integer, numbered from 1
    # ...
}
The complete sample code is given in the appendix. Run the program and you will find train.record in the data directory.
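As an optional sanity check (not part of the original experiment), you can count how many records ended up in train.record; the number should match the number of .jpg files processed. The sketch below uses the TF 1.x tf.python_io.tf_record_iterator API, consistent with the rest of the code in this post.

import tensorflow as tf

# Iterate over the serialized records in the generated file and count them.
record_count = sum(1 for _ in tf.python_io.tf_record_iterator('data/train.record'))
print('records in train.record:', record_count)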
Now let's look at an XML file generated by LabelImg; it contains a description of the image size and a description of each label.
Parameter | Meaning |
---|---|
width | Width of the picture |
height | Height of the picture |
filename | Name of the picture file |
object | One tag per labeled object in the picture |
name | Name of the object |
xmin | X coordinate (in pixels) of the left edge of the object's label box |
xmax | X coordinate (in pixels) of the right edge of the object's label box |
ymin | Y coordinate (in pixels) of the top edge of the object's label box |
ymax | Y coordinate (in pixels) of the bottom edge of the object's label box |
<annotation>
    <folder>TrafficSign</folder>
    <filename>1521536642525.jpg</filename>
    <path>D:\Project\Python_Projects\ObjectDetectionTraining\data\TrafficSign\1521536642525.jpg</path>
    <source>
        <database>Unknown</database>
    </source>
    <size>
        <width>480</width>
        <height>480</height>
        <depth>3</depth>
    </size>
    <segmented>0</segmented>
    <object>
        <name>Car</name>
        <pose>Unspecified</pose>
        <truncated>0</truncated>
        <difficult>0</difficult>
        <bndbox>
            <xmin>112</xmin>
            <ymin>188</ymin>
            <xmax>185</xmax>
            <ymax>262</ymax>
        </bndbox>
    </object>
    <object>
        <name>10T</name>
        <pose>Unspecified</pose>
        <truncated>0</truncated>
        <difficult>0</difficult>
        <bndbox>
            <xmin>380</xmin>
            <ymin>189</ymin>
            <xmax>457</xmax>
            <ymax>265</ymax>
        </bndbox>
    </object>
</annotation>
4.4 Select a Model
Many model architectures exist for object recognition. Google has reimplemented them in TensorFlow and released pre-trained models to developers; after retraining one of these models on your own training set, you can use it for object recognition. In this experiment we choose the ssd_mobilenet_v1_coco model.
The Codelab provides a copy of ssd_mobilenet_v1_coco in the Tools/model directory. Create a new directory named model in the Codelab directory, then switch to the Terminal, enter the Codelab directory, and copy the model files into it.
cp -r ../Tools/model/* model
4.5 Train the Model
The training configuration file pipeline.config is included in the model directory; edit it with Jupyter.
Change num_classes to the total number of object classes in this training run (if you use the provided data set, change it to 22).
Change fine_tune_checkpoint to /home/jovyan/Codelab/model/model.ckpt.
Change the input_path under train_input_reader to /home/jovyan/Codelab/data/train.record.
Change the input_path under eval_input_reader to /home/jovyan/Codelab/data/train.record.
For convenience, the training set is used directly as the evaluation set.
Change label_map_path to /home/jovyan/Codelab/data/label.pbtxt. An illustrative excerpt of the edited fields is shown below.
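For orientation only, the edited fields typically sit in a pipeline.config roughly like the fragment below; this is an illustrative excerpt, and the surrounding options and field order in your actual ssd_mobilenet_v1 configuration may differ.

model {
  ssd {
    num_classes: 22
    # ... other model options unchanged ...
  }
}
train_config {
  fine_tune_checkpoint: "/home/jovyan/Codelab/model/model.ckpt"
  # ... other training options unchanged ...
}
train_input_reader {
  tf_record_input_reader {
    input_path: "/home/jovyan/Codelab/data/train.record"
  }
  label_map_path: "/home/jovyan/Codelab/data/label.pbtxt"
}
eval_input_reader {
  tf_record_input_reader {
    input_path: "/home/jovyan/Codelab/data/train.record"
  }
  label_map_path: "/home/jovyan/Codelab/data/label.pbtxt"
}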
After saving and closing the editor, open a Terminal and change into the /home/jovyan/Appendix/tensorflow_models/research directory.
cd ~/Appendix/tensorflow_models/research
Run the following training command from that directory. By default the program trains for 50,000 steps before finishing; you can change this by modifying the NUM_TRAIN_STEPS parameter.
PIPELINE_CONFIG_PATH='/home/jovyan/Codelab/model/pipeline.config'
MODEL_DIR='/home/jovyan/Codelab/data'
NUM_TRAIN_STEPS=50000
SAMPLE_1_OF_N_EVAL_EXAMPLES=1
python3 object_detection/model_main.py \
    --pipeline_config_path=${PIPELINE_CONFIG_PATH} \
    --model_dir=${MODEL_DIR} \
    --num_train_steps=${NUM_TRAIN_STEPS} \
    --sample_1_of_n_eval_examples=$SAMPLE_1_OF_N_EVAL_EXAMPLES \
    --alsologtostderr
The latest version of the TensorFlow Object Detection project no longer prints the loss for every step, but once the program is running you can monitor training progress with TensorBoard.
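If TensorBoard is available in your environment (an assumption; the BlackWalnut Labs setup may expose it differently), point it at the MODEL_DIR used above to watch the loss curves and evaluation results:

tensorboard --logdir=/home/jovyan/Codelab/data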
4.6 Prediction Experiments
By default the program trains for 50,000 steps before finishing, but it writes intermediate checkpoints during training, so you can look at intermediate results without waiting for training to finish completely.
Before writing the prediction program, convert an intermediate checkpoint into a frozen TensorFlow graph file that can be called for inference. Set TRAINED_CKPT_PREFIX to the path of a checkpoint generated during the training above, for example TRAINED_CKPT_PREFIX='/home/jovyan/Codelab/data/model.ckpt-12345'. The commands are as follows:
INPUT_TYPE=image_tensor
PIPELINE_CONFIG_PATH='/home/jovyan/Codelab/model/pipeline.config'
TRAINED_CKPT_PREFIX='/home/jovyan/Codelab/data/model.ckpt-<number of steps selected>'
EXPORT_DIR='/home/jovyan/Codelab/output'
python3 object_detection/export_inference_graph.py \
    --input_type=${INPUT_TYPE} \
    --pipeline_config_path=${PIPELINE_CONFIG_PATH} \
    --trained_checkpoint_prefix=${TRAINED_CKPT_PREFIX} \
    --output_directory=${EXPORT_DIR}
Create a new Python 3 file named Inference.py in the Codelab directory. As before, add the system dependency paths first.
import sys
sys.path.append('/home/jovyan/Appendix/tensorflow_models/research')
sys.path.append('/home/jovyan/Appendix/tensorflow_models/research/slim')
sys.path.append('/home/jovyan/Appendix/tensorflow_models/research/object_detection')

import numpy as np
import os
import tensorflow as tf

from distutils.version import StrictVersion
from collections import defaultdict
from io import StringIO
from matplotlib import pyplot as plt
from PIL import Image

from object_detection.utils import ops as utils_ops

if StrictVersion(tf.__version__) < StrictVersion('1.9.0'):
    raise ImportError('Please upgrade your TensorFlow installation to v1.9.* or later!')

%matplotlib inline

from utils import label_map_util
from utils import visualization_utils as vis_util
Define path variables pointing to the exported graph file and the label file.
PATH_TO_FROZEN_GRAPH = 'output/frozen_inference_graph.pb'
PATH_TO_LABELS = os.path.join('data', 'label.pbtxt')
Read the graph file and save it in the detection_graph variable.
detection_graph = tf.Graph()
with detection_graph.as_default():
    od_graph_def = tf.GraphDef()
    with tf.gfile.GFile(PATH_TO_FROZEN_GRAPH, 'rb') as fid:
        serialized_graph = fid.read()
        od_graph_def.ParseFromString(serialized_graph)
        tf.import_graph_def(od_graph_def, name='')
Read the label file and save it in the category_index variable.
category_index = label_map_util.create_category_index_from_labelmap(PATH_TO_LABELS, use_display_name=True)
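If you are curious what this produces: category_index is simply a dictionary keyed by class id, and with the label.pbtxt above each entry looks like {'id': 1, 'name': 'Car'}. A quick illustrative peek:

# Print the first few entries of the loaded label map.
for class_id in sorted(category_index)[:3]:
    print(class_id, category_index[class_id])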
Write a helper function that reads a picture into a numpy array.
def load_image_into_numpy_array(image):
    (im_width, im_height) = image.size
    return np.array(image.getdata()).reshape(
        (im_height, im_width, 3)).astype(np.uint8)
Set the path variables for the images to be predicted.
PATH_TO_TEST_IMAGES_DIR = 'test_images'
TEST_IMAGE_PATHS = [os.path.join(PATH_TO_TEST_IMAGES_DIR, 'image{}.jpg'.format(i)) for i in range(1, 2)]

IMAGE_SIZE = (12, 8)
Write the prediction function, which runs the computation graph and returns the raw results.
def run_inference_for_single_image(image, graph):
    with graph.as_default():
        with tf.Session() as sess:
            ops = tf.get_default_graph().get_operations()
            all_tensor_names = {output.name for op in ops for output in op.outputs}
            tensor_dict = {}
            for key in [
                'num_detections', 'detection_boxes', 'detection_scores',
                'detection_classes', 'detection_masks'
            ]:
                tensor_name = key + ':0'
                if tensor_name in all_tensor_names:
                    tensor_dict[key] = tf.get_default_graph().get_tensor_by_name(
                        tensor_name)
            if 'detection_masks' in tensor_dict:
                detection_boxes = tf.squeeze(tensor_dict['detection_boxes'], [0])
                detection_masks = tf.squeeze(tensor_dict['detection_masks'], [0])
                real_num_detection = tf.cast(tensor_dict['num_detections'][0], tf.int32)
                detection_boxes = tf.slice(detection_boxes, [0, 0], [real_num_detection, -1])
                detection_masks = tf.slice(detection_masks, [0, 0, 0], [real_num_detection, -1, -1])
                detection_masks_reframed = utils_ops.reframe_box_masks_to_image_masks(
                    detection_masks, detection_boxes, image.shape[0], image.shape[1])
                detection_masks_reframed = tf.cast(
                    tf.greater(detection_masks_reframed, 0.5), tf.uint8)
                tensor_dict['detection_masks'] = tf.expand_dims(
                    detection_masks_reframed, 0)
            image_tensor = tf.get_default_graph().get_tensor_by_name('image_tensor:0')

            output_dict = sess.run(tensor_dict,
                                   feed_dict={image_tensor: np.expand_dims(image, 0)})

            output_dict['num_detections'] = int(output_dict['num_detections'][0])
            output_dict['detection_classes'] = output_dict[
                'detection_classes'][0].astype(np.uint8)
            output_dict['detection_boxes'] = output_dict['detection_boxes'][0]
            output_dict['detection_scores'] = output_dict['detection_scores'][0]
            if 'detection_masks' in output_dict:
                output_dict['detection_masks'] = output_dict['detection_masks'][0]
    return output_dict
Run a prediction for each test image and visualize the result in Jupyter.
for image_path in TEST_IMAGE_PATHS:
    image = Image.open(image_path)
    image_np = load_image_into_numpy_array(image)
    image_np_expanded = np.expand_dims(image_np, axis=0)
    output_dict = run_inference_for_single_image(image_np, detection_graph)
    vis_util.visualize_boxes_and_labels_on_image_array(
        image_np,
        output_dict['detection_boxes'],
        output_dict['detection_classes'],
        output_dict['detection_scores'],
        category_index,
        instance_masks=output_dict.get('detection_masks'),
        use_normalized_coordinates=True,
        line_thickness=8)
    plt.figure(figsize=IMAGE_SIZE)
    plt.imshow(image_np)
Create a new directory named test_images in the Codelab directory, place the images to be tested in it, and name them image1.jpg, image2.jpg, image3.jpg, and so on.
Switch to Inference.py's Python 3 file and modify the program as follows.
# Find this code
TEST_IMAGE_PATHS = [os.path.join(PATH_TO_TEST_IMAGES_DIR, 'image{}.jpg'.format(i)) for i in range(1, 2)]

# Change it to this, replacing <number of test pictures> with how many images you placed in test_images
TEST_IMAGE_PATHS = [os.path.join(PATH_TO_TEST_IMAGES_DIR, 'image{}.jpg'.format(i)) for i in range(1, <number of test pictures> + 1)]
Run the program and you will see the prediction results at the current stage of training in the Jupyter interface.
5. Appendix
# file_name: Image2TFRecord.py
import sys
sys.path.append('/home/jovyan/Appendix/tensorflow_models/research')
sys.path.append('/home/jovyan/Appendix/tensorflow_models/research/slim')

import numpy as np
import tensorflow as tf
import os
import xml.dom.minidom
from object_detection.utils import dataset_util

imageInfo = {
    'height': 480,
    'width': 480
}

def create_tf_example(image, labelInfo, imageInfo):
    height = imageInfo['height']  # Image height
    width = imageInfo['width']  # Image width
    filename = labelInfo['name']  # Filename of the image. Empty if image is not from file
    encoded_image_data = image  # Encoded image bytes
    image_format = b'jpeg'  # b'jpeg' or b'png'

    xmins = labelInfo['xmins']  # List of normalized left x coordinates in bounding box (1 per box)
    xmaxs = labelInfo['xmaxs']  # List of normalized right x coordinates in bounding box (1 per box)
    ymins = labelInfo['ymins']  # List of normalized top y coordinates in bounding box (1 per box)
    ymaxs = labelInfo['ymaxs']  # List of normalized bottom y coordinates in bounding box (1 per box)
    classes_text = labelInfo['classes_text']  # List of string class names of bounding box (1 per box)
    classes = labelInfo['classes']  # List of integer class ids of bounding box (1 per box)

    tf_example = tf.train.Example(features=tf.train.Features(feature={
        'image/height': dataset_util.int64_feature(height),
        'image/width': dataset_util.int64_feature(width),
        'image/filename': dataset_util.bytes_feature(filename),
        'image/source_id': dataset_util.bytes_feature(filename),
        'image/encoded': dataset_util.bytes_feature(encoded_image_data),
        'image/format': dataset_util.bytes_feature(image_format),
        'image/object/bbox/xmin': dataset_util.float_list_feature(xmins),
        'image/object/bbox/xmax': dataset_util.float_list_feature(xmaxs),
        'image/object/bbox/ymin': dataset_util.float_list_feature(ymins),
        'image/object/bbox/ymax': dataset_util.float_list_feature(ymaxs),
        'image/object/class/text': dataset_util.bytes_list_feature(classes_text),
        'image/object/class/label': dataset_util.int64_list_feature(classes),
    }))
    return tf_example

trainFileWriter = tf.python_io.TFRecordWriter('data/train.record')

trainDirString = 'data/TrafficSign'
trainDir = os.listdir(trainDirString)
for fileName in trainDir:
    if(fileName.endswith('.jpg')):
        DOMTree = xml.dom.minidom.parse(trainDirString + '/' + fileName[0:len(fileName)-4] + '.xml')
        collection = DOMTree.documentElement
        objects = collection.getElementsByTagName("object")

        labelInfo = {
            'name': bytes(fileName, encoding="utf8"),
            'xmins': [],
            'xmaxs': [],
            'ymins': [],
            'ymaxs': [],
            'classes_text': [],
            'classes': []
        }
        textDict = {
            'Car': 1, '10T': 2, 'BanRight': 3, 'Pedestrian': 4, 'TurnLeft': 5,
            'Speaker': 6, 'Crosswalk': 7, 'TurnAround': 8, 'GoStraight': 9,
            'GoStraightOrRight': 10, 'GoStraightOrLeft': 11, 'BanLeft': 12,
            'TurnRight': 13, 'BanSpeaker': 14, 'NoParking': 15, 'BanTurnAround': 16,
            'BanCar': 17, 'BanStraightAndLeft': 18, 'SlowDown': 19,
            'BanLeftAndRight': 20, 'Limit40': 21, 'BanStraightAndRight': 22
        }

        for object in objects:
            labelInfo['xmins'].append(float(object.getElementsByTagName('xmin')[0].childNodes[0].data) / imageInfo['width'])
            labelInfo['xmaxs'].append(float(object.getElementsByTagName('xmax')[0].childNodes[0].data) / imageInfo['width'])
            labelInfo['ymins'].append(float(object.getElementsByTagName('ymin')[0].childNodes[0].data) / imageInfo['height'])
            labelInfo['ymaxs'].append(float(object.getElementsByTagName('ymax')[0].childNodes[0].data) / imageInfo['height'])
            labelInfo['classes_text'].append(bytes(object.getElementsByTagName('name')[0].childNodes[0].data, encoding="utf8"))
            labelInfo['classes'].append(textDict[object.getElementsByTagName('name')[0].childNodes[0].data])

        print(labelInfo)

        img_path = trainDirString + '/' + fileName
        with tf.gfile.GFile(img_path, 'rb') as fid:
            encoded_jpg = fid.read()
        tf_example = create_tf_example(encoded_jpg, labelInfo, imageInfo)
        trainFileWriter.write(tf_example.SerializeToString())

trainFileWriter.close()