Using k-means clustering to generate SSD anchor box aspect ratios

(This article covers the anchor box part of training your own model with the TensorFlow Object Detection API.)
Many object detection models use anchor boxes as a region-sampling strategy. During training, the model learns to match one of several predefined anchor boxes to each ground-truth bounding box. To optimize the accuracy and efficiency of your object detection model, it helps to tune these anchor boxes to fit your dataset. The configuration files that come with TensorFlow's trained checkpoints include aspect ratios intended to cover a very broad set of objects.
In this article, you will learn how to find a set of aspect ratios customized for your dataset, discovered through k-means clustering of the ratios of all the ground-truth bounding boxes.
For demonstration purposes, we use a subset of the PETS dataset (cats and dogs), which matches some other model training tutorials (such as the Edge TPU tutorials). You can use this script with a different dataset, and we will show how to tune it to meet your model's goals, including how to optimize for speed over accuracy or accuracy over speed.
The result of this notebook is a new pipeline .config file that you can copy into your model training script. With the new customized anchor box configuration, you should observe a faster training pipeline and slightly improved model accuracy.

Install required libraries

# Install the tensorflow Object Detection API...
# If you're running this offline, you also might need to install the protobuf-compiler:
#   apt-get install protobuf-compiler

! git clone -n
%cd models
!git checkout 461b3587ef38b42cda151fa3b7d37706d77e4244
%cd research
! protoc object_detection/protos/*.proto --python_out=.

# Install TensorFlow Object Detection API
%cp object_detection/packages/tf2/ .
! python -m pip install --upgrade pip
! python -m pip install --use-feature=2020-resolver .

# Test the installation
! python object_detection/builders/

Prepare data

Although this notebook does not train a model, you need to use the same dataset here that you will use to train your model.

To find the best anchor box ratios, you should use all of your training dataset (or as much of it as possible). As mentioned in the introduction, you want to measure the precise variety of images that you expect your model to encounter. With less data, the anchor boxes might not cover the variety of objects your model encounters, and the model might have weak accuracy. The alternative approach bases the ratios on data beyond the scope of your model's application. It usually creates an inefficient model that can also have weaker accuracy.

%mkdir /content/dataset
%cd /content/dataset
! wget
! wget
! tar zxf images.tar.gz
! tar zxf annotations.tar.gz

XML_PATH = '/content/dataset/annotations/xmls'

The following k-means script processes every XML annotation file it finds. We therefore reduce the PETS dataset to only the two breeds used to train the model in the companion training notebook. So we delete all annotation files that are not Abyssinian (cat) or American bulldog (dog).

! (cd /content/dataset/annotations/xmls/ && \
  find . ! \( -name 'Abyssinian*' -o -name 'american_bulldog*' \) -type f -exec rm -f {} \; )
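You may prefer to do this filtering in Python, for example outside a notebook shell. The following sketch mirrors the find command above: it keeps files whose names start with one of the given prefixes and deletes the rest. The helper name keep_only_prefixes is hypothetical. Unlike find, it does not descend into subdirectories, which is fine for the flat xmls folder.

```python
import os

def keep_only_prefixes(xml_dir, prefixes):
    """Delete every regular file in xml_dir whose name does not start with one of prefixes.

    Returns the sorted list of file names that were kept.
    """
    kept = []
    for name in os.listdir(xml_dir):
        full_path = os.path.join(xml_dir, name)
        if not os.path.isfile(full_path):
            continue  # like `find -type f`, leave directories alone
        if name.startswith(tuple(prefixes)):
            kept.append(name)
        else:
            os.remove(full_path)
    return sorted(kept)
```

For the PETS subset you would call keep_only_prefixes(XML_PATH, ('Abyssinian', 'american_bulldog')).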

Upload your own data

To generate anchor box ratios for your own dataset, first upload a ZIP file containing your annotation files. Click the Files tab on the left and drag and drop the ZIP file there. Then uncomment the following code to unzip it, and set the path to the directory that holds your annotation files:

# %cd /content/
# !unzip

# XML_PATH = '/content/dataset/annotations/xmls'
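The xml_to_boxes() function defined later in this article reads PASCAL VOC-style XML, which is the format of the PETS annotations. In this format, each <object> has a <bndbox> with integer xmin/ymin/xmax/ymax, and the root has a <size> element giving the image width and height. If your own annotations come from another tool, check that they follow this layout. The sketch below parses a made-up annotation string with the same ElementTree calls; the helper name boxes_from_voc_string is hypothetical.

```python
import xml.etree.ElementTree as ET

# A minimal, made-up PASCAL VOC-style annotation with two objects.
SAMPLE_XML = """
<annotation>
  <filename>example.jpg</filename>
  <size><width>500</width><height>400</height><depth>3</depth></size>
  <object>
    <name>cat</name>
    <bndbox><xmin>100</xmin><ymin>50</ymin><xmax>300</xmax><ymax>150</ymax></bndbox>
  </object>
  <object>
    <name>dog</name>
    <bndbox><xmin>10</xmin><ymin>20</ymin><xmax>60</xmax><ymax>120</ymax></bndbox>
  </object>
</annotation>
"""

def boxes_from_voc_string(xml_text):
    """Return [(width, height), ...] for every <object> in one VOC annotation."""
    root = ET.fromstring(xml_text.strip())
    boxes = []
    for obj in root.findall('object'):
        bndbox = obj.find('bndbox')
        width = int(bndbox.find('xmax').text) - int(bndbox.find('xmin').text)
        height = int(bndbox.find('ymax').text) - int(bndbox.find('ymin').text)
        boxes.append((width, height))
    return boxes

print(boxes_from_voc_string(SAMPLE_XML))  # [(200, 100), (50, 100)]
```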

Use k-means to find the best anchor box ratio

We are trying to find a set of aspect ratios that overlap the majority of object shapes in the dataset. We do that by finding common clusters of bounding box shapes in the dataset, using the k-means clustering algorithm to find the centroids of those clusters.
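To make the clustering step concrete, here is a minimal pure-Python sketch of Lloyd's k-means on 2-D (width, height) points. It is not the article's implementation, which relies on scikit-learn's KMeans. The function name kmeans_2d and its first-k-points initialization are illustrative choices.

```python
import math

def kmeans_2d(points, k, max_iter=100):
    """Plain Lloyd's k-means on 2-D points; returns a list of k centroids.

    Initialization simply takes the first k points (a simplification;
    the article's code uses scikit-learn's KMeans with init='random').
    """
    centroids = [tuple(p) for p in points[:k]]
    for _ in range(max_iter):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster,
        # keeping the old centroid if a cluster ends up empty.
        new_centroids = [
            (sum(x for x, _ in c) / len(c), sum(y for _, y in c) / len(c)) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break  # converged: the centroids stopped moving
        centroids = new_centroids
    return centroids

print(sorted(kmeans_2d([(1.0, 1.0), (10.0, 10.0), (1.2, 0.8), (10.2, 9.8)], k=2)))
```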

To solve this problem, we need to calculate the following:

  • The k-means cluster centroids of the given bounding boxes (see the kmeans_aspect_ratios() function below).
  • The average intersection-over-union (IoU) of the given bounding boxes with the chosen aspect ratios (see the average_iou() function below). This does not affect the final box ratios. It is a useful metric for judging whether the selected ratios fit the data well and whether to try more or fewer aspect ratios. (We discuss this score in more detail below.)
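The IoU used here compares only box shapes. average_iou() takes the smaller width and the smaller height as the intersection. That amounts to aligning each box and anchor at the same center and ignoring where they sit in the image. The following scalar sketch for single boxes illustrates the idea; the helper names are hypothetical.

```python
def shape_iou(box, anchor):
    """IoU of two (width, height) boxes assumed to share the same center."""
    inter = min(box[0], anchor[0]) * min(box[1], anchor[1])
    union = box[0] * box[1] + anchor[0] * anchor[1] - inter
    return inter / union

def average_best_iou_percent(boxes, anchors):
    """Mean over boxes of the best IoU with any anchor, as a percentage."""
    return 100 * sum(max(shape_iou(b, a) for a in anchors) for b in boxes) / len(boxes)

boxes = [(2, 1), (1, 2)]
print(average_best_iou_percent(boxes, [(1, 1)]))          # 50.0
print(average_best_iou_percent(boxes, [(2, 1), (1, 2)]))  # 100.0
```

A single square anchor covers both elongated boxes only half as well as two anchors that match their shapes. This is why the score rises as you add aspect ratios.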

Note: The term "centroid" used here refers to the center of a k-means cluster, which is a box's (width, height) vector.
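Before clustering, kmeans_aspect_ratios() divides each (width, height) vector by sqrt(width * height). After that step every box has unit area. Boxes with the same shape but different sizes then land on the same point, (sqrt(r), 1/sqrt(r)) for aspect ratio r = width / height. Dividing a centroid's width by its height gives back an aspect ratio. The following check uses a hypothetical helper:

```python
import math

def normalize_box(width, height):
    """Scale a (width, height) box to unit area while keeping its aspect ratio."""
    scale = math.sqrt(width * height)
    return width / scale, height / scale

# A small and a large box with the same 2:1 shape map to the same point.
small = normalize_box(40, 20)
large = normalize_box(200, 100)
print(small, large)
print(small[0] / small[1])  # approximately 2.0, the original aspect ratio
```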

import sys
import os
import numpy as np
import xml.etree.ElementTree as ET

from sklearn.cluster import KMeans

def xml_to_boxes(path, rescale_width=None, rescale_height=None):
  """Extracts bounding-box widths and heights from ground-truth dataset.

  Args:
    path : Path to .xml annotation files for your dataset.
    rescale_width : Target image width used to rescale bounding-box widths.
    rescale_height : Target image height used to rescale bounding-box heights.

  Returns:
    bboxes : A numpy array with pairs of box dimensions as [width, height].
  """

  xml_list = []
  filenames = os.listdir(os.path.join(path))
  filenames = [os.path.join(path, f) for f in filenames if (f.endswith('.xml'))]
  for xml_file in filenames:
    tree = ET.parse(xml_file)
    root = tree.getroot()
    for member in root.findall('object'):
      bndbox = member.find('bndbox')
      bbox_width = int(bndbox.find('xmax').text) - int(bndbox.find('xmin').text)
      bbox_height = int(bndbox.find('ymax').text) - int(bndbox.find('ymin').text)
      if rescale_width and rescale_height:
        size = root.find('size')
        bbox_width = bbox_width * (rescale_width / int(size.find('width').text))
        bbox_height = bbox_height * (rescale_height / int(size.find('height').text))
      xml_list.append([bbox_width, bbox_height])
  bboxes = np.array(xml_list)
  return bboxes

def average_iou(bboxes, anchors):
    """Calculates the Intersection over Union (IoU) between bounding boxes and
    anchors.

    Args:
      bboxes : Array of bounding boxes in [width, height] format.
      anchors : Array of aspect ratios in [n, 2] format.

    Returns:
      avg_iou_perc : A float value, the average of the best IoU score of each
        box with any anchor, as a percentage.
    """
    intersection_width = np.minimum(anchors[:, [0]], bboxes[:, 0]).T
    intersection_height = np.minimum(anchors[:, [1]], bboxes[:, 1]).T

    if np.any(intersection_width == 0) or np.any(intersection_height == 0):
        raise ValueError("Some boxes have zero size.")

    intersection_area = intersection_width * intersection_height
    boxes_area = np.prod(bboxes, axis=1, keepdims=True)
    anchors_area = np.prod(anchors, axis=1, keepdims=True).T
    union_area = boxes_area + anchors_area - intersection_area
    avg_iou_perc = np.mean(np.max(intersection_area / union_area, axis=1)) * 100

    return avg_iou_perc

def kmeans_aspect_ratios(bboxes, kmeans_max_iter, num_aspect_ratios):
  """Calculate the centroid of bounding boxes clusters using Kmeans algorithm.

  Args:
    bboxes : Array of bounding boxes in [width, height] format.
    kmeans_max_iter : Maximum number of iterations to find centroids.
    num_aspect_ratios : Number of centroids to optimize kmeans.

  Returns:
    aspect_ratios : Width/height ratios of the cluster centroids (optimised for dataset).
    avg_iou_perc : Average score of bboxes intersecting with new aspect ratios.
  """

  assert len(bboxes), "You must provide bounding boxes"

  # Divide each box by the square root of its area so that clustering
  # groups boxes by shape (aspect ratio) rather than by absolute size.
  normalized_bboxes = bboxes / np.sqrt(bboxes.prod(axis=1, keepdims=True))
  # Using kmeans to find centroids of the width/height clusters
  kmeans = KMeans(
      init='random', n_clusters=num_aspect_ratios, random_state=0, max_iter=kmeans_max_iter)
  kmeans.fit(X=normalized_bboxes)
  ar = kmeans.cluster_centers_

  assert len(ar), "Unable to find k-means centroid, try increasing kmeans_max_iter."

  avg_iou_perc = average_iou(normalized_bboxes, ar)

  if not np.isfinite(avg_iou_perc):
    sys.exit("Failed to get aspect ratios due to numerical errors in k-means")

  aspect_ratios = [w/h for w,h in ar]

  return aspect_ratios, avg_iou_perc

In the next code block, we call the functions above to find the ideal anchor box aspect ratios.

You can adjust the following parameters to suit your performance goals.

Most importantly, consider how many aspect ratios to generate. At the two ends of the spectrum, you might aim for one of two goals:

  1. Low accuracy and fast inference: use 2 to 3 aspect ratios.
    Your application's accuracy or confidence score is around 80%.
    The average IoU score (from avg_iou_perc) will be around 70 to 85.
    This reduces the model's overall computation during inference, which makes inference faster.

  2. High accuracy and slow inference: use 5 to 6 aspect ratios.
    Your application's accuracy or confidence score is around 90%.
    The average IoU score (from avg_iou_perc) will be over 95.
    This increases the model's overall computation during inference, which makes inference slower.
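A rough anchor count shows why more aspect ratios slow inference. The SSD head predicts box offsets and class scores for every anchor, so its work grows with the number of anchors. The sketch below assumes one anchor per aspect ratio at every cell of every prediction layer, plus an optional number of extra boxes per cell. Real anchor generators can differ, for example by adding an interpolated-scale box or using fewer boxes in the lowest layer. The feature-map sizes are hypothetical examples, not values read from the MobileDet config.

```python
def total_anchors(feature_map_sizes, num_aspect_ratios, extra_boxes_per_cell=0):
    """Count anchors for an SSD-style head.

    feature_map_sizes: list of (rows, cols), one per prediction layer.
    Assumes every cell gets one anchor per aspect ratio plus
    extra_boxes_per_cell additional anchors.
    """
    per_cell = num_aspect_ratios + extra_boxes_per_cell
    return sum(rows * cols * per_cell for rows, cols in feature_map_sizes)

# Hypothetical feature-map sizes for a 320x320 input.
sizes = [(20, 20), (10, 10), (5, 5), (3, 3), (2, 2), (1, 1)]
for k in (2, 4, 6):
    print(k, total_anchors(sizes, k))
```

Under these assumptions, going from 2 to 6 aspect ratios triples the number of anchors the head must score.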

The following initial configuration is in between: it searches for 4 aspect ratios.

# Tune this based on your accuracy/speed goals as described above
num_aspect_ratios = 4 # can be [2,3,4,5,6]

# Tune the iterations based on the size and distribution of your dataset
# You can check avg_iou_perc every 100 iterations to see how centroids converge
kmeans_max_iter = 500

# These should match the training pipeline config ('fixed_shape_resizer' param)
width = 320
height = 320

# Get the ground-truth bounding boxes for our dataset
bboxes = xml_to_boxes(path=XML_PATH, rescale_width=width, rescale_height=height)

aspect_ratios, avg_iou_perc = kmeans_aspect_ratios(
    bboxes=bboxes,
    kmeans_max_iter=kmeans_max_iter,
    num_aspect_ratios=num_aspect_ratios)

aspect_ratios = sorted(aspect_ratios)

print('Aspect ratios generated:', [round(ar,2) for ar in aspect_ratios])
print('Average IOU with anchors:', avg_iou_perc)
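The rescale_width and rescale_height arguments matter because fixed_shape_resizer stretches every image to exactly width x height without keeping its original aspect ratio. A box's shape in the resized image, which is what the anchors must match, can therefore differ from its shape in the original image. The hypothetical helper below repeats the arithmetic xml_to_boxes() applies to a single box. The image and box sizes are made up.

```python
def resized_aspect_ratio(box_w, box_h, img_w, img_h, target_w=320, target_h=320):
    """Aspect ratio (width / height) of a box after stretching its image to the target size."""
    new_w = box_w * (target_w / img_w)
    new_h = box_h * (target_h / img_h)
    return new_w / new_h

# A 200x100 box (ratio 2.0) in a 500x400 image.
print(resized_aspect_ratio(200, 100, 500, 400))  # about 1.6: the wide image is squeezed horizontally
```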

Generate a new pipeline config file

Now we just need to take the .config file the model originally used and merge the new ssd_anchor_generator properties into it.

import tensorflow as tf
from google.protobuf import text_format
from object_detection.protos import pipeline_pb2

pipeline = pipeline_pb2.TrainEvalPipelineConfig()
config_path = '/content/models/research/object_detection/samples/configs/ssdlite_mobiledet_edgetpu_320x320_coco_sync_4x4.config'
pipeline_save = '/content/ssdlite_mobiledet_edgetpu_320x320_custom_aspect_ratios.config'
with tf.io.gfile.GFile(config_path, "r") as f:
    proto_str = f.read()
    text_format.Merge(proto_str, pipeline)
pipeline.model.ssd.num_classes = 2
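# Remove the aspect ratios that came with the original config
while pipeline.model.ssd.anchor_generator.ssd_anchor_generator.aspect_ratios:
    pipeline.model.ssd.anchor_generator.ssd_anchor_generator.aspect_ratios.pop()

# Add the aspect ratios found by k-means
for ratio in aspect_ratios:
    pipeline.model.ssd.anchor_generator.ssd_anchor_generator.aspect_ratios.append(ratio)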
config_text = text_format.MessageToString(pipeline)
with tf.io.gfile.GFile(pipeline_save, "wb") as f:
    f.write(config_text)
# Check for updated aspect ratios in the config
!cat /content/ssdlite_mobiledet_edgetpu_320x320_custom_aspect_ratios.config

Summary and next steps

In the new config file printed above, you will find the anchor_generator specification. It includes the new aspect_ratios values that we generated with the k-means code above.
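To check the result programmatically rather than by eye, you could pull the aspect_ratios entries out of the saved text. The hypothetical helper below only scans for lines of the form `aspect_ratios: <number>`, which is how protobuf text format (as written by text_format.MessageToString) lists each element of a repeated float field. It does not parse the full config, and the sample string is made up.

```python
import re

def read_aspect_ratios(config_text):
    """Return every `aspect_ratios: <float>` value found in a text-format config."""
    pattern = re.compile(r'^[ \t]*aspect_ratios:[ \t]*([-+0-9.eE]+)[ \t]*$', re.MULTILINE)
    return [float(value) for value in pattern.findall(config_text)]

SAMPLE = """
anchor_generator {
  ssd_anchor_generator {
    num_layers: 6
    aspect_ratios: 0.75
    aspect_ratios: 1.25
    interpolated_scale_aspect_ratio: 1.0
  }
}
"""
print(read_aspect_ratios(SAMPLE))  # [0.75, 1.25]
```

Anchoring the pattern at the start of a line keeps it from also matching fields such as interpolated_scale_aspect_ratio.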

The original config file (ssdlite_mobiledet_edgetpu_320x320_coco_sync_4x4.config) already has some default anchor box aspect ratios, but we replaced them with values optimized for our dataset. Compared with the default anchor boxes, these new anchor boxes should improve model accuracy and speed up training.

If you want to train your model with this configuration, see the Retrain MobileDet for the Coral Edge TPU tutorial, which uses this exact cat/dog dataset. Just copy the .config file printed above and add it to that training notebook. You can also download the file from the Files panel on the left side of the Colab UI; it is called ssdlite_mobiledet_edgetpu_320x320_custom_aspect_ratios.config.

For more information about pipeline config files, read Configuring the Object Detection Training Pipeline.

About anchor scales

This notebook focused on the aspect ratios of anchor boxes, which are usually the hardest part to tune for each dataset. You should also consider the configuration of anchor box scales. The scales specify the number of different anchor box sizes and their minimum and maximum sizes, which affects how well your model can detect objects of varying sizes.
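For intuition about how scales and aspect ratios combine, consider the Object Detection API's SSD anchor generation as we understand it. An anchor with scale s and aspect ratio r (width / height) has width s*sqrt(r) and height s/sqrt(r), expressed as fractions of the input size when the base anchor size is 1. All ratios at one scale therefore share the same area. The helper below is hypothetical and only illustrates that arithmetic in pixels for a 320x320 input.

```python
import math

def anchor_size_pixels(scale, aspect_ratio, input_width=320, input_height=320):
    """(width, height) in pixels of an anchor with the given scale and aspect ratio.

    Assumes width = scale * sqrt(ratio) and height = scale / sqrt(ratio),
    both as fractions of the input image size.
    """
    root = math.sqrt(aspect_ratio)
    return scale * root * input_width, scale / root * input_height

for ratio in (0.5, 1.0, 2.0):
    print(ratio, anchor_size_pixels(0.2, ratio))
```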

The anchor scales are easier to tune by hand: estimate the minimum and maximum object sizes you expect the model to encounter in your application environment. As with the number of aspect ratios above, the number of different box sizes also affects your model's accuracy and speed. More boxes are more accurate, but also slower.
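To turn that estimate into numbers, divide the smallest and largest object sizes you expect (in pixels, after resizing) by the input size. Then space one scale per prediction layer linearly between those two fractions. Our understanding is that the SSD anchor generator already does this spacing from its min_scale, max_scale and num_layers settings, so in practice you would only set those fields. The helper name and the object sizes in the example are hypothetical, so treat the output as a starting point.

```python
def ssd_layer_scales(min_object_px, max_object_px, input_px=320, num_layers=6):
    """Linearly spaced anchor scales, one per prediction layer (num_layers >= 2).

    min_scale / max_scale are the expected smallest / largest object sizes
    expressed as a fraction of the (square) input size.
    """
    min_scale = min_object_px / input_px
    max_scale = max_object_px / input_px
    step = (max_scale - min_scale) / (num_layers - 1)
    return [min_scale + step * i for i in range(num_layers)]

# Hypothetical estimate: objects between 32 and 288 pixels in a 320x320 input.
print([round(s, 3) for s in ssd_layer_scales(32, 288)])
```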

You can read more about anchor scales in Configuring the Object Detection Training Pipeline.


Added by patryn on Tue, 15 Feb 2022 08:34:50 +0200