Computer Vision Evaluation Metrics

Object Detection, Segmentation, and Classification are the most common applications of Computer Vision technology. To decide which model to use for each of these applications, how to tune the training hyperparameters, whether regularization techniques are needed, and so on, you need the right evaluation metrics. This article examines the metrics used to evaluate machine vision models and the metrics implemented on the AIEX platform.

IOU (Intersection over Union)

In object detection or segmentation problems, the ground truth label is a segmentation mask or a bounding box marking where the object is present. The IOU metric measures how well a prediction overlaps its ground truth, for example by comparing the ground truth bounding box with the predicted bounding box. It can be expressed as follows:

IOU = Area of Intersection / Area of Union

Figure 1. IOU metric
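
The IOU formula above can be sketched in a few lines of Python. This is a minimal illustration, assuming axis-aligned boxes given as (x_min, y_min, x_max, y_max); the function name box_iou and the example coordinates are hypothetical and not taken from the AIEX platform.

```python
def box_iou(box_a, box_b):
    """IOU of two axis-aligned boxes in (x_min, y_min, x_max, y_max) form."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b

    # Width and height of the overlap region (zero if the boxes do not overlap).
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    intersection = inter_w * inter_h

    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    # The overlap is counted once in the union.
    union = area_a + area_b - intersection

    return intersection / union if union > 0 else 0.0


# Hypothetical example: two 10x10 boxes that overlap on half their width.
ground_truth = (0, 0, 10, 10)
prediction = (5, 0, 15, 10)
iou = box_iou(ground_truth, prediction)
```

Here the intersection is 50 and the union is 150, so the IOU is one third. A prediction like this would count as a miss at an IOU threshold of 0.5.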


Recall

Recall is the ratio of the number of true positives (TP) to the total number of actual (relevant) objects, that is, the true positives plus the false negatives (FN). To put it more formally:

Recall = TP / (TP + FN)

Figure 2. Recall metric


Precision

Precision is the fraction of samples classified as positive that are actually correct, that is, the true positives (TP) divided by the true positives plus the false positives (FP). To calculate precision, we use the following formula:

Precision = TP / (TP + FP)

Figure 3. Precision metric
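
As a small worked example, recall and precision can be computed directly from the counts of true positives, false positives, and false negatives. The sketch below assumes those counts are already known; the function name and the numbers are hypothetical.

```python
def precision_recall(tp, fp, fn):
    """Return (precision, recall) from raw detection counts."""
    # Precision: of everything predicted positive, how much was correct.
    precision = tp / (tp + fp) if (tp + fp) > 0 else 0.0
    # Recall: of everything that was really there, how much was found.
    recall = tp / (tp + fn) if (tp + fn) > 0 else 0.0
    return precision, recall


# Hypothetical detector output: 8 correct detections, 2 false alarms, 4 missed objects.
precision, recall = precision_recall(tp=8, fp=2, fn=4)
```

With these counts the precision is 8/10 and the recall is 8/12, which shows how a model can be fairly precise while still missing a third of the objects.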

F1 Score

The F1 score is the harmonic mean of precision and recall, expressed as follows:

F1 = 2 × Precision × Recall / (Precision + Recall)

Figure 4. F1 Score metric
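
The harmonic mean can be sketched as follows. The function name and input values are hypothetical; returning 0 when both inputs are 0 is a common convention and is assumed here.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        # Assumed convention: F1 is 0 when both precision and recall are 0.
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical values: high precision, mediocre recall.
f1 = f1_score(0.8, 0.5)
```

The result is about 0.615, lower than the arithmetic mean of 0.65, because the harmonic mean is pulled toward the weaker of the two values.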

Average Precision (AP)

Precision-recall curves are the usual way to evaluate the accuracy of a detector, but Average Precision condenses the curve into a single number, which makes it easier to compare models. AP is the weighted mean of the precisions achieved at each threshold, where each weight is the increase in recall from the previous threshold. In other words, Average Precision (AP) is the area under the precision-recall curve.

Figure 5. Precision-recall curve. Source
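
The area under the precision-recall curve can be approximated as sketched below. This is one common variant (all-point interpolation, where precision is first made non-increasing as recall grows); COCO-style evaluation samples the curve at fixed recall points instead, so results may differ slightly. The function name, inputs, and example data are hypothetical, and each detection is assumed to be already labeled as a true or false positive.

```python
def average_precision(scores, is_true_positive, num_ground_truth):
    """Area under the precision-recall curve for one class."""
    if num_ground_truth == 0 or not scores:
        return 0.0

    # Walk through detections from most to least confident.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    tp = fp = 0
    recalls, precisions = [], []
    for i in order:
        if is_true_positive[i]:
            tp += 1
        else:
            fp += 1
        recalls.append(tp / num_ground_truth)
        precisions.append(tp / (tp + fp))

    # Precision envelope: each point takes the best precision at any higher recall.
    for k in range(len(precisions) - 2, -1, -1):
        precisions[k] = max(precisions[k], precisions[k + 1])

    # Sum precision weighted by each increase in recall.
    ap, prev_recall = 0.0, 0.0
    for r, p in zip(recalls, precisions):
        ap += (r - prev_recall) * p
        prev_recall = r
    return ap


# Hypothetical class with 2 ground truth objects and 4 detections.
ap = average_precision(
    scores=[0.9, 0.8, 0.7, 0.6],
    is_true_positive=[True, False, True, False],
    num_ground_truth=2,
)
```

In this example the first half of the recall range is covered at precision 1 and the second half at precision 2/3, giving an AP of about 0.833.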

Mean Average Precision (mAP)

Mean Average Precision extends Average Precision. AP is calculated separately for each object class, while mAP is the mean of those per-class AP values and therefore summarizes the entire model in one number. mAP@0.5 is the mAP calculated at an IOU threshold of 0.5, meaning that a prediction counts as correct only if its IOU with the ground truth is at least 0.5.

Figure 6. The definition of mAP. Source
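
A minimal sketch of the averaging step, assuming the AP of each class has already been computed at a fixed IOU threshold. The class names and AP values are hypothetical, not results from any real model.

```python
def mean_average_precision(ap_per_class):
    """Mean of per-class AP values; ap_per_class maps class name -> AP."""
    if not ap_per_class:
        return 0.0
    return sum(ap_per_class.values()) / len(ap_per_class)


# Hypothetical per-class AP values, all measured at an IOU threshold of 0.5.
ap_at_50 = {"car": 0.80, "person": 0.60, "bicycle": 0.40}
map_at_50 = mean_average_precision(ap_at_50)
```

Every class contributes equally to the mean, so a rare class with poor AP lowers mAP as much as a frequent one would.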

The AIEX Platform Evaluation Metrics

Evaluation metrics are an essential part of evaluating algorithms. The AIEX platform offers a variety of metrics so that users can evaluate training algorithms effectively. Among them, the AIEX platform recognizes the COCO metrics as one of the most comprehensive sets of evaluation metrics. More details are available in the COCO documentation.

Object detectors on COCO are evaluated by these 12 metrics:

Average Precision (AP):

  • AP: AP at IoU=.50:.05:.95 (primary challenge metric)
  • AP^IoU=.50: AP at IoU=.50 (PASCAL VOC metric)
  • AP^IoU=.75: AP at IoU=.75 (strict metric)

AP Across Scales:

  • AP^small: AP for small objects (area < 32^2)
  • AP^medium: AP for medium objects (32^2 < area < 96^2)
  • AP^large: AP for large objects (area > 96^2)

Average Recall (AR):

  • AR^max=1: AR given 1 detection per image
  • AR^max=10: AR given 10 detections per image
  • AR^max=100: AR given 100 detections per image

AR Across Scales:

  • AR^small: AR for small objects (area < 32^2)
  • AR^medium: AR for medium objects (32^2 < area < 96^2)
  • AR^large: AR for large objects (area > 96^2)
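
Two details of the list above are easy to make concrete: the IoU range .50:.05:.95 expands to ten thresholds, and the scale buckets are defined by object area in pixels (32 squared is 1024, 96 squared is 9216). The sketch below is illustrative only; the names are hypothetical, and placing objects that fall exactly on a boundary into the medium bucket is an assumption, since the list does not specify it.

```python
# The ten IoU thresholds behind the primary AP metric: 0.50, 0.55, ..., 0.95.
IOU_THRESHOLDS = [round(0.50 + 0.05 * i, 2) for i in range(10)]

SMALL_MAX_AREA = 32 ** 2   # 1024 pixels
MEDIUM_MAX_AREA = 96 ** 2  # 9216 pixels


def size_bucket(area):
    """Assign an object to a scale bucket by its area in pixels."""
    if area < SMALL_MAX_AREA:
        return "small"
    if area <= MEDIUM_MAX_AREA:
        # Assumption: exact boundary values are treated as medium.
        return "medium"
    return "large"


# Hypothetical objects of 20x20, 50x50, and 100x100 pixels.
buckets = [size_bucket(w * h) for w, h in [(20, 20), (50, 50), (100, 100)]]
```

The primary AP metric is then the average of the AP values obtained at each of the ten thresholds, while the scale metrics restrict the evaluation to objects in one bucket.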

AIEX platform metrics are shown below:

  • Precision = P
  • Recall = R
  • macro F1 = mF1
  • macro accuracy = mACC
  • macro recall = mR
  • macro precision = mP
  • mAP IoU-0.5 = mAP@0.5
  • mAP IoU-0.5:0.95 = mAP@0.5:0.95
  • AR IoU-0.5 = AR@0.5
  • AR IoU-0.5:0.95 = AR@0.5:0.95
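
The "macro" metrics in this list are averages taken over classes. The sketch below shows one common way to compute macro precision, macro recall, and macro F1 from per-class counts: each metric is computed per class and then averaged with equal weight. How the AIEX platform computes these values internally is not stated here, so this is an assumption, and the function name and counts are hypothetical.

```python
def macro_metrics(counts_per_class):
    """counts_per_class maps class name -> (tp, fp, fn). Returns macro averages."""
    precisions, recalls, f1_scores = [], [], []
    for tp, fp, fn in counts_per_class.values():
        p = tp / (tp + fp) if (tp + fp) > 0 else 0.0
        r = tp / (tp + fn) if (tp + fn) > 0 else 0.0
        f1 = 2 * p * r / (p + r) if (p + r) > 0 else 0.0
        precisions.append(p)
        recalls.append(r)
        f1_scores.append(f1)

    n = len(counts_per_class)
    if n == 0:
        return {"mP": 0.0, "mR": 0.0, "mF1": 0.0}
    # Every class gets equal weight, regardless of how many samples it has.
    return {
        "mP": sum(precisions) / n,
        "mR": sum(recalls) / n,
        "mF1": sum(f1_scores) / n,
    }


# Hypothetical counts for a two-class problem: (tp, fp, fn) per class.
result = macro_metrics({"cat": (8, 2, 2), "dog": (5, 5, 0)})
```

In this example mP is 0.65, mR is 0.9, and mF1 is about 0.733. Note that macro F1 computed this way (the mean of per-class F1 scores) generally differs from the F1 of mP and mR.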
