How Object Detection Works

Object detection is the task of locating and identifying objects of interest in digital images using a computer vision model. Technologies such as self-driving cars and biometric authentication rely heavily on object detection algorithms. Consider a picture of a dog and a cat against a white background: an object detection method places a box around each animal and labels it.

To draw a box around an object, we need four numbers, for example the coordinates of two opposite corners of the box (the other two corners follow from them). A regression model predicts these numbers, and a classification model determines what the object inside the box is.

Figure 1. Overview of an object detection algorithm
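As a small illustration of how a detected box can be stored, the sketch below keeps two opposite corners and converts them to a center, width, and height form. The names `Box`, `Detection`, `to_center_format`, and `from_center_format` are hypothetical and chosen only for this example; they are not taken from any particular library.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Box:
    """Axis-aligned box stored as two opposite corners, in pixels (assumed convention)."""
    x_min: float
    y_min: float
    x_max: float
    y_max: float


@dataclass(frozen=True)
class Detection:
    """One detected object: the box comes from regression, the label from classification."""
    box: Box
    label: str
    score: float


def to_center_format(box: Box) -> Tuple[float, float, float, float]:
    """Convert corner coordinates to (center_x, center_y, width, height)."""
    width = box.x_max - box.x_min
    height = box.y_max - box.y_min
    return (box.x_min + width / 2, box.y_min + height / 2, width, height)


def from_center_format(cx: float, cy: float, width: float, height: float) -> Box:
    """Convert (center_x, center_y, width, height) back to corner coordinates."""
    return Box(cx - width / 2, cy - height / 2, cx + width / 2, cy + height / 2)


# Made-up example values for a single detection.
dog = Detection(Box(40, 60, 200, 220), label="dog", score=0.97)
print(to_center_format(dog.box))
```

Either form carries the same four numbers; which one a model predicts is a design choice that varies between algorithms.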

One-Stage and Two-Stage

The model described above works well as long as there is only one object in the image. A practical algorithm, however, should be able to detect an arbitrary number of objects in any image. It must therefore find all objects and draw multiple boxes of different dimensions (also known as windows) around them. There are two general approaches to determining the right windows, and each corresponds to a different class of object detection algorithms.

Algorithms of the first type begin with a region proposal step that scans the image and selects regions with a high probability of containing an object. This step can rely on classical CV methods or, as in Faster R-CNN, on a learned region proposal network. An object detection model consisting of several regression and classification models then labels the object inside each selected region. These algorithms are categorized as “two-stage” methods; they are often more accurate but slower than one-stage methods.
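The control flow of a two-stage method can be sketched with toy stand-ins for both stages. Everything here is an assumption made for illustration: the image is a small list of pixel rows, `propose_regions` keeps fixed-size windows that contain enough non-zero pixels, and `classify_by_brightness` is a placeholder for a real classifier. Real two-stage detectors use far more capable components.

```python
from typing import Callable, List, Tuple

Region = Tuple[int, int, int, int]  # (x_min, y_min, x_max, y_max)
Image = List[List[int]]             # toy grayscale image: rows of pixel values


def propose_regions(image: Image, window: int, stride: int, min_fill: float) -> List[Region]:
    """Stage 1 (toy): keep windows in which at least `min_fill` of the pixels are non-zero."""
    height, width = len(image), len(image[0])
    regions = []
    for y in range(0, height - window + 1, stride):
        for x in range(0, width - window + 1, stride):
            filled = sum(1 for row in image[y:y + window] for v in row[x:x + window] if v > 0)
            if filled / (window * window) >= min_fill:
                regions.append((x, y, x + window, y + window))
    return regions


def classify_by_brightness(image: Image, region: Region) -> str:
    """Stage 2 (toy): label a region by its mean pixel value."""
    x0, y0, x1, y1 = region
    pixels = [v for row in image[y0:y1] for v in row[x0:x1]]
    return "bright" if sum(pixels) / len(pixels) > 5 else "dim"


def detect_two_stage(image: Image, classify: Callable[[Image, Region], str]) -> List[Tuple[Region, str]]:
    """Run stage 1, then apply the stage 2 classifier only to the proposed regions."""
    proposals = propose_regions(image, window=2, stride=2, min_fill=0.5)
    return [(region, classify(image, region)) for region in proposals]


toy_image = [
    [9, 9, 0, 0],
    [9, 9, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 1, 1],
]
print(detect_two_stage(toy_image, classify_by_brightness))
```

The point of the sketch is the structure: the second stage runs only on the regions the first stage selected, which is why two-stage methods tend to be slower.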

Algorithms of the second type look for objects only at specific locations and sizes, which are strategically chosen to cover as many scenarios as possible. These algorithms usually divide an image into several sections of a fixed size and assume a certain number of objects in each section, each with a predetermined shape and size. Algorithms in this category are called “one-stage” methods; examples include YOLO, SSD, and RetinaNet. In general, one-stage algorithms are faster but less accurate.
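The idea of fixed locations and predetermined shapes can be illustrated by generating the candidate boxes for a grid. This is a minimal sketch: the function name `grid_default_boxes`, the image size, the grid size, and the shapes are all assumed values for the example and do not reproduce the exact scheme of YOLO, SSD, or RetinaNet.

```python
from typing import List, Sequence, Tuple

CenterBox = Tuple[float, float, float, float]  # (center_x, center_y, width, height)


def grid_default_boxes(
    image_w: int,
    image_h: int,
    grid_size: int,
    shapes: Sequence[Tuple[float, float]],
) -> List[CenterBox]:
    """Place every predetermined (width, height) shape at the center of every grid cell."""
    cell_w = image_w / grid_size
    cell_h = image_h / grid_size
    boxes = []
    for row in range(grid_size):
        for col in range(grid_size):
            cx = (col + 0.5) * cell_w
            cy = (row + 0.5) * cell_h
            for w, h in shapes:
                boxes.append((cx, cy, w, h))
    return boxes


# Assumed example: a 416x416 image, a 13x13 grid, and three shapes per cell.
boxes = grid_default_boxes(416, 416, grid_size=13, shapes=[(32, 32), (64, 128), (128, 64)])
print(len(boxes))  # 13 * 13 * 3 = 507 candidate boxes
print(boxes[0])
```

Because the candidate set is fixed in advance, the model can score all candidates in a single pass, which is where the speed advantage of one-stage methods comes from.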

The AIEX platform provides a variety of object detection algorithms in several frameworks, which can easily be used without any coding knowledge.

Table 1: The AIEX object detection algorithms

Let’s take a closer look at one important algorithm from each group.

The Family of R-CNN Models

R-CNN, or “Regions with CNN Features”, is a family of object detection algorithms. Its three most important members are R-CNN, Fast R-CNN, and Faster R-CNN. The original R-CNN algorithm was described in the 2014 paper “Rich feature hierarchies for accurate object detection and semantic segmentation” by Ross Girshick et al. from UC Berkeley. Following R-CNN’s success, Ross Girshick proposed a faster version in a 2015 paper titled “Fast R-CNN.” Shaoqing Ren et al. further improved the training and detection speed, as well as the model architecture, in their 2016 paper “Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks.” Our white papers provide more details about these algorithms, their architectures, and their implementations.

Figure 2. Model designs for R-CNN, Fast R-CNN, and Faster R-CNN.

The Family of YOLO Models

“You Only Look Once” (YOLO) is another popular object detection model, developed by Joseph Redmon et al. In 2016, Joseph Redmon and Ali Farhadi updated the model to further improve its performance in their paper “YOLO9000: Better, Faster, Stronger.” In their 2018 paper “YOLOv3: An Incremental Improvement,” they proposed additional improvements, including a deeper feature detector network and minor representational changes. YOLO has since been maintained by a different team, which has now released YOLOv7; the original authors are no longer involved. Our white papers provide more details about these algorithms, their architectures, and their implementations.
