Underwater Trash Detection with Deep Learning

Every year, an increasing amount of waste, mostly plastics, finds its way into the ocean, endangering marine life and changing ocean ecosystems. In addition to preventing trash from entering the water, another way to reduce water pollution is to collect the existing trash.

Because trash in the water sinks deeper over time, it becomes difficult for humans to collect. Robots trained with deep learning methods can do this faster and more accurately. In this article, we train a deep learning model to identify and segment five categories of objects found underwater: trash bottles, trash bags, trash pipes, ROVs (remotely operated vehicles), and fish (labeled "animal fish" in the dataset).

What is the problem with underwater trash?

Sea animals such as fish, turtles, seabirds, and seals can become entangled in underwater trash, which restricts their movement and can wound them. They may also mistake the trash for food and eat it, which can lead to illness, damage to their internal organs, or death.

These problems do not threaten aquatic life alone: other animals and humans who eat contaminated seafood can also become sick.

Undoubtedly, these issues will only get worse unless we take fast and effective action.

The first step: Dataset

The first step in training a deep learning model is to collect data or pick an available dataset. This article uses “TrashCan 1.0: An Instance-Segmentation Labeled Dataset of Trash Observations”. The dataset was published in 2020, and its imagery is sourced from the J-EDI (JAMSTEC E-Library of Deep-sea Images) collection curated by the Japan Agency for Marine-Earth Science and Technology (JAMSTEC). TrashCan has 7,212 annotated images and 22 classes covering a wide variety of undersea flora and fauna, ROVs, and trash. For simplicity, we use only five of these classes in this project.

Figure 1. Sample images from the TrashCan dataset (animal fish, trash pipe, trash bottle, trash bag, ROV)
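
Restricting the dataset to five classes can be sketched as follows. This is a minimal illustration, assuming the annotations are stored as COCO-style JSON with `images`, `annotations`, and `categories` lists. The category names in `KEEP` are assumptions and should be checked against the names in the actual annotation file.

```python
import copy

# Assumed category names for the five classes used in this article;
# check them against the "categories" list in your annotation file.
KEEP = {"rov", "animal_fish", "trash_bottle", "trash_bag", "trash_pipe"}


def filter_coco(coco, keep_names=KEEP):
    """Return a copy of a COCO-style dict restricted to the given category names."""
    keep_ids = {c["id"] for c in coco["categories"] if c["name"] in keep_names}
    annotations = [a for a in coco["annotations"] if a["category_id"] in keep_ids]
    # Only keep images that still have at least one annotation left.
    used_images = {a["image_id"] for a in annotations}
    return {
        "images": [copy.deepcopy(i) for i in coco["images"] if i["id"] in used_images],
        "annotations": copy.deepcopy(annotations),
        "categories": [copy.deepcopy(c) for c in coco["categories"] if c["id"] in keep_ids],
    }


# Tiny in-memory example standing in for a real annotation file.
example = {
    "images": [{"id": 1, "file_name": "a.jpg"}, {"id": 2, "file_name": "b.jpg"}],
    "categories": [{"id": 1, "name": "rov"}, {"id": 2, "name": "plant"}],
    "annotations": [
        {"id": 10, "image_id": 1, "category_id": 1},
        {"id": 11, "image_id": 2, "category_id": 2},
    ],
}
filtered = filter_coco(example)
print(len(filtered["images"]), len(filtered["annotations"]))  # 1 1
```

With a real file, the dict would come from `json.load` and the filtered result would be written back with `json.dump` before uploading.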

Since we are using a public dataset, we can upload it directly to the AIEX platform. If we were building our own dataset, we could use the platform’s search engine to find images and then annotate them.

The second step: Training

After uploading the dataset (this article explains how to upload the dataset), we verify all annotations and set the state of every image to “ready to train”. We then click “Build new version” to split the images into training, validation, and test sets.

Figure 2. The AIEX platform. After uploading a dataset and verifying the annotations, set all images to the “Ready to train” state, then split the dataset with the “Build new version” button.
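
The platform performs the split for us, but the idea behind a train/validation/test split can be sketched in a few lines. The 70/20/10 ratios, the seed, and the function name `split_dataset` are illustrative assumptions, not the platform's actual settings.

```python
import random


def split_dataset(items, train=0.7, val=0.2, seed=42):
    """Shuffle items reproducibly and split them into train/val/test lists."""
    if not 0 < train < 1 or not 0 <= val < 1 or train + val >= 1:
        raise ValueError("train and val must be fractions with train + val < 1")
    shuffled = list(items)
    # A seeded generator makes the split repeatable between runs.
    random.Random(seed).shuffle(shuffled)
    n_train = int(len(shuffled) * train)
    n_val = int(len(shuffled) * val)
    return (
        shuffled[:n_train],
        shuffled[n_train:n_train + n_val],
        shuffled[n_train + n_val:],  # whatever remains becomes the test set
    )


image_ids = list(range(100))
train_ids, val_ids, test_ids = split_dataset(image_ids)
print(len(train_ids), len(val_ids), len(test_ids))
```

Keeping the three sets disjoint matters: validation and test scores are only meaningful if the model never saw those images during training.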

Next, we select a model and train it to identify and segment trash in images. The AIEX platform offers segmentation models from several frameworks, including PyTorch, TensorFlow, and TAO.

Table 1. Training results on the validation dataset.

The last step: Deploy

Running inference on new images shows how well the model has been trained. Here are some of our results.

Figure 3. Inference results from (1) PyTorch U2NET, (2) PyTorch Detectron2, (3) TensorFlow Mask RCNN, and (4) TAO UNET
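
Beyond visual inspection, a predicted mask can be compared numerically with its ground-truth mask. The article does not state which metric it reports, so the sketch below uses intersection over union (IoU), a common choice for segmentation, purely as an illustration. Masks are represented as nested lists of 0/1 values to keep the example dependency-free.

```python
def mask_iou(pred, target):
    """Intersection over union of two binary masks given as nested lists of 0/1."""
    intersection = 0
    union = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            intersection += 1 if (p and t) else 0
            union += 1 if (p or t) else 0
    # Two empty masks agree perfectly, so define their IoU as 1.0.
    return intersection / union if union else 1.0


predicted = [[0, 1, 1],
             [0, 1, 0]]
ground_truth = [[0, 1, 0],
                [0, 1, 1]]
print(mask_iou(predicted, ground_truth))  # 2 overlapping pixels / 4 in the union = 0.5
```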


The underwater trash dataset is hard to work with because underwater images are of poor quality, so more training iterations are needed to reach high accuracy. Different values should also be tested for hyperparameters such as learning rate (LR), momentum, and weight decay. Nonetheless, the PyTorch U2NET model reached 80% accuracy, which was better than the other models we trained for this use case.
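
Testing different hyperparameter values is often organized as a grid search. The sketch below is one possible way to enumerate combinations of learning rate, momentum, and weight decay. The candidate values are illustrative assumptions, and `fake_evaluate` is a hypothetical stand-in for a real "train, then measure validation accuracy" step.

```python
from itertools import product

# Candidate values are illustrative assumptions, not the ones used in the article.
SEARCH_SPACE = {
    "lr": [1e-2, 1e-3, 1e-4],
    "momentum": [0.8, 0.9],
    "weight_decay": [0.0, 1e-4],
}


def grid(space):
    """Yield one dict per combination of hyperparameter values."""
    names = list(space)
    for values in product(*(space[name] for name in names)):
        yield dict(zip(names, values))


def best_config(space, evaluate):
    """Return the configuration with the highest score from evaluate(config)."""
    return max(grid(space), key=evaluate)


def fake_evaluate(config):
    # Hypothetical stand-in for "train the model, return validation accuracy".
    return -abs(config["lr"] - 1e-3) + config["momentum"] - config["weight_decay"]


configs = list(grid(SEARCH_SPACE))
print(len(configs))  # 3 * 2 * 2 = 12 combinations
print(best_config(SEARCH_SPACE, fake_evaluate))
```

Since every combination requires a full training run, the grid is usually kept small or replaced with random search when training is expensive.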

UNET and U2NET are used here as saliency models, and a saliency model cannot distinguish classes from one another. It simply identifies the most important, or main, object in an image. As a result, we cannot obtain a confidence score for each class. RCNN models, by contrast, are not saliency models and output a class name and a confidence score for each detected object.
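
The difference between the two kinds of output can be illustrated with toy data. The structures below are simplified assumptions for illustration, not the exact output format of any of the frameworks mentioned: a saliency model is represented by a single per-pixel probability map, and an instance segmentation model by a list of detections with a label and a score.

```python
def saliency_to_mask(saliency_map, threshold=0.5):
    """Binarize a saliency map: 1 = salient object, 0 = background. No class is attached."""
    return [[1 if value >= threshold else 0 for value in row] for row in saliency_map]


def confident_instances(instances, min_score=0.5):
    """Keep instance predictions whose confidence score reaches min_score."""
    return [inst for inst in instances if inst["score"] >= min_score]


# Saliency-style output: a single per-pixel probability map for the whole image.
saliency_map = [[0.1, 0.8],
                [0.6, 0.2]]
print(saliency_to_mask(saliency_map))  # [[0, 1], [1, 0]]

# Instance-segmentation-style output: each detection has a class name and a score
# (a mask would accompany each entry; it is omitted here for brevity).
instances = [
    {"label": "trash_bottle", "score": 0.91},
    {"label": "animal_fish", "score": 0.34},
]
print([inst["label"] for inst in confident_instances(instances)])  # ['trash_bottle']
```

The saliency mask says where the main object is but not what it is, which is why a per-class confidence score is not available from those models.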

Therefore, we cannot say that U2NET achieved the best outcome in this project.

It is fairer to compare the PyTorch U2NET and TAO UNET models with each other; in this test, PyTorch U2NET achieved better results than TAO UNET. Likewise, the PyTorch Detectron2 and TensorFlow Mask RCNN models should be compared with each other.

In future articles, we’ll explain more about saliency and other kinds of models.
