Dataset Development Lifecycle

Most people working in data analysis and machine learning have had to collect or create data at some point. A paper from Google proposes a structured framework for dataset development that borrows from software development: a five-stage cyclical process consisting of requirements analysis, design, implementation, testing, and maintenance.

Requirements analysis: In this stage, we determine what data is required by weighing the project’s goals, consulting the stakeholders, and analyzing use cases.

Design: Here we determine whether the data requirements can be met and, if so, the best way to meet them, by researching the subject matter and consulting domain experts.

Implementation: Design decisions are transformed into technologies such as software systems, annotator guidelines, and labeling platforms.

Testing: The data is evaluated, and a decision is made about whether to use it.

Maintenance: Once collected, a dataset requires a large set of affordances, including tools, policies, and designated owners.
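
To make the cyclical ordering concrete, here is a minimal Python sketch of the five stages. The names `Stage` and `next_stage` are illustrative and do not come from the paper, and the wrap-around from maintenance back to requirements analysis is an assumption based on the process being described as cyclical.

```python
from enum import Enum


class Stage(Enum):
    """The five lifecycle stages, in the order the article lists them."""
    REQUIREMENTS_ANALYSIS = 1
    DESIGN = 2
    IMPLEMENTATION = 3
    TESTING = 4
    MAINTENANCE = 5


def next_stage(stage: Stage) -> Stage:
    """Return the stage after `stage`.

    Assumption: after maintenance, the cycle restarts at requirements analysis.
    """
    stages = list(Stage)  # Enum members iterate in definition order
    return stages[(stages.index(stage) + 1) % len(stages)]
```

For example, `next_stage(Stage.DESIGN)` returns `Stage.IMPLEMENTATION`, and `next_stage(Stage.MAINTENANCE)` wraps around to `Stage.REQUIREMENTS_ANALYSIS`.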

One noteworthy aspect of Google’s approach is its emphasis on producing an artifact at each stage: a document, prepared from a provided template, that serves as the output of that stage. The paper identifies these as the critical document types for accountable dataset development, each directly analogous to a documentation type produced in the software development lifecycle.

Figure 1. Critical document types for accountable dataset development. Source
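
The one-document-per-stage idea could be tracked in a project along the following lines. This is only a sketch: the `DatasetProject` class, its methods, and the file path in the usage note are hypothetical, and the paper's actual document names and templates are not reproduced here.

```python
from dataclasses import dataclass, field

# Stage names as listed in the article.
STAGES = [
    "requirements analysis",
    "design",
    "implementation",
    "testing",
    "maintenance",
]


@dataclass
class DatasetProject:
    """Hypothetical tracker: one recorded document per lifecycle stage."""

    # Maps stage name -> location of the document written for that stage.
    artifacts: dict = field(default_factory=dict)

    def record_artifact(self, stage: str, location: str) -> None:
        """Register the document produced as the output of `stage`."""
        if stage not in STAGES:
            raise ValueError(f"unknown stage: {stage!r}")
        self.artifacts[stage] = location

    def missing_artifacts(self) -> list:
        """Return the stages that do not yet have a document, in lifecycle order."""
        return [stage for stage in STAGES if stage not in self.artifacts]
```

After `project.record_artifact("design", "docs/design.md")` on a fresh project, `project.missing_artifacts()` lists the other four stages, which makes it easy to see where documentation is still owed.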

The paper also discusses three of Nissenbaum’s barriers to accountability, the dataset-specific concerns each one raises, and proposals for mitigating them, which are summarized in the figure below.

Figure 2. Three of Nissenbaum’s barriers to accountability, their specific dataset concerns, and proposals for mitigation in this paper. Source