The Data acquisition page is where you collect, import, explore, and label the data used to train your models. It provides tools to manage your dataset, build data pipelines, generate new data, and improve data quality throughout the machine learning lifecycle. Each tab focuses on a specific part of the workflow and is described below.
Not all tabs are always shown
The tabs shown on the Data acquisition page depend on your project, for example the type of data you are working with or whether it is an object detection project. If you don’t see a tab described below, it may not be relevant to your project.

Dataset

The Dataset tab is the central view of all data in your project. It shows how your data is split between training and testing sets, visualizes label distribution, and allows you to upload existing data or collect new data using devices connected to your project.
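
As a rough illustration of what the split and label distribution views summarize, the sketch below computes both from a list of samples. The `Sample` type, its fields, and the sample values are hypothetical and are not part of any Edge Impulse API.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Sample:
    # Hypothetical record; the field names are illustrative only.
    name: str
    label: str
    category: str  # "training" or "testing"


def split_ratio(samples):
    """Return the fraction of samples in each category."""
    counts = Counter(s.category for s in samples)
    total = sum(counts.values())
    return {cat: n / total for cat, n in counts.items()} if total else {}


def label_distribution(samples, category):
    """Count samples per label within one category."""
    return Counter(s.label for s in samples if s.category == category)


samples = [
    Sample("idle.01", "idle", "training"),
    Sample("idle.02", "idle", "training"),
    Sample("wave.01", "wave", "training"),
    Sample("wave.02", "wave", "training"),
    Sample("idle.03", "idle", "testing"),
]

ratios = split_ratio(samples)
train_labels = label_distribution(samples, "training")
```

Here four of the five samples are in the training set, and the training labels are evenly balanced between the two classes. A heavily skewed distribution in either view is usually a sign that more data is needed for the underrepresented class.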

Data explorer

The Data explorer provides a visual overview of your dataset to help identify clusters, outliers, and mislabeled samples. It projects your data into a 2D space using feature extraction and dimensionality reduction, enabling quick exploration and assisted labeling.
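
To make the idea of spotting mislabeled samples in a 2D projection concrete, here is a minimal sketch. It assumes the samples have already been projected to 2D points, and it uses a simple k-nearest-neighbor vote to flag samples whose label disagrees with their neighbors. This is an illustrative technique, not necessarily what the Data explorer does internally; the function name, points, and labels are all hypothetical.

```python
import math
from collections import Counter


def suspicious_labels(points, labels, k=3):
    """Return indices whose label differs from the majority label
    of their k nearest neighbors in the 2D projection."""
    flagged = []
    for i, p in enumerate(points):
        # Rank every other point by Euclidean distance to p, keep the k closest.
        neighbors = sorted(
            (j for j in range(len(points)) if j != i),
            key=lambda j: math.dist(p, points[j]),
        )[:k]
        if not neighbors:
            continue
        majority, _ = Counter(labels[j] for j in neighbors).most_common(1)[0]
        if majority != labels[i]:
            flagged.append(i)
    return flagged


# Hypothetical 2D embedding: two tight clusters, with one "wave" sample
# sitting inside the "idle" cluster.
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1),
          (5.0, 5.0), (5.1, 5.0), (5.0, 5.1), (5.1, 5.1)]
labels = ["idle", "idle", "idle", "wave",
          "wave", "wave", "wave", "wave"]

flagged = suspicious_labels(points, labels)
```

The sample at index 3 is flagged because all three of its nearest neighbors are labeled `idle`. In a visual tool, this is the kind of point that stands out as a differently colored dot inside an otherwise uniform cluster.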

Data sources

The Data sources tab lets you connect external storage and build automated data pipelines. You can import datasets, schedule ingestion, run transformations, and trigger workflows such as labeling, retraining, and deployment.

Synthetic data

The Synthetic data tab allows you to generate and augment datasets using built-in integrations. This helps fill gaps in your data, improve model performance, and reduce the need for manual data collection.

Labeling queue

The Labeling queue streamlines annotation workflows for object detection projects. It presents unlabeled data and provides AI-assisted tools, such as model-based suggestions and object tracking, to speed up bounding box labeling.

AI labeling

The AI labeling tab allows you to automatically label datasets by integrating AI models into your workflow. You can use pre-built labeling blocks or create custom ones to apply your own models, helping scale labeling efforts and improve consistency across your data.

CSV Wizard

The CSV Wizard simplifies importing large or complex datasets. It generates a reusable configuration that maps your data format into Edge Impulse’s ingestion format, and it supports CSV, TXT, and Parquet files.
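
The idea of a reusable mapping configuration can be sketched as follows. The configuration keys, column names, and output structure are hypothetical stand-ins chosen for illustration; they are not the CSV Wizard's real schema or Edge Impulse's actual ingestion format.

```python
import csv
import io

# Hypothetical mapping; the keys are illustrative only.
config = {
    "timestamp_column": "timestamp",
    "label_column": "activity",
    "value_columns": ["accX", "accY", "accZ"],
}

raw = """timestamp,accX,accY,accZ,activity
0,0.1,0.2,9.8,idle
10,0.2,0.1,9.7,idle
20,1.5,0.3,9.1,wave
"""


def apply_config(text, config):
    """Group CSV rows into per-label samples using the column mapping."""
    samples = {}
    for row in csv.DictReader(io.StringIO(text)):
        sample = samples.setdefault(
            row[config["label_column"]],
            {"axes": config["value_columns"], "timestamps_ms": [], "values": []},
        )
        sample["timestamps_ms"].append(int(row[config["timestamp_column"]]))
        sample["values"].append([float(row[c]) for c in config["value_columns"]])
    return samples


samples = apply_config(raw, config)
```

Because the mapping lives in `config` rather than in the parsing code, the same configuration can be applied to any other file that shares this column layout, which is the point of generating it once and reusing it.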