Data acquisition
Last updated
Last updated
All collected data for each project can be viewed on the Data acquisition tab. You can see how your data has been split for train/test set as well as the data distribution for each class in your dataset. You can also send new sensor data to your project either by file upload, WebUSB, Edge Impulse API, or Edge Impulse CLI.
Organization data
Since the creation of Edge Impulse, we have been helping our customers deal with complex data pipelines, complex data transformation methods and complex clinical validation studies.
The organizational data gives you tools to centralize, validate and transform datasets so they can be easily imported into your projects.
See the Organization data documentation.
The panel on the right allows you to collect data directly from any fully supported platform:
Through WebUSB.
Using the Edge Impulse CLI daemon.
From the Edge Impulse for Linux CLI.
The WebUSB and the Edge Impulse daemon work with any fully supported device by flashing the pre-built Edge Impulse firmware to your board. See the list of fully supported boards.
When using the Edge Impulse for Linux CLI, run edge-impulse-linux --clean
and it will add your platform to the device list of your project. You will then will be able to interact with it from the Collect data panel.
Need more?
If your device is not in the officially supported list, you can also collect data using the CLI data forwarder by directly writing the sensor values over a serial connection. The "data forwarder" then signs the data and sends it to the ingestion service.
Edge Impulse also supports different data sample formats and dataset annotation formats (Pascal VOC, YOLO TXT, COCO JSON, Edge Impulse Object Detection, OpenImage CSV) that you can import into your project to build your edge AI models:
Upload portals (Enterprise feature).
Multi-label time-series data
In December 2023, we released the multi-label feature. See the dedicated Multi-label page to understand how to import multi-label data samples.
For time-series data samples (including audio), you can visualize the time-series graphs on the right panel with a dark-blue background:
If you are dealing with multi-label data samples. Here is the corresponding preview:
Preview the values of tabular non-time-series & pre-processed data samples:
Raw images can be directly visualized from the preview:
For object detection projects, we can overlay the corresponding bounding boxes:
Raw videos (.mp4) can be directly visualized from the preview. Please note that you will need to split the videos into frames as we do not support training on videos files:
List view
The train/test split is a technique for training and evaluating the performance of machine learning algorithms. It indicates how your data is split between training and testing samples. For example, an 80/20 split indicates that 80% of the dataset is used for model training purposes while 20% is used for model testing.
This section also shows how your data samples in each class are distributed to prevent imbalanced datasets which might introduce bias during model training.
Manually navigating to some categories of data can be time-consuming, especially when dealing with a large dataset. The data acquisition filter enables the user to filter data samples based on some criteria of choice. This can be based on:
Label - class to which a sample represents.
Sample name - unique ID representing a sample.
Signature validity
Enabled and disabled samples
Length of sample - duration of a sample.
The filtered samples can then be manipulated by editing labels, deleting, and moving from the training set to the testing set (and vice versa), a shown in the image above.
The data manipulations above can also be applied at the data sample level by simply navigating to the individual data sample by clicking on "⋮" and selecting the type of action you might want to perform on the specific sample. This might be renaming, editing its label, disabling, cropping, splitting, downloading, and even deleting the sample when desired.
Single label
Multi-label
See Data Acquisition -> Multi-label -> Edit multi-label samples for more information.
To crop a data sample, go to the sample you want to crop and click ⋮, then select Crop sample. You can specify a length, or drag the handles to resize the window, then move the window around to make your selection.
Made a wrong crop? No problem, just click Crop sample again and you can move your selection around. To undo the crop, just set the sample length to a high number, and the whole sample will be selected again.
Besides cropping you can also split data automatically. Here you can perform one motion repeatedly, or say a keyword over and over again, and the events are detected and can be stored as individual samples. This makes it easy to very quickly build a high-quality dataset of discrete events. To do so head to Data Acquisition, record some new data, click, and select Split sample. You can set the window length, and all events are automatically detected. If you're splitting audio data you can also listen to events by clicking on the window, the audio player is automatically populated with that specific split.
Samples are automatically centered in the window, which might lead to problems on some models (the neural network could learn a shortcut where data in the middle of the window is always associated with a certain label), so you can select "Shift samples" to automatically move the data a little bit around.
Splitting data is - like cropping data - non-destructive. If you're not happy with a split just click Crop sample and you can move the selection around easily.
The labeling queue and the auto-labeler will only appear on your data acquisition page if you are dealing with object detection tasks.
If you are not dealing with an object detection task, you can simply change the Labeling method configuration by going to Dashboard > Project info > Labeling method and clicking the dropdown and selecting "one label per data item" as shown in the image below.
Also, see our Label image data using GPT-4o tutorial to see how to leverage the power of LLMs to automatically label your data samples based on simple prompts.
You can change the default view (list) to a grid view to quickly overview your datasets by clicking on the icon.