Data acquisition
All collected data for each project can be viewed on the Data acquisition tab. You can see how your data has been split between the training and testing sets, as well as the data distribution for each class in your dataset. You can also send new sensor data to your project by file upload, WebUSB, the Edge Impulse API, or the Edge Impulse CLI.
The panel on the right allows you to collect data directly from any fully supported platform:
Through WebUSB.
Using the Edge Impulse CLI daemon.
From the Edge Impulse for Linux CLI.
WebUSB and the Edge Impulse daemon work with any fully supported device once you have flashed the pre-built Edge Impulse firmware to your board. See the list of fully supported boards.
When using the Edge Impulse for Linux CLI, run edge-impulse-linux --clean and it will add your platform to the device list of your project. You will then be able to interact with it from the Collect data panel.
If your device is not in the officially supported list, you can also collect data using the CLI data forwarder by directly writing the sensor values over a serial connection. The "data forwarder" then signs the data and sends it to the ingestion service.
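As a rough illustration of "writing the sensor values over a serial connection", the sketch below formats readings as text lines. It assumes one reading per line with axis values separated by commas; that line format, the function name format_reading, and the sample values are assumptions for illustration and should be checked against the data forwarder documentation. Real firmware would usually be written in C or C++, and the code that actually writes to the serial port is not shown.

```python
def format_reading(values, separator=","):
    """Format one multi-axis sensor reading as a single text line.

    Assumption: the receiver expects one reading per line, with the
    axis values separated by `separator`.
    """
    if not values:
        raise ValueError("a reading needs at least one axis value")
    return separator.join(f"{v:.4f}" for v in values) + "\n"


# Hypothetical accelerometer readings (x, y, z), one tuple per sampling tick.
readings = [(0.12, -9.81, 0.03), (0.15, -9.79, 0.01)]

# Each entry is what the device would print on its serial port for one tick.
lines = [format_reading(r) for r in readings]
print("".join(lines), end="")
```

Keeping the number of values per line constant, and printing at a steady rate, is what lets a receiver treat each column as one sensor axis.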
Edge Impulse also supports different data sample formats and dataset annotation formats (Pascal VOC, YOLO TXT, COCO JSON, Edge Impulse Object Detection, OpenImage CSV) that you can import into your project to build your edge AI models:
Upload portals (Enterprise feature).
The train/test split is a technique for training and evaluating the performance of machine learning algorithms. It indicates how your data is split between training and testing samples. For example, an 80/20 split indicates that 80% of the dataset is used for model training while the remaining 20% is held back for model testing.
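The 80/20 example can be sketched in a few lines of Python. This is a generic random split, not a description of how Edge Impulse assigns samples internally; the function name train_test_split, the fixed seed, and the sample names are all illustrative assumptions.

```python
import random


def train_test_split(samples, test_ratio=0.2, seed=42):
    """Shuffle samples and return (train, test) lists.

    `test_ratio` is the fraction of samples held back for testing.
    """
    if not 0.0 <= test_ratio <= 1.0:
        raise ValueError("test_ratio must be between 0 and 1")
    shuffled = list(samples)               # copy so the input is untouched
    random.Random(seed).shuffle(shuffled)  # seeded for a repeatable split
    n_test = round(len(shuffled) * test_ratio)
    return shuffled[n_test:], shuffled[:n_test]


# Hypothetical sample names standing in for real data samples.
samples = [f"sample_{i:02d}" for i in range(10)]
train, test = train_test_split(samples, test_ratio=0.2)
print(len(train), len(test))  # prints "8 2"
```

Shuffling before splitting matters when samples were recorded class by class; otherwise the test set could end up containing only the last class recorded.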
This section also shows how your data samples are distributed across classes, which helps you spot an imbalanced dataset that might introduce bias during model training.
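A minimal sketch of the same check done by hand: count the samples per class, then compare the largest class with the smallest. The label names and the two helper functions are hypothetical, and what counts as "too imbalanced" depends on the project.

```python
from collections import Counter


def class_distribution(labels):
    """Return the fraction of samples that belongs to each class."""
    if not labels:
        raise ValueError("labels must not be empty")
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: count / total for label, count in counts.items()}


def imbalance_ratio(labels):
    """Return largest class size divided by smallest class size."""
    if not labels:
        raise ValueError("labels must not be empty")
    counts = Counter(labels)
    return max(counts.values()) / min(counts.values())


# Hypothetical labels: 6 "idle", 3 "wave" and 1 "snake" sample.
labels = ["idle"] * 6 + ["wave"] * 3 + ["snake"] * 1
distribution = class_distribution(labels)
ratio = imbalance_ratio(labels)
print(distribution)
print(ratio)  # prints "6.0"
```

A ratio close to 1 means the classes are roughly balanced; a large ratio suggests collecting more data for the under-represented class.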
Manually navigating to some categories of data can be time-consuming, especially when dealing with a large dataset. The data acquisition filter enables the user to filter data samples based on some criteria of choice. This can be based on:
Label - class that a sample represents.
Sample name - unique ID representing a sample.
Signature validity
Enabled and disabled samples
Length of sample - duration of a sample.
The filtered samples can then be manipulated by editing labels, deleting, and moving from the training set to the testing set (and vice versa), as shown in the image above.
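The filter criteria listed above combine naturally as optional conditions that must all hold. The sketch below models this on an in-memory list; the Sample class, its field names, and filter_samples are hypothetical and do not correspond to any Edge Impulse API.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    name: str                     # sample name
    label: str                    # class the sample represents
    length_ms: int                # duration of the sample
    enabled: bool = True
    signature_valid: bool = True


def filter_samples(samples, label=None, name_contains=None, enabled=None,
                   signature_valid=None, min_length_ms=None,
                   max_length_ms=None):
    """Return samples matching every criterion that is not None."""
    result = []
    for s in samples:
        if label is not None and s.label != label:
            continue
        if name_contains is not None and name_contains not in s.name:
            continue
        if enabled is not None and s.enabled != enabled:
            continue
        if signature_valid is not None and s.signature_valid != signature_valid:
            continue
        if min_length_ms is not None and s.length_ms < min_length_ms:
            continue
        if max_length_ms is not None and s.length_ms > max_length_ms:
            continue
        result.append(s)
    return result


dataset = [
    Sample("idle.01", "idle", 10000),
    Sample("wave.01", "wave", 2000),
    Sample("wave.02", "wave", 5000, enabled=False),
    Sample("wave.03", "wave", 1000, signature_valid=False),
]

enabled_waves = filter_samples(dataset, label="wave", enabled=True)
print([s.name for s in enabled_waves])  # prints "['wave.01', 'wave.03']"
```

Because the function returns the matching sample objects themselves, a bulk action such as relabeling is just a loop over the filtered result.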
The data manipulations above can also be applied to a single data sample: navigate to the sample, click ⋮, and select the action you want to perform. Available actions include renaming, editing the label, disabling, cropping, splitting, downloading, and deleting the sample.
To crop a data sample, go to the sample you want to crop and click ⋮, then select Crop sample. You can specify a length, or drag the handles to resize the window, then move the window around to make your selection.
Made a wrong crop? No problem, just click Crop sample again and you can move your selection around. To undo the crop, just set the sample length to a high number, and the whole sample will be selected again.
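One way to picture why a crop can be adjusted or undone is to keep the full recording and store only the selection boundaries. The sketch below follows that idea; the CroppableSample class and its fields are hypothetical and are not taken from Edge Impulse's implementation.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class CroppableSample:
    values: List[float]               # full recording, never modified
    crop_start: int = 0               # first selected index
    crop_end: Optional[int] = None    # exclusive end; None = end of recording

    def crop(self, start, length):
        """Select `length` readings from `start`, clamped to the recording."""
        start = max(0, min(start, len(self.values)))
        self.crop_start = start
        self.crop_end = min(start + length, len(self.values))

    def selected(self):
        """Return the currently selected part of the recording."""
        end = len(self.values) if self.crop_end is None else self.crop_end
        return self.values[self.crop_start:end]


recording = CroppableSample(values=[0.0, 0.1, 0.9, 1.0, 0.8, 0.1, 0.0])

recording.crop(start=2, length=3)
print(recording.selected())  # prints "[0.9, 1.0, 0.8]"

# An oversized length is clamped, so the whole recording is selected again.
recording.crop(start=0, length=10_000)
print(len(recording.selected()))  # prints "7"
```

The clamping in crop mirrors the tip above: asking for a very large length simply selects everything that was recorded.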
Besides cropping, you can also split data automatically. You can perform one motion repeatedly, or say a keyword over and over again, and the events are detected and can be stored as individual samples. This makes it easy to quickly build a high-quality dataset of discrete events. To do so, head to Data acquisition, record some new data, click ⋮, and select Split sample. You can set the window length, and all events are automatically detected. If you're splitting audio data, you can also listen to events by clicking on a window; the audio player is automatically populated with that specific split.
Samples are automatically centered in the window, which might lead to problems on some models (the neural network could learn a shortcut where data in the middle of the window is always associated with a certain label), so you can select "Shift samples" to automatically move each event slightly off-center within its window.
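To make the idea of event splitting and shifting concrete, here is a simplified sketch. It uses a plain amplitude threshold to find events, which is a stand-in technique and not necessarily how Edge Impulse detects them; the function names, the threshold, and the min_gap and shift parameters are all assumptions for illustration.

```python
import random


def find_event_centers(signal, threshold, min_gap):
    """Return the peak index of each run of readings above `threshold`.

    Readings above the threshold that are at most `min_gap` indices
    apart are treated as belonging to the same event.
    """
    centers = []
    run = []  # indices of the event currently being collected
    for i, value in enumerate(signal):
        if abs(value) >= threshold:
            if run and i - run[-1] > min_gap:
                centers.append(max(run, key=lambda j: abs(signal[j])))
                run = []
            run.append(i)
    if run:
        centers.append(max(run, key=lambda j: abs(signal[j])))
    return centers


def split_windows(signal, centers, window, shift=0, seed=0):
    """Return (start, end) index pairs, one fixed-length window per event.

    With shift=0 each event sits in the middle of its window; otherwise
    the window is moved by a random offset of up to `shift` readings.
    """
    if window > len(signal):
        raise ValueError("window is longer than the signal")
    rng = random.Random(seed)
    windows = []
    for center in centers:
        offset = rng.randint(-shift, shift) if shift else 0
        start = center - window // 2 + offset
        start = max(0, min(start, len(signal) - window))  # stay in bounds
        windows.append((start, start + window))
    return windows


# Hypothetical recording: silence with two short events.
signal = [0.0] * 30
signal[4:7] = [0.5, 1.0, 0.5]
signal[19:22] = [0.4, 0.9, 0.4]

centers = find_event_centers(signal, threshold=0.3, min_gap=3)
centered = split_windows(signal, centers, window=8)
shifted = split_windows(signal, centers, window=8, shift=2)
print(centers)   # prints "[5, 20]"
print(centered)  # prints "[(1, 9), (16, 24)]"
```

The shifted windows still contain each event, but no longer always at the same position, which removes the shortcut described above. The original signal is only indexed, never modified, so the windows can be recomputed with different settings.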
Like cropping, splitting data is non-destructive. If you're not happy with a split, just click Crop sample and you can move the selection around easily.
The labeling queue will only appear on your data acquisition page if you are dealing with object detection tasks. The labeling queue shows a list of images that have been staged for annotation for your project.
If you are not dealing with an object detection task, you can disable the labeling queue bar: go to Dashboard > Project info > Labeling method, click the dropdown, and select "one label per data item", as shown in the image below.
For more information about the labeling queue and how to perform data annotation using AI-assisted labeling on Edge Impulse, you can have a look at our documentation here.