All collected data for each project can be viewed on the Data acquisition tab. You can see how your data has been split for train/test set as well as the data distribution for each class in your dataset. You can also send new sensor data to your project either by file upload, WebUSB, Edge Impulse API, or Edge Impulse CLI.
Edge Impulse Studio - Data acquisition view.
The panel on the right allows you to collect data directly from any fully supported platform:
When using the Edge Impulse for Linux CLI, run
edge-impulse-linux --cleanand it will add your platform to the device list of your project. You will then will be able to interact with it from the Record new data panel.
The train/test split is a technique for training and evaluating the performance of a machine learning algorithms. It indicates how your data is split between training and testing samples. For example, an 80/20 split indicates that 80% of the dataset is used for model training purposes while 20% is used for model testing.
This section also shows how your data samples in each class are distributed to prevent imbalanced datasets which might introduce bias during model training.
Manually navigating to some categories of data can be time consuming, especially when dealing with a large dataset. The data acquisition filter enables the user to filter data samples based on some criteria of choice. This can be based on:
- Label - class to which a sample represents.
- Sample name - unique ID representing a sample.
- Signature validity
- Enabled and disabled samples
- Length of sample - duration of a sample.
The filtered samples can then be manipulated by editing label, deleting, moving from trains set to test set and vise versa a shown in the image above.
The data manipulations above can also be applied at the data sample level just by simply navigating to the individual data sample then clicking ⋮ and selecting the type of action you might want to perform to the specific sample. This might be renaming , editing its label, disabling, cropping, splitting, downloading, and even deleting the sample when desired.
To crop a data sample, go to the sample you want to crop and click ⋮, then select Crop sample. You can specific a length, or drag the handles to resize the window, then move the window around to make your selection.
Made a wrong crop? No problem, just click Crop sample again and you can move your selection around. To undo the crop, just set the sample length to a high number, and the whole sample will be selected again.
Besides cropping you can also split data automatically. Here you can perform one motion repeatedly, or say a keyword over and over again, and the events are detected and can be stored as individual samples. This makes it easy to very quickly build a high-quality dataset of discrete events. To do so head to Data acquisition, record some new data, click, and select Split sample. You can set the window length, and all events are automatically detected. If you're splitting audio data you can also listen to events by clicking on the window, the audio player is automatically populated with that specific split.
Samples are automatically centered in the window, which might lead to problems on some models (the neural network could learn a shortcut where data in the middle of the window is always associated with a certain label), so you can select "Shift samples" to automatically move the data a little bit around.
Splitting data is - like cropping data - non-destructive. If you're not happy with a split just click Crop sample and you can move the selection around easily.
The labelling queue will only appear on your data acquisition page if you are dealing with an object detection tasks. The labelling queue shows a list of images that have been staged for annotation for your project.
If you are not dealing with an object detection task, you can simply disable the labelling queue bar by going to Dashboard > Project info > Labeling method and clicking the dropdown and selecting "one label per data item" as shown in the image below.