You can upload an existing dataset to your project directly through the Edge Impulse Studio. The data should be in the Data Acquisition format (CBOR, JSON, CSV), or as WAV, JPG or PNG files.
To upload data using the uploader, go to the Data acquisition page and click on the uploader button as shown in the image below:
When uploading your data, you can choose the category your data should fall into, i.e. the training set or the testing set, or automatically split the dataset between training and testing. You can also choose whether to infer labels from the file names or to enter a single label that will be applied to all uploaded files.
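For instance, when labels are inferred from file names, the label is typically taken from the part of the name before the first dot. A minimal, hypothetical sketch of that convention (the helper function and file names are illustrative, not part of the Studio):

```python
# Illustrative only: how a label can be inferred from a file name, assuming
# the common "label.unique_id.ext" convention (e.g. "idle.01.cbor" -> "idle").
def infer_label(filename: str) -> str:
    # everything before the first dot is treated as the label
    return filename.split(".")[0]

for name in ["idle.01.cbor", "wave.02.cbor", "updown.03.cbor"]:
    print(name, "->", infer_label(name))
```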
All collected data for each project can be viewed on the Data acquisition tab. You can see how your data has been split between the training and testing sets, as well as the data distribution for each class in your dataset. You can also send new sensor data to your project by file upload, WebUSB, the Edge Impulse API, or the Edge Impulse CLI.
The panel on the right allows you to collect data directly from any fully supported platform:
Through WebUSB.
Using the Edge Impulse CLI daemon.
From the Edge Impulse for Linux CLI.
The WebUSB and the Edge Impulse daemon work with any fully supported device by flashing the pre-built Edge Impulse firmware to your board. See the list of fully supported boards.
When using the Edge Impulse for Linux CLI, run edge-impulse-linux --clean and it will add your platform to the device list of your project. You will then be able to interact with it from the Record new data panel.
The train/test split is a technique for training and evaluating the performance of a machine learning algorithm. It indicates how your data is split between training and testing samples. For example, an 80/20 split indicates that 80% of the dataset is used for model training while 20% is used for model testing.
This section also shows how your data samples in each class are distributed to prevent imbalanced datasets which might introduce bias during model training.
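As a rough illustration of both ideas, here is a hypothetical sketch (not Studio code) that performs an 80/20 split on a list of labeled samples and counts how each class is represented in the training set:

```python
# Illustrative only: an 80/20 train/test split plus a per-class count,
# mirroring what the Data acquisition page reports for your dataset.
import random
from collections import Counter

# hypothetical (filename, label) pairs
samples = [(f"sample_{i:03d}.wav", random.choice(["noise", "keyword"]))
           for i in range(100)]

random.shuffle(samples)
split = int(0.8 * len(samples))               # 80% of items go to training
train, test = samples[:split], samples[split:]

print(len(train), len(test))                  # 80 20
print(Counter(label for _, label in train))   # check for class imbalance
```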
Manually navigating to some categories of data can be time-consuming, especially when dealing with a large dataset. The data acquisition filter enables you to filter data samples based on criteria of your choice, such as:
Label - the class a sample belongs to.
Sample name - unique ID representing a sample.
Signature validity
Enabled and disabled samples
Length of sample - duration of a sample.
The filtered samples can then be manipulated by editing labels, deleting, or moving samples from the training set to the testing set and vice versa, as shown in the image above.
The data manipulations above can also be applied at the level of a single data sample: navigate to the individual sample, click ⋮ and select the action you want to perform on that sample. This can be renaming it, editing its label, disabling, cropping, splitting, downloading, or even deleting the sample when desired.
To crop a data sample, go to the sample you want to crop and click ⋮, then select Crop sample. You can specify a length, or drag the handles to resize the window, then move the window around to make your selection.
Made a wrong crop? No problem, just click Crop sample again and you can move your selection around. To undo the crop, just set the sample length to a high number, and the whole sample will be selected again.
Besides cropping, you can also split data automatically. Here you can perform one motion repeatedly, or say a keyword over and over again, and the events are detected and can be stored as individual samples. This makes it easy to quickly build a high-quality dataset of discrete events. To do so, head to Data acquisition, record some new data, click ⋮, and select Split sample. You can set the window length, and all events are automatically detected. If you're splitting audio data you can also listen to events by clicking on the window; the audio player is automatically populated with that specific split.
Samples are automatically centered in the window, which might lead to problems with some models (the neural network could learn a shortcut where data in the middle of the window is always associated with a certain label), so you can select "Shift samples" to automatically shift the data slightly within the window.
Splitting data is - like cropping data - non-destructive. If you're not happy with a split just click Crop sample and you can move the selection around easily.
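Conceptually, automatic event detection of this kind resembles scanning the signal with a sliding window and keeping the windows whose energy crosses a threshold. The sketch below is purely illustrative and is not the Studio's actual implementation:

```python
# Illustrative energy-threshold event detector: windows whose RMS energy
# exceeds a threshold are treated as candidate samples.
import numpy as np

def detect_events(signal, window=1600, threshold=0.1):
    """Return (start, end) index pairs for high-energy windows."""
    events = []
    for start in range(0, len(signal) - window + 1, window):
        chunk = signal[start:start + window]
        rms = np.sqrt(np.mean(chunk ** 2))
        if rms > threshold:
            events.append((start, start + window))
    return events

# e.g. 3 seconds of 16 kHz audio containing two loud bursts
sig = np.zeros(48000)
sig[8000:9600] = 0.5
sig[32000:33600] = 0.5
print(detect_events(sig))   # [(8000, 9600), (32000, 33600)]
```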
The labeling queue will only appear on your Data acquisition page if you are dealing with an object detection task. The labeling queue shows a list of images that have been staged for annotation in your project.
If you are not dealing with an object detection task, you can simply disable the labeling queue by going to Dashboard > Project info > Labeling method and selecting "One label per data item" from the dropdown, as shown in the image below.
For more information about the labeling queue and how to perform data annotation using AI-assisted labeling on Edge Impulse, you can have a look at our documentation.
The CSV Wizard allows users with larger or more complex datasets to easily upload their data without having to worry about converting it to CBOR format.
To access the CSV Wizard, navigate to the Data Acquisition tab of your Edge Impulse project and click on the CSV Wizard button:
Let's take a look at some sample data from a heart rate monitor (Polar H10). We can see there is a lot of extra information we don't need:
Choose a CSV file to upload and select "Upload File". The file will be automatically analyzed and the results will be displayed in the next step. Here I have selected an export from an HR monitor. You can try it out yourself by downloading this file:
When processing your data, we will check for the following:
Does this data contain a label?
Is this data time series data?
Is this data raw sensor data or processed features?
Is this data separated by a standard delimiter?
Is this data separated by a non-standard delimiter?
If any settings need to be adjusted (for the start of your data you can select Skip first x lines or No header, and you can change the delimiter), you can do so before selecting "Looks good, next".
Here you can select the timestamp column or row, and the frequency of the timestamps. If you do not have a timestamp column, you can select No timestamp column and add a timestamp later. If you do have a timestamp column, you can select the timestamp format (e.g. full timestamp) and the frequency of the timestamps. You can also override the detected value via Override timestamp difference: for example, entering 20000 (a 20000 ms interval between samples, i.e. 20 s) gives a detected frequency of 1/20 s = 0.05 Hz.
Here you can select the label column or row. If you do not have a label column, you can select No (no worries, you can provide this when you upload data) and add a label later. If you do have a label column, you can select Yes, it's "Value". You can also select the columns that contain your values.
How long do you want your samples to be?
In this section, you can set a length limit for your samples. For example, if your CSV file contains 30 seconds of data and you set a limit of 3000 ms, the wizard will create 10 distinct data samples of 3 seconds each.
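As a back-of-the-envelope illustration of that rule (the file name and the 100 Hz sample rate below are assumptions, not Wizard code):

```python
# Illustrative only: how a 30-second CSV at an assumed 100 Hz turns into
# ten 3000 ms samples when a length limit is applied.
import csv

SAMPLE_RATE_HZ = 100          # assumed sensor frequency
LIMIT_MS = 3000               # the length limit set in the CSV Wizard

rows_per_sample = SAMPLE_RATE_HZ * LIMIT_MS // 1000   # 300 rows per sample

with open("hr_export.csv") as f:        # placeholder file name
    rows = list(csv.reader(f))[1:]      # skip the header row

chunks = [rows[i:i + rows_per_sample]
          for i in range(0, len(rows), rows_per_sample)]
print(len(chunks))   # 3000 rows (30 s) -> 10 distinct 3-second samples
```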
Congratulations! 🚀 You have successfully created a CSV transform with the CSV Wizard. You can now save this transform and use it to process your data.
Any CSV files that you upload into your project - whether it's through the uploader, the CLI, the API or through data sources - will now be processed according to the rules you set up with the CSV Wizard!
In object detection ML projects, labeling is the process of defining regions of interest in the frame.
Manually labeling images can become tedious and time-consuming, especially when dealing with huge datasets. This is why Edge Impulse Studio provides an AI-assisted labeling tool to help you in your labeling workflows.
To use the labeling queue, you will need to set your Edge Impulse project as an "object detection" project. The labeling queue will only display the images that have not been labeled.
Currently, it only supports bounding boxes (the ingestion format used to train both MobileNetV2 SSD and FOMO models).
Can't see the labeling queue?
Go to Dashboard, and under 'Project info > Labeling method' select 'Bounding boxes (object detection)'.
There are three ways to perform AI-assisted labeling in the Edge Impulse Studio:
Using YOLOv5
Using your own model
Using object tracking
Already have a labeled dataset?
If you already have a labeled dataset containing bounding boxes, you can use the uploader to import your data.
By utilizing an existing library of pre-trained object detection models from YOLOv5 (trained with the COCO dataset), common objects in your images can quickly be identified and labeled in seconds without needing to write any code!
To label your objects with YOLOv5 classification, click the Label suggestions dropdown and select “Classify using YOLOv5.” If your object is more specific than what is auto-labeled by YOLOv5, e.g. “coffee” instead of the generic “cup” class, you can modify the auto-labels to the left of your image. These modifications will automatically apply to future images in your labeling queue.
Click Save labels to move on to your next raw image, and see your fully labeled dataset ready for training in minutes!
You can also use your own trained model to predict and label your new images. From an existing (trained) Edge Impulse object detection project, upload new unlabeled images from the Data Acquisition tab.
Currently, this only works with models trained with MobileNet SSD transfer learning.
From the “Labeling queue”, click the Label suggestions dropdown and select “Classify using your own model”:
You can also upload a few samples to a new object detection project, train a model, then upload more samples to the Data Acquisition tab and use the AI-Assisted Labeling feature for the rest of your dataset. Classifying using your own trained model is especially useful for objects that are not in YOLOv5, such as industrial objects, etc.
Click Save labels to move on to your next raw image, and see your fully labeled dataset ready for training in minutes using your own pre-trained model!
If you have objects that are a similar size or common between images, you can also track your objects between frames within the Edge Impulse Labeling Queue, reducing the amount of time needed to re-label and re-draw bounding boxes over your entire dataset.
Draw your bounding boxes and label your images, then, after clicking Save labels, the objects will be tracked from frame to frame:
Now that your object detection project contains a fully labeled dataset, learn how to train and deploy your model to your edge device: check out our tutorial!
We are excited to see what you build with the AI-assisted labeling feature in Edge Impulse! Please post your project on our forum or tag us on social media, @EdgeImpulse!
You can add arbitrary metadata to data items. You can use this, for example, to track on which site data was collected, where data was imported from, or where the machine that generated the data was placed. Some key use cases for metadata are:
Prevent leaking data between your train and validation set. See the section on preventing data leakage below.
Synchronization actions with cloud data sources, for example to remove data in a project if the source data was deleted in the cloud.
Get a better understanding of real-world accuracy by seeing how well your model performs when grouped by a metadata key. E.g. whether data on site A performs better than site B.
Metadata is shown on Data acquisition when you click on a data item. From here you can add, edit and remove metadata keys.
It's pretty impractical to manually add metadata to each data item, so the easiest way is to add metadata when you upload data. You can do this either by:
Setting the x-metadata header to a JSON string when calling the ingestion service (see the sketch after this list).
Providing an info.labels file when uploading data (this works both in the CLI and in the Studio).
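A minimal sketch of the first option, using Python's requests library against the ingestion service (the API key, file name, label, and metadata values are placeholders):

```python
# Attach metadata at upload time by passing the x-metadata header
# as a JSON string to the ingestion service.
import json
import requests

API_KEY = "ei_..."   # placeholder project API key

with open("sample.wav", "rb") as f:
    res = requests.post(
        # use /api/testing/files to target the testing set instead
        "https://ingestion.edgeimpulse.com/api/training/files",
        headers={
            "x-api-key": API_KEY,
            "x-label": "running",
            "x-metadata": json.dumps({"site": "A", "machine": "pump-1"}),
        },
        files={"data": ("sample.wav", f, "audio/wav")},
    )
res.raise_for_status()
```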
When training an ML model we split your data into a train and a validation set. This is done so that during training you can evaluate whether your model works on data that it has seen before (train set) and on data that it has never seen before (validation set) - ideally your model performs similarly well on both data sets: a sign that your model will perform well in the field on completely novel data.
However, this can give a false sense of security if data that is very similar ends up in both your train and validation set ("data leakage"). For example:
You split a video into individual frames. These images don't differ much from frame to frame, and you don't want some frames in the train set and some in the validation set.
You're building a sleep staging algorithm, and look at 30 second windows. From window to window the data for one person will look similar, so you don't want one window in the train set and another in the validation set for the same person on the same night.
By default we split your training data randomly in a train and validation set (80/20 split) - which does not prevent data leakage, but if you tag your data items with metadata you can avoid this. To do so:
Tag all your data items with metadata.
Go to any ML block and under Advanced training settings set 'Split train/validation set on metadata key' to a metadata key (e.g. video_file).
Now every data item with the same metadata value for video_file will always be grouped together in either the train or the validation set; so no more data leakage (illustrated in the sketch below).
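To see why grouping matters, here is an illustrative sketch (outside the Studio) using scikit-learn's GroupShuffleSplit with video_file values as the grouping key:

```python
# Illustrative only: a grouped 80/20 split that keeps all windows from the
# same video_file together, so near-duplicate frames never straddle the split.
from sklearn.model_selection import GroupShuffleSplit

samples = list(range(10))   # indices of hypothetical data items
groups = ["vid_a", "vid_a", "vid_a", "vid_b", "vid_b",
          "vid_c", "vid_c", "vid_c", "vid_d", "vid_d"]

splitter = GroupShuffleSplit(n_splits=1, test_size=0.2, random_state=0)
train_idx, val_idx = next(splitter.split(samples, groups=groups))

# no video_file value appears on both sides of the split
print(sorted({groups[i] for i in train_idx}),
      sorted({groups[i] for i in val_idx}))
```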
You can read samples, including their metadata, via the API, and then use the API to update the metadata. For example, this is how you add a metadata field to the first data sample in your project using the Edge Impulse API:
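A minimal sketch, assuming the standard Studio REST endpoints for listing raw data and setting sample metadata (the project ID, API key, and metadata values are placeholders):

```python
# A sketch: read the first sample in the project, then attach a metadata
# field to it. Endpoint paths and payload shape are assumptions based on
# the Studio REST API; PROJECT_ID and API_KEY are placeholders.
import requests

PROJECT_ID = 1
API_KEY = "ei_..."
BASE = f"https://studio.edgeimpulse.com/v1/api/{PROJECT_ID}"
HEADERS = {"x-api-key": API_KEY}

# list raw data samples in the training category
samples = requests.get(f"{BASE}/raw-data", headers=HEADERS,
                       params={"category": "training"}).json()["samples"]
first = samples[0]

# set a metadata key on that sample
res = requests.post(f"{BASE}/raw-data/{first['id']}/metadata",
                    headers=HEADERS,
                    json={"metadata": {"site": "A"}})
res.raise_for_status()
```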