Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Object detection tasks take an image and output information about the class and number of objects, position, (and, eventually, size) in the image.
Edge Impulse provides, by default, two different model architectures to perform object detection, MobileNetV2 SSD FPN-Lite uses bounding boxes (objects location and size) and FOMO uses centroids (objects location only).
Want to compare the two models?
Detect objects using MobileNet SSD (bounding boxes) Can run on systems starting from Linux CPUs up to powerful GPUs
Object detection using FOMO (centroids) Can run on high-end MCUs, Linux CPUs, and GPUs
This page is part of Adding sight to your sensors and describes how you can use development boards with an integrated camera to import image data into Edge Impulse.
First, make sure your device is connected on the Devices page in the Edge Impulse Studio. Then, head to Data acquisition, and under 'Record new data', set a label and select 'Camera' as a sensor (most devices have multiple resolutions). This shows you a nice preview of the camera. Then click Start sampling.
A few moments later - depending on the speed of the development board and the resolution - you'll now have an image collected!
Do this until you have captured 30 images per class from a variety of angles. Also make sure to vary the things you capture for the unknown class.
This section provides detailed end-to-end tutorials to help you get started with Edge Impulse:
Detect objects using MobileNet SSD (bounding boxes)
Object detection using FOMO (centroids)
In this tutorial, you'll use machine learning to build a system that can recognize objects in your house through a camera - a task known as image classification - connected to a microcontroller. Adding sight to your embedded devices can make them see the difference between poachers and elephants, do quality control on factory lines, or let your RC cars drive themselves. In this tutorial you'll learn how to collect images for a well-balanced dataset, how to apply transfer learning to train a neural network, and deploy the system to an embedded device.
At the end of this tutorial, you'll have a firm understanding of how to classify images using Edge Impulse.
There is also a video version of this tutorial:
You can view the finished project, including all data, signal processing and machine learning blocks here: Tutorial: adding sight to your sensors.
For this tutorial, you'll need a supported device.
If you don't have any of these devices, you can also upload an existing dataset through the Uploader. After this tutorial you can then deploy your trained machine learning model as a C++ library and run it on your device.
In this tutorial we'll build a model that can distinguish between two objects in your house - we've used a plant and a lamp, but feel free to pick two other objects. To make your machine learning model see it's important that you capture a lot of example images of these objects. When training the model these example images are used to let the model distinguish between them. Because there are (hopefully) a lot more objects in your house than just lamps or plants, you also need to capture images that are neither a lamp or a plant to make the model work well.
Capture the following amount of data - make sure you capture a wide variety of angles and zoom levels:
50 images of a lamp.
50 images of a plant.
50 images of neither a plant nor a lamp - make sure to capture a wide variation of random objects in the same room as your lamp or plant.
You can collect data from the following devices:
Collecting image data from the Studio - for all other officially supported boards with camera sensors.
Or you can capture your images using another camera, and then upload them by going to Data acquisition and clicking the 'Upload' icon.
Afterwards you should have a well-balanced dataset listed under Data acquisition in your Edge Impulse project. You can switch between your training and testing data with the two buttons above the 'Data collected' widget.
With the training set in place you can design an impulse. An impulse takes the raw data, adjusts the image size, uses a preprocessing block to manipulate the image, and then uses a learning block to classify new data. Preprocessing blocks always return the same values for the same input (e.g. convert a color image into a grayscale one), while learning blocks learn from past experiences.
For this tutorial we'll use the 'Images' preprocessing block. This block takes in the color image, optionally makes the image grayscale, and then turns the data into a features array. If you want to do more interesting preprocessing steps - like finding faces in a photo before feeding the image into the network -, see the Building custom processing blocks tutorial. Then we'll use a 'Transfer Learning' learning block, which takes all the images in and learns to distinguish between the three ('plant', 'lamp', 'unknown') classes.
In the studio go to Create impulse, set the image width and image height to 96
, and add the 'Images' and 'Transfer Learning (Images)' blocks. Then click Save impulse.
To configure your processing block, click Images in the menu on the left. This will show you the raw data on top of the screen (you can select other files via the drop down menu), and the results of the processing step on the right. You can use the options to switch between 'RGB' and 'Grayscale' mode, but for now leave the color depth on 'RGB' and click Save parameters.
This will send you to the 'Feature generation' screen. In here you'll:
Resize all the data.
Apply the processing block on all this data.
Create a 3D visualization of your complete dataset.
Click Generate features to start the process.
Afterwards the 'Feature explorer' will load. This is a plot of all the data in your dataset. Because images have a lot of dimensions (here: 96x96x3=27,648 features) we run a process called 'dimensionality reduction' on the dataset before visualizing this. Here the 27,648 features are compressed down to just 3, and then clustered based on similarity. Even though we have little data you can already see some clusters forming (lamp images are all on the right), and can click on the dots to see which image belongs to which dot.
With all data processed it's time to start training a neural network. Neural networks are a set of algorithms, modeled loosely after the human brain, that are designed to recognize patterns. The network that we're training here will take the image data as an input, and try to map this to one of the three classes.
It's very hard to build a good working computer vision model from scratch, as you need a wide variety of input data to make the model generalize well, and training such models can take days on a GPU. To make this easier and faster we are using transfer learning. This lets you piggyback on a well-trained model, only retraining the upper layers of a neural network, leading to much more reliable models that train in a fraction of the time and work with substantially smaller datasets.
To configure the transfer learning model, click Transfer learning in the menu on the left. Here you can select the base model (the one selected by default will work, but you can change this based on your size requirements), optionally enable data augmentation (images are randomly manipulated to make the model perform better in the real world), and the rate at which the network learns.
Set:
Number of training cycles to 20
.
Learning rate to 0.0005
.
Data augmentation: enabled.
Minimum confidence rating: 0.7.
Important: If you're using a development board with less memory, like the Arduino Nano 33 BLE Sense click Choose a different model and select MobileNetV1 96x96 0.25. This is a smaller transfer learning model.
And click Start training. After the model is done you'll see accuracy numbers, a confusion matrix and some predicted on-device performance on the bottom. You have now trained your model!
With the model trained let's try it out on some test data. When collecting the data we split the data up between a training and a testing dataset. The model was trained only on the training data, and thus we can use the data in the testing dataset to validate how well the model will work in the real world. This will help us ensure the model has not learned to overfit the training data, which is a common occurrence.
To validate your model, go to Model testing, select the checkbox next to 'Sample name' and click Classify selected. Here we hit 89% accuracy, which is great for a model with so little data.
To see a classification in detail, click the three dots next to an item, and select Show classification. This brings you to the Live classification screen with much more details on the file (if you collected data with your mobile phone you can also capture new testing data directly from here). This screen can help you determine why items were misclassified.
With the impulse designed, trained and verified you can deploy this model back to your device. This makes the model run without an internet connection, minimizes latency, and runs with minimum power consumption. Edge Impulse can package up the complete impulse - including the preprocessing steps, neural network weights, and classification code - in a single C++ library that you can include in your embedded software.
To run your impulse on either the OpenMV camera or your phone, follow these steps:
OpenMV Cam H7 Plus: Running your impulse on your OpenMV camera
Mobile phone: just click Switch to classification mode at the bottom of your phone screen.
For other boards: click on Deployment in the menu. Then under 'Build firmware' select your development board, and click Build. This will export the impulse, and build a binary that will run on your development board in a single step. After building is completed you'll get prompted to download a binary. Save this on your computer.
When you click the Build button, you'll see a pop-up with text and video instructions on how to deploy the binary to your particular device. Follow these instructions. Once you are done, we are ready to test your impulse out.
We can connect to the board's newly flashed firmware over serial. Open a terminal and run:
To also see a preview of the camera, run:
To run continuous (without a pause every 2 seconds), but without the preview, run:
Congratulations! You've added sight to your sensors. Now that you've trained your model you can integrate your impulse in the firmware of your own embedded device, see Running your impulse locally. There are examples for Mbed OS, Arduino, STM32CubeIDE, and any other target that supports a C++ compiler. Note that the model we trained in this tutorial is relatively big, but you can choose a smaller transfer learning model.
Or if you're interested in more, see our tutorials on Continuous motion recognition or Recognize sounds from audio. If you have a great idea for a different project, that's fine too. Edge Impulse lets you capture data from any sensor, build custom processing blocks to extract features, and you have full flexibility in your Machine Learning pipeline with the learning blocks.
We can't wait to see what you'll build! 🚀
In this tutorial, you'll use machine learning to build a system that can recognize when a particular sound is happening—a task known as audio classification. The system you create will be able to recognize the sound of water running from a faucet, even in the presence of other background noise.
You'll learn how to collect audio data from microphones, use signal processing to extract the most important information, and train a deep neural network that can tell you whether the sound of running water can be heard in a given clip of audio. Finally, you'll deploy the system to an embedded device and evaluate how well it works.
At the end of this tutorial, you'll have a firm understanding of how to classify audio using Edge Impulse.
There is also a video version of this tutorial:
You can view the finished project, including all data, signal processing and machine learning blocks here: Tutorial: recognize sounds from audio.
Detecting human speech?
Do you want a device that listens to your voice? We have a specific tutorial for that! See Responding to your voice.
For this tutorial, you'll need a supported device.
If you don't see your supported development board listed here, be sure to check the Hardware specific tutorials page for the appropriate tutorial.
If your device is connected under Devices in the studio you can proceed:
Device compatibility
Edge Impulse can ingest data from any device - including embedded devices that you already have in production. See the documentation for the Ingestion service for more information.
To build this project, you'll need to collect some audio data that will be used to train the machine learning model. Since the goal is to detect the sound of a running faucet, you'll need to collect some examples of that. You'll also need some examples of typical background noise that doesn't contain the sound of a faucet, so the model can learn to discriminate between the two. These two types of examples represent the two classes we'll be training our model to detect: background noise, or running faucet.
You can use your device to collect some data. In the studio, go to the Data acquisition tab. This is the place where all your raw data is stored, and - if your device is connected to the remote management API - where you can start sampling new data.
Let's start by recording an example of background noise that doesn't contain the sound of a running faucet. Under Record new data, select your device, set the label to noise
, the sample length to 1000
, and the sensor to Built-in microphone
. This indicates that you want to record 1 second of audio, and label the recorded data as noise
. You can later edit these labels if needed.
After you click Start sampling, the device will capture a second of audio and transmit it to Edge Impulse. The LED will light while recording is in progress, then light again during transmission.
When the data has been uploaded, you will see a new line appear under 'Collected data'. You will also see the waveform of the audio in the 'RAW DATA' box. You can use the controls underneath to listen to the audio that was captured.
Since you now know how to capture audio with Edge Impulse, it's time to start building a dataset. For a simple audio classification model like this one, we should aim to capture around 10 minutes of data. We have two classes, and it's ideal if our data is balanced equally between each of them. This means we should aim to capture the following data:
5 minutes of background noise, with the label "noise"
5 minutes of running faucet noise, with the label "faucet"
Real world data
In the real world, there are usually additional sounds present alongside the sounds we care about. For example, a running faucet is often accompanied by the sound of dishes being washed, teeth being brushed, or a conversation in the kitchen. Background noise might also include the sounds of television, kids playing, or cars driving past outside.
It's important that your training data contains these types of real world sounds. If your model is not exposed to them during training, it will not learn to take them into account, and it will not perform well during real-world usage.
For this tutorial, you should try to capture the following:
Background noise
2 minutes of background noise without much additional activity
1 minute of background noise with a TV or music playing
1 minute of background noise featuring occasional talking or conversation
1 minutes of background noise with the sounds of housework
Running faucet noise
1 minute of a faucet running
1 minute of a different faucet running
1 minute of a faucet running with a TV or music playing
1 minute of a faucet running with occasional talking or conversation
1 minute of a faucet running with the sounds of housework
It's okay if you can't get all of these, as long as you still obtain 5 minutes of data for each class. However, your model will perform better in the real world if it was trained on a representative dataset.
Dataset diversity
There's no guarantee your model will perform well in the presence of sounds that were not included in its training set, so it's important to make your dataset as diverse and representative of real-world conditions as possible.
Data capture and transmission
The amount of audio that can be captured in one go varies depending on a device's memory. The ST B-L475E-IOT01A developer board has enough memory to capture 60 seconds of audio at a time, and the Arduino Nano 33 BLE Sense has enough memory for 16 seconds. To capture 60 seconds of audio, set the sample length to 60000
. Because the board transmits data quite slowly, it will take around 7 minutes before a 60 second sample appears in Edge Impulse.
Once you've captured around 10 minutes of data, it's time to start designing an Impulse.
Prebuilt dataset
Alternatively, you can load an example test set that has about ten minutes of data in these classes (but how much fun is that?). See the Running faucet dataset for more information.
With the training set in place you can design an impulse. An impulse takes the raw data, slices it up in smaller windows, uses signal processing blocks to extract features, and then uses a learning block to classify new data. Signal processing blocks always return the same values for the same input and are used to make raw data easier to process, while learning blocks learn from past experiences.
For this tutorial we'll use the "MFE" signal processing block. MFE stands for Mel Frequency Energy. This sounds scary, but it's basically just a way of turning raw audio—which contains a large amount of redundant information—into simplified form.
Spectrogram block
Edge Impulse supports three different blocks for audio classification: MFCC, MFE and spectrogram blocks. If your accuracy is not great using the MFE block you can switch to the spectrogram block, which is not tuned to frequencies for the human ear.
We'll then pass this simplified audio data into a Neural Network block, which will learn to distinguish between the two classes of audio (faucet and noise).
In the studio, go to the Create impulse tab. You'll see a Raw data block, like this one.
As mentioned above, Edge Impulse slices up the raw samples into windows that are fed into the machine learning model during training. The Window size field controls how long, in milliseconds, each window of data should be. A one second audio sample will be enough to determine whether a faucet is running or not, so you should make sure Window size is set to 1000 ms. You can either drag the slider or type a new value directly.
Each raw sample is sliced into multiple windows, and the Window increase field controls the offset of each subsequent window from the first. For example, a Window increase value of 1000 ms would result in each window starting 1 second after the start of the previous one.
By setting a Window increase that is smaller than the Window size, we can create windows that overlap. This is actually a great idea. Although they may contain similar data, each overlapping window is still a unique example of audio that represents the sample's label. By using overlapping windows, we can make the most of our training data. For example, with a Window size of 1000 ms and a Window increase of 100 ms, we can extract 10 unique windows from only 2 seconds of data.
Make sure the Window increase field is set to 300 ms. The Raw data block should match the screenshot above.
Next, click Add a processing block and choose the 'MFE' block. Once you're done with that, click Add a learning block and select 'Classification (Keras)'. Finally, click Save impulse. Your impulse should now look like this:
Now that we've assembled the building blocks of our Impulse, we can configure each individual part. Click on the MFE tab in the left hand navigation menu. You'll see a page that looks like this:
This page allows you to configure the MFE block, and lets you preview how the data will be transformed. The right of the page shows a visualization of the MFE's output for a piece of audio, which is known as a spectrogram.
The MFE block transforms a window of audio into a table of data where each row represents a range of frequencies and each column represents a span of time. The value contained within each cell reflects the amplitude of its associated range of frequencies during that span of time. The spectrogram shows each cell as a colored block, the intensity which varies depends on the amplitude.
The patterns visible in a spectrogram contain information about what type of sound it represents. For example, the spectrogram in this image shows a pattern typical of background noise:
You can tell that it is slightly different from the following spectrogram, which shows a pattern typical of a running faucet:
These differences are not necessarily easy for a person to describe, but fortunately they are enough for a neural network to learn to identify.
It's interesting to explore your data and look at the types of spectrograms it results in. You can use the dropdown box near the top right of the page to choose between different audio samples to visualize, and drag the white window on the audio waveform to select different windows of data:
There are a lot of different ways to configure the MFCC block, as shown in the Parameters box:
Handily, Edge Impulse provides sensible defaults that will work well for many use cases, so we can leave these values unchanged. You can play around with the noise floor to quickly see the effect it has on the spectrogram.
The spectrograms generated by the MFE block will be passed into a neural network architecture that is particularly good at learning to recognize patterns in this type of tabular data. Before training our neural network, we'll need to generate MFE blocks for all of our windows of audio. To do this, click the Generate features button at the top of the page, then click the green Generate features button. If you have a full 10 minutes of data, the process will take a while to complete:
Once this process is complete the feature explorer shows a visualization of your dataset. Here dimensionality reduction is used to map your features onto a 3D space, and you can use the feature explorer to see if the different classes separate well, or find mislabeled data (if it shows in a different cluster). You can find more information in visualizing complex datasets.
Next, we'll configure the neural network and begin training.
With all data processed it's time to start training a neural network. Neural networks are algorithms, modeled loosely after the human brain, that can learn to recognize patterns that appear in their training data. The network that we're training here will take the MFE as an input, and try to map this to one of two classes—noise, or faucet.
Click on NN Classifier in the left hand menu. You'll see the following page:
A neural network is composed of layers of virtual "neurons", which you can see represented on the left hand side of the NN Classifier page. An input—in our case, an MFE spectrogram—is fed into the first layer of neurons, which filters and transforms it based on each neuron's unique internal state. The first layer's output is then fed into the second layer, and so on, gradually transforming the original input into something radically different. In this case, the spectrogram input is transformed over four intermediate layers into just two numbers: the probability that the input represents noise, and the probability that the input represents a running faucet.
During training, the internal state of the neurons is gradually tweaked and refined so that the network transforms its input in just the right ways to produce the correct output. This is done by feeding in a sample of training data, checking how far the network's output is from the correct answer, and adjusting the neurons' internal state to make it more likely that a correct answer is produced next time. When done thousands of times, this results in a trained network.
A particular arrangement of layers is referred to as an architecture, and different architectures are useful for different tasks. The default neural network architecture provided by Edge Impulse will work well for our current project, but you can also define your own architectures. You can even import custom neural network code from tools used by data scientists, such as TensorFlow and Keras.
The default settings should work, and to begin training, click Start training. You'll see a lot of text flying past in the Training output panel, which you can ignore for now. Training will take a few minutes. When it's complete, you'll see the Model panel appear at the right side of the page:
Congratulations, you've trained a neural network with Edge Impulse! But what do all these numbers mean?
At the start of training, 20% of the training data is set aside for validation. This means that instead of being used to train the model, it is used to evaluate how the model is performing. The Last training performance panel displays the results of this validation, providing some vital information about your model and how well it is working. Bear in mind that your exact numbers may differ from the ones in this tutorial.
On the left hand side of the panel, Accuracy refers to the percentage of windows of audio that were correctly classified. The higher number the better, although an accuracy approaching 100% is unlikely, and is often a sign that your model has overfit the training data. You will find out whether this is true in the next stage, during model testing. For many applications, an accuracy above 80% can be considered very good.
The Confusion matrix is a table showing the balance of correctly versus incorrectly classified windows. To understand it, compare the values in each row. For example, in the above screenshot, all of the faucet audio windows were classified as faucet, but a few noise windows were misclassified. This appears to be a great result though.
The On-device performance region shows statistics about how the model is likely to run on-device. Inferencing time is an estimate of how long the model will take to analyze one second of data on a typical microcontroller (here: an Arm Cortex-M4F running at 80MHz). Peak memory usage gives an idea of how much RAM will be required to run the model on-device.
The performance numbers in the previous step show that our model is working well on its training data, but it's extremely important that we test the model on new, unseen data before deploying it in the real world. This will help us ensure the model has not learned to overfit the training data, which is a common occurrence.
Edge Impulse provides some helpful tools for testing our model, including a way to capture live data from your device and immediately attempt to classify it. To try it out, click on Live classification in the left hand menu. Your device should show up in the 'Classify new data' panel. Capture 5 seconds of background noise by clicking Start sampling:
The sample will be captured, uploaded, and classified. Once this has happened, you'll see a breakdown of the results:
Once the sample is uploaded, it is split into windows–in this case, a total of 41. These windows are then classified. As you can see, our model classified all 41 windows of the captured audio as noise. This is a great result! Our model has correctly identified that the audio was background noise, even though this is new data that was not part of its training set.
Of course, it's possible some of the windows may be classified incorrectly. Since our model was 99% accurate based on its validation data, you can expect that at least 1% of windows will be classified wrongly—and likely much more than this, since our validation data doesn't represent every possible type of background or faucet noise. If your model didn't perform perfectly, don't worry. We'll get to troubleshooting later.
Misclassifications and uncertain results
It's inevitable that even a well-trained machine learning model will sometimes misclassify its inputs. When you integrate a model into your application, you should take into account that it will not always give you the correct answer.
For example, if you are classifying audio, you might want to classify several windows of data and average the results. This will give you better overall accuracy than assuming that every individual result is correct.
Using the Live classification tab, you can easily try out your model and get an idea of how it performs. But to be really sure that it is working well, we need to do some more rigorous testing. That's where the Model testing tab comes in. If you open it up, you'll see the sample we just captured listed in the Test data panel:
In addition to its training data, every Edge Impulse project also has a test dataset. Samples captured in Live classification are automatically saved to the test dataset, and the Model testing tab lists all of the test data.
To use the sample we've just captured for testing, we should correctly set its expected outcome. Click the ⋮
icon and select Edit expected outcome, then enter noise
. Now, select the sample using the checkbox to the left of the table and click Classify selected:
You'll see that the model's accuracy has been rated based on the test data. Right now, this doesn't give us much more information that just classifying the same sample in the Live classification tab. But if you build up a big, comprehensive set of test samples, you can use the Model testing tab to measure how your model is performing on real data.
Ideally, you'll want to collect a test set that contains a minimum of 25% the amount of data of your training set. So, if you've collected 10 minutes of training data, you should collect at least 2.5 minutes of test data. You should make sure this test data represents a wide range of possible conditions, so that it evaluates how the model performs with many different types of inputs. For example, collecting test audio for several different faucets is a good idea.
You can use the Data acquisition tab to manage your test data. Open the tab, and then click Test data at the top. Then, use the Record new data panel to capture a few minutes of test data, including audio for both background noise and faucet. Make sure the samples are labelled correctly. Once you're done, head back to the Model testing tab, select all the samples, and click Classify selected:
The screenshot shows classification results from a large number of test samples (there are more on the page than would fit in the screenshot). The panel shows that our model is performing at 85% accuracy, which is 5% less than how it performed on validation data. It's normal for a model to perform less well on entirely fresh data, so this is a successful result. Our model is working well!
For each test sample, the panel shows a breakdown of its individual performance. For example, one of the samples was classified with only 62% accuracy. Samples that contain a lot of misclassifications are valuable, since they have examples of types of audio that our model does not currently fit. It's often worth adding these to your training data, which you can do by clicking the ⋮
icon and selecting Move to training set. If you do this, you should add some new test data to make up for the loss!
Testing your model helps confirm that it works in real life, and it's something you should do after every change. However, if you often make tweaks to your model to try to improve its performance on the test dataset, your model may gradually start to overfit to the test dataset, and it will lose its value as a metric. To avoid this, continually add fresh data to your test dataset.
Data hygiene
It's extremely important that data is never duplicated between your training and test datasets. Your model will naturally perform well on the data that it was trained on, so if there are duplicate samples then your test results will indicate better performance than your model will achieve in the real world.
If the network performed great, fantastic! But what if it performed poorly? There could be a variety of reasons, but the most common ones are:
The data does not look like other data the network has seen before. This is common when someone uses the device in a way that you didn't add to the test set. You can add the current file to the test set by adding the correct label in the 'Expected outcome' field, clicking ⋮
, then selecting Move to training set.
The model has not been trained enough. Increase number of epochs to 200
and see if performance increases (the classified file is stored, and you can load it through 'Classify existing validation sample').
The model is overfitting and thus performs poorly on new data. Try reducing the number of epochs, reducing the learning rate, or adding more data.
The neural network architecture is not a great fit for your data. Play with the number of layers and neurons and see if performance improves.
As you see, there is still a lot of trial and error when building neural networks. Edge Impulse is continually adding features that will make it easier to train an effective model.
With the impulse designed, trained and verified you can deploy this model back to your device. This makes the model run without an internet connection, minimizes latency, and runs with minimum power consumption. Edge Impulse can package up the complete impulse - including the MFE algorithm, neural network weights, and classification code - in a single C++ library that you can include in your embedded software.
Mobile phone
Your mobile phone can build and download the compiled impulse directly from the mobile client. See 'Deploying back to device' on the Using your mobile phone page.
To export your model, click on Deployment in the menu. Then under 'Build firmware' select your development board, and click Build. This will export the impulse, and build a binary that will run on your development board in a single step. After building is completed you'll get prompted to download a binary. Save this on your computer.
When you click the Build button, you'll see a pop-up with text and video instructions on how to deploy the binary to your particular device. Follow these instructions. Once you are done, we are ready to test your impulse out.
We can connect to the board's newly flashed firmware over serial. Open a terminal and run:
Serial daemon
If the device is not connected over WiFi, but instead connected via the Edge Impulse serial daemon, you'll need stop the daemon. Only one application can connect to the development board at a time.
This will capture audio from the microphone, run the MFE code, and then classify the spectrogram:
Great work! You've captured data, trained a model, and deployed it to an embedded device. It's time to celebrate—by pouring yourself a nice glass of water, and checking whether the sound is correctly classified by you model.
Congratulations! you've used Edge Impulse to train a neural network model capable of recognizing a particular sound. There are endless applications for this type of model, from monitoring industrial machinery to recognizing voice commands. Now that you've trained your model you can integrate your impulse in the firmware of your own embedded device, see Running your impulse locally. There are examples for Mbed OS, Arduino, STM32CubeIDE, and any other target that supports a C++ compiler.
Or if you're interested in more, see our tutorials on Continuous motion recognition or Adding sight to your sensors. If you have a great idea for a different project, that's fine too. Edge Impulse lets you capture data from any sensor, build custom processing blocks to extract features, and you have full flexibility in your Machine Learning pipeline with the learning blocks.
We can't wait to see what you'll build! 🚀
In this tutorial, you'll use machine learning to build a system that can recognize audible events, particularly your voice through audio classification. The system you create will work similarly to "Hey Siri" or "OK, Google" and is able to recognize keywords or other audible events, even in the presence of other background noise or background chatter.
You'll learn how to collect audio data from microphones, use signal processing to extract the most important information, and train a deep neural network that can tell you whether your keyword was heard in a given clip of audio. Finally, you'll deploy the system to an embedded device and evaluate how well it works.
At the end of this tutorial, you'll have a firm understanding of how to classify audio using Edge Impulse.
There is also a video version of this tutorial:
You can view the finished project, including all data, signal processing and machine learning blocks here: Tutorial: responding to your voice.
Detect non-voice audio?
We have a tutorial for that too! See Recognize sounds from audio.
For this tutorial, you'll need a supported device.
If your device is connected under Devices in the studio you can proceed:
Device compatibility
Edge Impulse can ingest data from any device - including embedded devices that you already have in production. See the documentation for the Ingestion service for more information.
In this tutorial we want to build a system that recognizes keywords, so your first job is to think of a great one. It can be your name, an action, or even a growl - it's your party. Do keep in mind that some keywords are harder to distinguish from others, and especially keywords with only one syllable (like 'One') might lead to false-positives (e.g. when you say 'Gone'). This is the reason that Apple, Google and Amazon all use at least three-syllable keywords ('Hey Siri', 'OK, Google', 'Alexa'). A good one would be "Hello world".
To collect your first data, go to Data acquisition, set your keyword as the label, set your sample length to 10s., your sensor to 'microphone' and your frequency to 16KHz. Then click Start sampling and start saying your keyword over and over again (with some pause in between).
Note: Data collection from a development board might be slow, you can use your Mobile phone as a sensor to make this much faster.
Afterwards you have a file like this, clearly showing your keywords, separated by some noise.
This data is not suitable for Machine Learning yet though. You will need to cut out the parts where you say your keyword. This is important because you only want the actual keyword to be labeled as such, and not accidentally label noise, or incomplete sentences (e.g. only "Hello"). Fortunately the Edge Impulse Studio can do this for you. Click ⋮
next to your sample, and select Split sample.
If you have a short keyword, enable Shift samples to randomly shift the sample around in the window, and then click Split. You now have individual 1s. long samples in your dataset. Perfect!
Now that you know how to collect data we can consider other data we need to collect. In addition to your keyword we'll also need audio that is not your keyword. Like background noise, the TV playing ('noise' class), and humans saying other words ('unknown' class). This is required because a machine learning model has no idea about right and wrong (unless those are your keywords), but only learns from the data you feed into it. The more varied your data is, the better your model will work.
For each of these three classes ('your keyword', 'noise', and 'unknown') you want to capture an even amount of data (balanced datasets work better) - and for a decent keyword spotting model you'll want at least 10 minutes in each class (but, the more the better).
Thus, collect 10 minutes of samples for your keyword - do this in the same manner as above. The fastest way is probably through your mobile phone, collecting 1 minute clips, then automatically splitting this data. Make sure to capture wide variations of the keyword: leverage your family and your colleagues to help you collect the data, make sure you cover high and low pitches, and slow and fast speakers.
For the noise and unknown datasets you can either collect this yourself, or make your life a bit easier by using dataset of both 'noise' (all kinds of background noise) and 'unknown' (random words) data that we built for you here: Pre-built datasets > Keyword spotting.
To import this data, go to Data acquisition, click the Upload icon, and select a number of 'noise' or 'unknown' samples (there's 25 minutes of each class, but you can select less files if you want), and clicking Begin upload. The data is automatically labeled and added to your project.
If you've collected all your training data through the 'Record new data' widget you'll have all your keywords in the 'Training' dataset. This is not great, because you want to keep 20% of your data separate to validate the machine learning model. To mitigate this you can go to Dashboard and select Perform train/test split. This will automatically split your data between a training class (80%) and a testing class (20%). Afterwards you should see something like this:
With the data set in place you can design an impulse. An impulse takes the raw data, slices it up in smaller windows, uses signal processing blocks to extract features, and then uses a learning block to classify new data. Signal processing blocks always return the same values for the same input and are used to make raw data easier to process, while learning blocks learn from past experiences.
For this tutorial we'll use the "MFCC" signal processing block. MFCC stands for Mel Frequency Cepstral Coefficients. This sounds scary, but it's basically just a way of turning raw audio—which contains a large amount of redundant information—into simplified form. Edge Impulse has many other processing blocks for audio, including "MFE" and the "Spectrogram" blocks for non-voice audio, but the "MFCC" block is great for dealing with human speech.
We'll then pass this simplified audio data into a Neural Network block, which will learn to distinguish between the three classes of audio.
In the Studio, go to the Create impulse tab, add a Time series data, an Audio (MFCC) and a Classification (Keras) block. Leave the window size to 1 second (as that's the length of our audio samples in the dataset) and click Save Impulse.
Now that we've assembled the building blocks of our Impulse, we can configure each individual part. Click on the MFCC tab in the left hand navigation menu. You'll see a page that looks like this:
This page allows you to configure the MFCC block, and lets you preview how the data will be transformed. The right of the page shows a visualization of the MFCC's output for a piece of audio, which is known as a spectrogram. An MFCC spectrogram is a specially tuned spectrogram which highlights frequencies which are common in human speech (Edge Impulse also has normal spectrograms if that's more your thing).
In the spectrogram the vertical axis represents the frequencies (the number of frequency bands is controlled by 'Number of coefficients' parameter, try it out!), and the horizontal axis represents time (controlled by 'frame stride' and 'frame length'). The patterns visible in a spectrogram contain information about what type of sound it represents. For example, the spectrogram in this image shows "Hello world":
And the spectrogram in this image shows "On":
These differences are not necessarily easy for a person to describe, but fortunately they are enough for a neural network to learn to identify.
It's interesting to explore your data and look at the types of spectrograms it results in. You can use the dropdown box near the top right of the page to choose between different audio samples to visualize, or play with the parameters to see how the spectrogram changes.
In addition, you can see the performance of the MFCC block on your microcontroller below the spectrogram. This is the complete time that it takes on a low-power microcontroller (Cortex-M4F @ 80MHz) to analyze 1 second of data.
You might think based on this number that we can only classify 2 or 3 windows per second, but we continuously build up the spectrogram (as it has a time component), which takes less time, and we can thus continuously listen for events 5-6x a second, even on an 40MHz processor. This is already implemented on all fully supported development boards, and easy to implement on your own device.
The spectrograms generated by the MFCC block will be passed into a neural network architecture that is particularly good at learning to recognize patterns in this type of tabular data. Before training our neural network, we'll need to generate MFCC blocks for all of our windows of audio. To do this, click the Generate features button at the top of the page, then click the green Generate features button. This will take a minute or so to complete.
Afterwards you're presented with one of the most useful features in Edge Impulse: the feature explorer. This is a 3D representation showing your complete dataset, with each data-item color-coded to its respective label. You can zoom in to every item, find anomalies (an item that's in a wrong cluster), and click on items to listen to the sample. This is a great way to check whether your dataset contains wrong items, and to validate whether your dataset is suitable for ML (it should separate nicely).
With all data processed it's time to start training a neural network. Neural networks are algorithms, modeled loosely after the human brain, that can learn to recognize patterns that appear in their training data. The network that we're training here will take the MFCC as an input, and try to map this to one of three classes—your keyword, noise or unknown.
Click on NN Classifier in the left hand menu. You'll see the following page:
A neural network is composed of layers of virtual "neurons", which you can see represented on the left hand side of the NN Classifier page. An input—in our case, an MFCC spectrogram—is fed into the first layer of neurons, which filters and transforms it based on each neuron's unique internal state. The first layer's output is then fed into the second layer, and so on, gradually transforming the original input into something radically different. In this case, the spectrogram input is transformed over four intermediate layers into just two numbers: the probability that the input represents your keyword, and the probability that the input represents 'noise' or 'unknown'.
During training, the internal state of the neurons is gradually tweaked and refined so that the network transforms its input in just the right ways to produce the correct output. This is done by feeding in a sample of training data, checking how far the network's output is from the correct answer, and adjusting the neurons' internal state to make it more likely that a correct answer is produced next time. When done thousands of times, this results in a trained network.
A particular arrangement of layers is referred to as an architecture, and different architectures are useful for different tasks. The default neural network architecture provided by Edge Impulse will work well for our current project, but you can also define your own architectures. You can even import custom neural network code from tools used by data scientists, such as TensorFlow and Keras (click the three dots at the top of the page).
Before you begin training, you should change some values in the configuration. Change the Minimum confidence rating to 0.6. This means that when the neural network makes a prediction (for example, that there is 0.8 probability that some audio contains "hello world") Edge Impulse will disregard it unless it is above the threshold of 0.6.
Next, enable 'Data augmentation'. When enabled your data is randomly mutated during training. For example, by adding noise, masking time or frequency bands, or warping your time axis. This is a very quick way to make your dataset work better in real life (with unpredictable sounds coming in), and prevents your neural network from overfitting (as the data samples are changed every training cycle).
With everything in place, click Start training. You'll see a lot of text flying past in the Training output panel, which you can ignore for now. Training will take a few minutes. When it's complete, you'll see the Last training performance panel appear at the bottom of the page:
Congratulations, you've trained a neural network with Edge Impulse! But what do all these numbers mean?
At the start of training, 20% of the training data is set aside for validation. This means that instead of being used to train the model, it is used to evaluate how the model is performing. The Last training performance panel displays the results of this validation, providing some vital information about your model and how well it is working. Bear in mind that your exact numbers may differ from the ones in this tutorial.
On the left hand side of the panel, Accuracy refers to the percentage of windows of audio that were correctly classified. The higher number the better, although an accuracy approaching 100% is unlikely, and is often a sign that your model has overfit the training data. You will find out whether this is true in the next stage, during model testing. For many applications, an accuracy above 85% can be considered very good.
The Confusion matrix is a table showing the balance of correctly versus incorrectly classified windows. To understand it, compare the values in each row. For example, in the above screenshot, 96 of the helloworld audio windows were classified as helloworld, while 10 of them were incorrectly classified as unknown or noise. This appears to be a great result.
The On-device performance region shows statistics about how the model is likely to run on-device. Inferencing time is an estimate of how long the model will take to analyze one second of data on a typical microcontroller (an Arm Cortex-M4F running at 80MHz). Peak RAM usage gives an idea of how much RAM will be required to run the model on-device.
The performance numbers in the previous step show that our model is working well on its training data, but it's extremely important that we test the model on new, unseen data before deploying it in the real world. This will help us ensure the model has not learned to overfit the training data, which is a common occurrence.
Fortunately we've put aside 20% of our data already in the 'Test set' (see Data acquisition). This is data that the model has never seen before, and we can use this to validate whether our model actually works on unseen data. To run your model against the test set, head to Model testing, select all items and click Classify selected.
To drill down into a misclassified sample, click the three dots (⋮
) next to a sample and select Show classification. You're then transported to the classification view, which lets you inspect the sample, and compare the sample to your training data. This way you can inspect whether this was actually a classification failure, or whether your data was incorrectly labeled. From here you can either update the label (when the label was wrong), or move the item to the training set to refine your model.
Misclassifications and uncertain results
It's inevitable that even a well-trained machine learning model will sometimes misclassify its inputs. When you integrate a model into your application, you should take into account that it will not always give you the correct answer.
For example, if you are classifying audio, you might want to classify several windows of data and average the results. This will give you better overall accuracy than assuming that every individual result is correct.
With the impulse designed, trained and verified you can deploy this model back to your device. This makes the model run without an internet connection, minimizes latency, and runs with minimum power consumption. Edge Impulse can package up the complete impulse - including the MFCC algorithm, neural network weights, and classification code - in a single C++ library that you can include in your embedded software.
Mobile phone
Your mobile phone can build and download the compiled impulse directly from the mobile client. See 'Deploying back to device' on the Using your mobile phone page.
To export your model, click on Deployment in the menu. Then under 'Build firmware' select your development board, and click Build. This will export the impulse, and build a binary that will run on your development board in a single step. After building is completed you'll get prompted to download a binary. Save this on your computer.
When you click the Build button, you'll see a pop-up with text and video instructions on how to deploy the binary to your particular device. Follow these instructions. Once you are done, we are ready to test your impulse out.
We can connect to the board's newly flashed firmware over serial. Open a terminal and run:
Serial daemon
If the device is not connected over WiFi, but instead connected via the Edge Impulse serial daemon, you'll need stop the daemon. Only one application can connect to the development board at a time.
This will capture audio from the microphone, run the MFCC code, and then classify the spectrogram:
Great work! You've captured data, trained a model, and deployed it to an embedded device. You can now control LEDs, activate actuators, or send a message to the cloud whenever you say a keyword!
Is your model working properly in the Studio, but does not recognize your keyword when running in continuous mode on your device? Then this is probably due to dataset imbalance (a lot more unknown / noise data compared to your keyword) in combination with our moving average code to reduce false positives.
When running in continuous mode we run a moving average over the predictions to prevent false positives. E.g. if we do 3 classifications per second you’ll see your keyword potentially classified three times (once at the start of the audio file, once in the middle, once at the end). However, if your dataset is unbalanced (there’s a lot more noise / unknown than in your dataset) the ML model typically manages to only find your keyword in the 'center' window, and thus we filter it out as a false positive.
You can fix this by either:
Adding more data :-)
Or, by disabling the moving average filter by going into ei_run_classifier.h (in the edge-impulse-sdk directory) and removing:
Note that this might increase the number of false positives the model detects.
Congratulations! you've used Edge Impulse to train a neural network model capable of recognizing audible events. There are endless applications for this type of model, from monitoring industrial machinery to recognizing voice commands. Now that you've trained your model you can integrate your impulse in the firmware of your own embedded device, see Running your impulse locally. There are examples for Mbed OS, Arduino, STM32CubeIDE, Zephyr, and any other target that supports a C++ compiler.
Or if you're interested in more, see our tutorials on Continuous motion recognition or Adding sight to your sensors. If you have a great idea for a different project, that's fine too. Edge Impulse lets you capture data from any sensor, build custom processing blocks to extract features, and you have full flexibility in your Machine Learning pipeline with the learning blocks.
We can't wait to see what you'll build! 🚀
In this tutorial, you'll use machine learning to build a gesture recognition system that runs on a microcontroller. This is a hard task to solve using rule-based programming, as people don't perform gestures in the exact same way every time. But machine learning can handle these variations with ease. You'll learn how to collect high-frequency data from real sensors, use signal processing to clean up data, build a neural network classifier, and how to deploy your model back to a device. At the end of this tutorial, you'll have a firm understanding of applying machine learning in embedded devices using Edge Impulse.
There is also a video version of this tutorial:
You can view the finished project, including all data, signal processing and machine learning blocks here: Tutorial: continuous motion recognition.
For this tutorial, you'll need a supported device.
Alternatively, use the either Data forwarder or Edge Impulse for Linux SDK to collect data from any other development board, or your mobile phone.
If your device is connected (green dot) under Devices in the studio you can proceed:
Data ingestion
Edge Impulse can ingest data from many sources and any device - including embedded devices that you already have in production. See the documentation for the Data acquisition for more information.
With your device connected, we can collect some data. In the studio go to the Data acquisition tab. This is the place where all your raw data is stored, and - if your device is connected to the remote management API - where you can start sampling new data.
Under Record new data, select your device, set the label to updown
, the sample length to 10000
, the sensor to Built-in accelerometer
and the frequency to 62.5Hz
. This indicates that you want to record data for 10 seconds, and label the recorded data as updown
. You can later edit these labels if needed.
After you click Start sampling move your device up and down in a continuous motion. In about twelve seconds the device should complete sampling and upload the file back to Edge Impulse. You see a new line appear under 'Collected data' in the studio. When you click it you now see the raw data graphed out. As the accelerometer on the development board has three axes you'll notice three different lines, one for each axis.
Continuous movement
It's important to do continuous movements as we'll later slice up the data in smaller windows.
Machine learning works best with lots of data, so a single sample won't cut it. Now is the time to start building your own dataset. For example, use the following four classes, and record around 3 minutes of data per class:
Idle - just sitting on your desk while you're working.
Snake - moving the device over your desk as a snake.
Wave - waving the device from left to right.
Updown - moving the device up and down.
Variations
Make sure to perform variations on the motions. E.g. do both slow and fast movements and vary the orientation of the board. You'll never know how your user will use the device. It's best to collect samples of ~10 seconds each.
Prebuilt dataset
Alternatively, you can load an example test set that has about ten minutes of data in these classes (but how much fun is that?). See the Continuous gestures dataset for more information.
With the training set in place, you can design an impulse. An impulse takes the raw data, slices it up in smaller windows, uses signal processing blocks to extract features, and then uses a learning block to classify new data. Signal processing blocks always return the same values for the same input and are used to make raw data easier to process, while learning blocks learn from past experiences.
For this tutorial we'll use the 'Spectral analysis' signal processing block. This block applies a filter, performs spectral analysis on the signal, and extracts frequency and spectral power data. Then we'll use a 'Neural Network' learning block, that takes these spectral features and learns to distinguish between the four (idle, snake, wave, updown) classes.
In the studio go to Create impulse, set the window size to 2000
(you can click on the 2000 ms.
text to enter an exact value), the window increase to 80
, and add the 'Spectral Analysis' and 'Classification (Keras)' blocks. Then click Save impulse.
To configure your signal processing block, click Spectral features in the menu on the left. This will show you the raw data on top of the screen (you can select other files via the drop down menu), and the results of the signal processing through graphs on the right. For the spectral features block you'll see the following graphs:
Filter response - If you have chosen a filter (with non zero order), this will show you the response across frequencies. That is, it will show you how much each frequency will be attenuated.
After filter - the signal after applying the filter. This will remove noise.
Spectral power - the frequencies at which the signal is repeating (e.g. making one wave movement per second will show a peak at 1 Hz).
See the dedicated page for the Spectral features pre-processing block.
A good signal processing block will yield similar results for similar data. If you move the sliding window (on the raw data graph) around, the graphs should remain similar. Also, when you switch to another file with the same label, you should see similar graphs, even if the orientation of the device was different.
Bonus exercise: filters
Try to reason about the filter parameters. What does the cut-off frequency control? And what do you see if you switch from a low-pass to a high-pass filter?
Set the filter to low pass with the following parameters:
Once you're happy with the result, click Save parameters. This will send you to the 'Feature generation' screen. In here you'll:
Split all raw data up in windows (based on the window size and the window increase).
Apply the spectral features block on all these windows.
Calculate feature importance. We will use this later to set up the anomaly detection.
Click Generate features to start the process.
Afterward the 'Feature explorer' will load. This is a plot of all the extracted features against all the generated windows. You can use this graph to compare your complete data set. A good rule of thumb is that if you can visually identify some clusters by classes, then the machine learning model will be able to do so as well.
With all data processed it's time to start training a neural network. Neural networks are a set of algorithms, modeled loosely after the human brain, that are designed to recognize patterns. The network that we're training here will take the signal processing data as an input, and try to map this to one of the four classes.
So how does a neural network know what to predict? A neural network consists of layers of neurons, all interconnected, and each connection has a weight. One such neuron in the input layer would be the height of the first peak of the X-axis (from the signal processing block); and one such neuron in the output layer would be wave
(one the classes). When defining the neural network all these connections are initialized randomly, and thus the neural network will make random predictions. During training, we then take all the raw data, ask the network to make a prediction, and then make tiny alterations to the weights depending on the outcome (this is why labeling raw data is important).
This way, after a lot of iterations, the neural network learns; and will eventually become much better at predicting new data. Let's try this out by clicking on NN Classifier in the menu.
See the dedicated page for the Classification (Keras) learning block.
Set 'Number of training cycles' to 1
. This will limit training to a single iteration. And then click Start training.
Now change 'Number of training cycles' to 2
and you'll see performance go up. Finally, change 'Number of training cycles' to 30
and let the training finish.
You've just trained your first neural networks!
100% accuracy
You might end up with 100% accuracy after training for 100 training cycles. This is not necessarily a good thing, as it might be a sign that the neural network is too tuned for the specific test set and might perform poorly on new data (overfitting). The best way to reduce this is by adding more data or reducing the learning rate.
From the statistics in the previous step we know that the model works against our training data, but how well would the network perform on new data? Click on Live classification in the menu to find out. Your device should (just like in step 2) show as online under 'Classify new data'. Set the 'Sample length' to 10000
(10 seconds), click Start sampling and start doing movements. Afterward, you'll get a full report on what the network thought you did.
If the network performed great, fantastic! But what if it performed poorly? There could be a variety of reasons, but the most common ones are:
There is not enough data. Neural networks need to learn patterns in data sets, and the more data the better.
The data does not look like other data the network has seen before. This is common when someone uses the device in a way that you didn't add to the test set. You can add the current file to the test set by clicking ⋮
, then selecting Move to training set. Make sure to update the label under 'Data acquisition' before training.
The model has not been trained enough. Up the number of epochs to 200
and see if performance increases (the classified file is stored, and you can load it through 'Classify existing validation sample').
The model is overfitting and thus performs poorly on new data. Try reducing the learning rate or add more data.
The neural network architecture is not a great fit for your data. Play with the number of layers and neurons and see if performance improves.
As you see there is still a lot of trial and error when building neural networks, but we hope the visualizations help a lot. You can also run the network against the complete validation set through 'Model validation'. Think of the model validation page as a set of unit tests for your model!
With a working model in place, we can look at places where our current impulse performs poorly.
Neural networks are great, but they have one big flaw. They're terrible at dealing with data they have never seen before (like a new gesture). Neural networks cannot judge this, as they are only aware of the training data. If you give it something unlike anything it has seen before it'll still classify as one of the four classes.
Let's look at how this works in practice. Go to 'Live classification' and record some new data, but now vividly shake your device. Take a look and see how the network will predict something regardless.
So, how can we do better? If you look at the feature explorer, you should be able to visually separate the classified data from the training data. We can use this to our advantage by training a new (second) network that creates clusters around data that we have seen before, and compares incoming data against these clusters. If the distance from a cluster is too large you can flag the sample as an anomaly, and not trust the neural network.
To add this block go to Create impulse, click Add learning block, and select 'Anomaly Detection (K-Means)'. Then click Save impulse.
To configure the clustering model click on Anomaly detection in the menu. Here we need to specify:
The number of clusters. Here use 32
.
The axes that we want to select during clustering. Click on the Select suggested axes button to harness the results of the feature importance output. Alternatively, the data separates well on the accX RMS, accY RMS and accZ RMS axes, you can also include these axes.
See the dedicated page for the anomaly detection (K-means) learning block. We also provide the anomaly detection (GMM) learning block that is compatible with this tutorial.
Click Start training to generate the clusters. You can load existing validation samples into the anomaly explorer with the dropdown menu.
Axes
The anomaly explorer only plots two axes at the same time. Under 'average axis distance' you see how far away from each axis the validation sample is. Use the dropdown menu's to change axes.
If you now go back to 'Live classification' and load your last sample, it should now have tagged everything as anomaly. This is a great example where signal processing (to extract features), neural networks (for classification) and clustering algorithms (for anomaly detection) can work together.
With the impulse designed, trained and verified you can deploy this model back to your device. This makes the model run without an internet connection, minimizes latency, and runs with minimum power consumption. Edge Impulse can package up the complete impulse - including the signal processing code, neural network weights, and classification code - up in a single C++ library that you can include in your embedded software.
Mobile phone
Your mobile phone can build and download the compiled impulse directly from the mobile client. See 'Deploying back to device' on the Using your mobile phone page.
To export your model, click on Deployment in the menu. Then under 'Build firmware' select your development board, and click Build. This will export the impulse, and build a binary that will run on your development board in a single step. After building is completed you'll get prompted to download a binary. Save this on your computer.
When you click the Build button, you'll see a pop-up with text and video instructions on how to deploy the binary to your particular device. Follow these instructions. Once you are done, we are ready to test your impulse out.
We can connect to the board's newly flashed firmware over serial. Open a terminal and run:
Serial daemon
If the device is not connected over WiFi, but instead connected via the Edge Impulse serial daemon, you'll need stop the daemon. Only one application can connect to the development board at a time.
This will sample data from the sensor, run the signal processing code, and then classify the data:
Continuous movement
We trained a model to detect continuous movement in 2 second intervals. Thus, changing your movement while sampling will yield incorrect results. Make sure you've started your movement when 'Sampling...' gets printed. In between sampling, you have two seconds to switch movements.
To run the continuous sampling, run the following command:
Victory! You've now built your first on-device machine learning model.
Congratulations! You have used Edge Impulse to train a machine learning model capable of recognizing your gestures and understand how you can build models that classify sensor data or find anomalies. Now that you've trained your model you can integrate your impulse in the firmware of your own embedded device, see Running your impulse locally. There are examples for Mbed OS, Arduino, STM32CubeIDE, and any other target that supports a C++ compiler.
Or if you're interested in more, see our tutorials on Recognize sounds from audio or Adding sight to your sensors. If you have a great idea for a different project, that's fine too. Edge Impulse lets you capture data from any sensor, build custom processing blocks to extract features, and you have full flexibility in your Machine Learning pipeline with the learning blocks.
We can't wait to see what you'll build! 🚀
This page is part of and describes how you can use the OpenMV Cam H7 Plus to build a dataset, and import the data into Edge Impulse.
To set up your OpenMV camera, and collect some data:
Install the .
Follow the to clean the sensor and focus the lens.
Connect a micro-USB cable to the camera, and open the OpenMV IDE. The camera should automatically update to the latest firmware.
Verify that the camera can capture live images, by clicking on the Connect button in the bottom left corner, then pressing Play to run the application.
A live feed from your camera will be displayed in the top right corner of the IDE.
Once your camera is up and running, it's time to start capturing some images and build our dataset.
First, set up a new dataset via Tools -> Dataset Editor, select New Dataset.
This opens the 'Dataset editor' panel on the left side, and the 'dataset capture script' in the main panel of the IDE. Here, create three classes: "plant", "lamp" and "unknown". It's important to add an unknown class that contains random images which are neither lamps nor plants.
As we'll build a model that takes in square images, change the 'Dataset capture script' to read:
Now you can capture data for the three classes.
Click the Play icon to run the 'dataset capture script' on your OpenMV camera.
Select one of the classes by clicking on the folder name in the 'Dataset editor'.
Take a snap by clicking the Capture data (camera icon) button.
Do this until you have captured 30 images per class from a variety of angles. Also make sure to vary the things you capture for the unknown class.
To import the dataset into Edge Impulse go to Tools > Dataset Editor > Export > Upload to Edge Impulse project.
Then, choose the project name, and the split between training and testing data (recommended to keep this to 80/20).
A duplicate check runs when you upload new data, so you can upload your dataset multiple times (for example, when you've added new files) without adding the same data twice.
Training and testing data split
The split between training and testing data is based on the hash of the file in order to have a deterministic process. As a consequence you may not have a perfect 80/20 split between training and testing, but this process ensures samples are always placed in the same category.
Our dataset now appears under the Data acquisition section of our project.
This page is part of and describes how you can use your mobile phone to import image data into Edge Impulse.
To add your phone to your project, go to the Devices page, select Connect a new device and select Use your mobile phone. A QR code will pop up. Scan this code with your phone and your phone will pop up on the devices screen.
With your phone connected to your project, it's time to start capturing some images and build our dataset. We have a special UI for collecting images quickly, on your phone choose Collecting images?.
On your phone a permission prompt will show up, and then the viewfinder will be displayed. Set the label (in the top corner) to 'lamp', point your camera at your lamp and press Capture.
Afterwards the photo shows up in the studio on the Data acquisition page.
Do this until you have captured 30 images per class from a variety of angles. Also make sure to vary the things you capture for the unknown class.
In this tutorial, you'll use machine learning to build a system that can recognize and track multiple objects in your house through a camera - a task known as object detection. Adding sight to your embedded devices can make them see the difference between poachers and elephants, count objects, find your lego bricks, and detect dangerous situations. In this tutorial, you'll learn how to collect images for a well-balanced dataset, how to apply transfer learning to train a neural network and deploy the system to an edge device.
At the end of this tutorial, you'll have a firm understanding of how to do object detection using Edge Impulse.
There is also a video version of this tutorial:
Running on a microcontroller?
In this tutorial we'll build a model that can distinguish between two objects on your desk - we've used a lamp and a coffee cup, but feel free to pick two other objects. To make your machine learning model see it's important that you capture a lot of example images of these objects. When training the model these example images are used to let the model distinguish between them.
Capturing data
Capture the following amount of data - make sure you capture a wide variety of angles and zoom level. It's fine if both images are in the same frame. We'll be cropping the images later to be square so make sure the objects are in the frame.
30 images of a lamp.
30 images of a coffee cup.
You can collect data from the following devices:
Or you can capture your images using another camera, and then upload them by going to Data acquisition and clicking the 'Upload' icon.
With the data collected we need to label this data. Go to Data acquisition, verify that you see your data, then click on the 'Labeling queue' to start labeling.
No labeling queue? Go to Dashboard, and under 'Project info > Labeling method' select 'Bounding boxes (object detection)'.
Labeling data
The labeling queue shows you all the unlabeled data in your dataset. Labeling your objects is as easy as dragging a box around the object, and entering a label. To make your life a bit easier we try to automate this process by running an object tracking algorithm in the background. If you have the same object in multiple photos we thus can move the boxes for you and you just need to confirm the new box. After dragging the boxes, click Save labels and repeat this until your whole dataset is labeled.
AI-Assisted Labeling
Afterwards you should have a well-balanced dataset listed under Data acquisition in your Edge Impulse project.
Rebalancing your dataset
To validate whether a model works well you want to keep some data (typically 20%) aside, and don't use it to build your model, but only to validate the model. This is called the 'test set'. You can switch between your training and test sets with the two buttons above the 'Data collected' widget. If you've collected data on your development board there might be no data in the testing set yet. You can fix this by going to Dashboard > Perform train/test split.
With the training set in place you can design an impulse. An impulse takes the raw data, adjusts the image size, uses a preprocessing block to manipulate the image, and then uses a learning block to classify new data. Preprocessing blocks always return the same values for the same input (e.g. convert a color image into a grayscale one), while learning blocks learn from past experiences.
In the studio go to Create impulse, set the image width and image height to 320
, the 'resize mode' to Fit shortest axis
and add the 'Images' and 'Object Detection (Images)' blocks. Then click Save impulse.
Configuring the processing block
To configure your processing block, click Images in the menu on the left. This will show you the raw data on top of the screen (you can select other files via the drop down menu), and the results of the processing step on the right. You can use the options to switch between 'RGB' and 'Grayscale' mode, but for now leave the color depth on 'RGB' and click Save parameters.
This will send you to the 'Feature generation' screen. In here you'll:
Resize all the data.
Apply the processing block on all this data.
Create a 3D visualization of your complete dataset.
Click Generate features to start the process.
Afterwards the 'Feature explorer' will load. This is a plot of all the data in your dataset. Because images have a lot of dimensions (here: 320x320x3=307,200 features) we run a process called 'dimensionality reduction' on the dataset before visualizing this. Here the 307,200 features are compressed down to just 3, and then clustered based on similarity. Even though we have little data you can already see the clusters forming (lamp images are all on the left, coffee all on the right), and can click on the dots to see which image belongs to which dot.
Configuring the transfer learning model
With all data processed it's time to start training a neural network. Neural networks are a set of algorithms, modeled loosely after the human brain, that are designed to recognize patterns. The network that we're training here will take the image data as an input, and try to map this to one of the three classes.
It's very hard to build a good working computer vision model from scratch, as you need a wide variety of input data to make the model generalize well, and training such models can take days on a GPU. To make this easier and faster we are using transfer learning. This lets you piggyback on a well-trained model, only retraining the upper layers of a neural network, leading to much more reliable models that train in a fraction of the time and work with substantially smaller datasets.
To configure the transfer learning model, click Object detection in the menu on the left. Here you can select the base model (the one selected by default will work, but you can change this based on your size requirements), and set the rate at which the network learns.
Leave all settings as-is, and click Start training. After the model is done you'll see accuracy numbers below the training output. You have now trained your model!
With the model trained let's try it out on some test data. When collecting the data we split the data up between a training and a testing dataset. The model was trained only on the training data, and thus we can use the data in the testing dataset to validate how well the model will work in the real world. This will help us ensure the model has not learned to overfit the training data, which is a common occurrence.
To validate your model, go to Model testing and select Classify all. Here we hit 92.31% precision, which is great for a model with so little data.
To see a classification in detail, click the three dots next to an item, and select Show classification. This brings you to the Live classification screen with much more details on the file (you can also capture new data directly from your development board from here). This screen can help you determine why items were misclassified.
Live Classification Result
This view is particularly useful for a direct comparison between the raw image and the model's interpretation. Each object detected in the image is highlighted with a bounding box. Alongside these boxes, you'll find labels and confidence scores, indicating what the model thinks each object is and how sure it is about its prediction. This mode is ideal for understanding the model's performance in terms of object localization and classification accuracy.
Overlay Mode for the Live Classification Result
In this view, bounding boxes are drawn around the detected objects, with labels and confidence scores displayed within the image context. This approach offers a clearer view of how the bounding boxes align with the objects in the image, making it easier to assess the precision of object localization. The overlay view is particularly useful for examining the model's ability to accurately detect and outline objects within a complex visual scene.
Summary Table
Name: This field displays the name of the sample file analyzed by the model. For instance, 'sample.jpg.22l74u4f' is the file name in this case.
CATEGORY: Lists the types of objects that the model has been trained to detect. In this example, two categories are shown: 'coffee' and 'lamp'.
COUNT: Indicates the number of times each category was detected in the sample file. In this case, both 'coffee' and 'lamp' have a count of 1, meaning each object was detected once in the sample.
INFO: This column provides additional information about the model's performance. It displays the 'Precision score', which, in this example, is 95.00%. The precision score represents the model's accuracy in making correct predictions over a range of Intersection over Union (IoU) values, known as the mean Average Precision (mAP).
With the impulse designed, trained and verified you can deploy this model back to your device. This makes the model run without an internet connection, minimizes latency, and runs with minimum power consumption. Edge Impulse can package up the complete impulse - including the preprocessing steps, neural network weights, and classification code - in a single C++ library or model file that you can include in your embedded software.
Running the impulse on your Raspberry Pi 4 or Jetson Nano
From the terminal just run edge-impulse-linux-runner
. This will build and download your model, and then run it on your development board. If you're on the same network you can get a view of the camera, and the classification results directly from your dev board. You'll see a line like:
Open this URL in a browser to see your impulse running!
Running the impulse on your mobile phone
On your mobile phone just click Switch to classification mode at the bottom of your phone screen. Point it at an object and press 'Capture'.
Integrating the model in your own application
We can't wait to see what you'll build! 🚀
Neural networks are not limited to working with one type of data at a time. One of their biggest advantages is that they are incredibly flexible with the type of input data, so long as the format and ordering of that data stays the same from training to inference. As a result, we can use them to perform sensor fusion for a variety of tasks.
Sensor fusion is the process of combining data from different types of sensors or similar sensors mounted in different locations, which gives us more information to make decisions and classifications. For example, you could use temperature data with accelerometer data to get a better idea of a potential anomaly!
In this tutorial, you will learn how to use Edge Impulse to perform sensor fusion on the Arduino Nano 33 BLE Sense.
Example Project: You can find the dataset and impulse used throughout this tutorial in .
Multimodal vs Multi-impulse vs multi-model vs sensor fusion
Multimodal: When discussing sensor fusion, it's important to also understand multimodal models. These models integrate multiple types of data (modalities) such as text, images, audio, and video. By combining these diverse data sources, multimodal models can extract richer features and improve overall model performance. This is similar to sensor fusion but extends beyond sensor data to any type of data that can provide complementary information. This integration helps in creating more robust and versatile AI systems capable of understanding and predicting complex scenarios.
Running multi-impulse refers to running two separate projects (different data, different DSP blocks and different models) on the same target. It will require modifying some files in the EI-generated SDKs. Can be multimodal. Since it involves running multiple separate projects with different data and models, it can handle different types of data, making it potentially multimodal. See the
Running multi-model refers to running two different models (same data, same DSP block but different tflite models) on the same target. It can become multimodal if the models are handling different types of data. See how to run a motion classifier model and an anomaly detection model on the same device in .
Sensor fusion refers to the process of combining data from different types of sensors to give more information to the neural network. To extract meaningful information from this data, you can use the same DSP block (like in this tutorial), multiples DSP blocks, or use neural networks embeddings. Sensor fusion can be considered a form of multimodal integration because it involves combining data from different sensors, which can be seen as different modalities within the sensor data domain. See an example of Sensor fusion in the following tutorial tutorial.
For this tutorial, you'll need a .
For this demo, we'll show you how to identify different environments by using a fusion of temperature, humidity, pressure, and light data. In particular, I'll have the Arduino board identify different rooms in my house as well as outside. Note that the we assume that the environment is static--if I turn out lights or the outside temperature changes, the model will not work. However, it demonstrates how we can combine different sensor data with machine learning to do classification!
As we will be collecting data from our Arduino board connected to a computer, it helps to have a laptop that you can move to different rooms.
Create a new project on the Edge Impulse studio.
Connect the Arduino Nano 33 BLE to your computer. Follow the to upload the Edge Impulse firmware to the board and connect it to your project.
Go to Data acquisition. Under Record new data, select your device and set the label to bedroom
. Change Sensor to Environmental + Interactional
, set the Sample length to 10000
ms and Frequency to 12.5Hz
.
Stand in one of your rooms with your Arduino board (and laptop). Click Start sampling and slowly move the board around while data is collected. After sampling is complete, you should see a new data plot with a different line for each sensor.
Variations
Try to stand in different parts of each room while collecting data.
Repeat this process to record about 3 minutes of data for the bedroom class. Try to stand in a different spot in the room while collecting data--we want a robust dataset that represents the features of each room. Head to another room and repeat data collection. Continue doing this until you have around 3 minutes of data for each of the following classes:
Bedroom
Hallway
Outside
You are welcome to try other rooms or locations. For this demo, I found that my bedroom, kitchen, and living room all exhibited similar environmental and lighting properties, so the model struggled to tell them apart.
Head to Dashboard and scroll down to Danger zone. Click Perform train/test split and follow the instructions in the pop-up window to split your dataset into training and testing groups. When you're done, you can head back to Data acquisition to see that your dataset has been split. You should see about 80% of your samples in Training data and about 20% in Test data.
Head to Create impulse. Change the Window increase to 500 ms
. Add a Flatten block. Notice that you can choose which environmental and interactional sensor data to include. Deselect proximity and gesture, as we won't need those to detect rooms. Add a Classification (Keras) learning block
Click Save impulse.
Head to Flatten. You can select different samples and move the window around to see what the DSP result will look like for each set of features to be sent to the learning block.
The Flatten block will compute the average, minimum, maximum, root-mean square, standard deviation, skewness, and kurtosis of each axis (e.g. temperature, humidity, brightness, etc.). With 7 axes and 7 features computed for each axis, that gives us 49 features for each window being sent to the learning block. You can see these computed features under Processed features.
Click Save parameters. On the next screen, select Calculate feature importance and click Generate features.
After a few moments, you should be able to explore the features of your dataset to see if your classes are easily separated into categories.
Interestingly enough, it looks like temperature and red light values were the most important features in determining the location of the Arduino board.
With our dataset collected and features processed, we can train our machine learning model. Click on NN Classifier. Change the Number of training cycles to 300
and click Start training. We will leave the neural network architecture as the default for this demo.
During training, parameters in the neural network's neurons are gradually updated so that the model will try to guess the class of each set of data as accurately as possible. When training is complete, you should see a Model panel appear on the right side of the page.
The Confusion matrix gives you an idea of how well the model performed at classifying the different sets of data. The top row gives the predicted label and the column on the left side gives the actual (ground-truth) label. Ideally, the model should predict the classes correctly, but that's not always the case. You want the diagonal cells from the top-left to the bottom-right to be as close to 100% as possible.
The On-device performance provides some statistics about how the model will likely run on a particular device. By default, an Arm Cortex-M4F running at 80 MHz is assumed to be your target device. The actual memory requirements and run time may vary on different platforms.
Rather than simply assume that our model will work when deployed, we can run inference on our test dataset as well as on live data.
First, head to Model testing, and click Classify all. After a few moments, you should see results from your test set.
You can click on the three dots next to an item and select Show classification. This will give you a classification result screen where you can see results information in more detail.
Additionally, we can test the impulse in a real-world environment to make sure the model has not overfit the training data. To do that, head to Live classification. Make sure your device is connected to the Studio and that the Sensor, Sample length, and Frequency match what we used to initially capture data.
Click Start sampling. A new sample will be captured from your board, uploaded, and classified. Once complete, you should see the classification results.
In the example above, we sampled 10 seconds of data from the Arduino. This data is split into 1-second windows (the window moves over 0.5 seconds each time), and the data in that window is sent to the DSP block. The DSP block computes the 49 features that are then sent to the trained machine learning model, which performs a forward pass to give us our inference results.
As you can see, the inference results from all of the windows claimed that the Arduino board was in the bedroom, which was true! This is great news for our model--it seems to work even on unseen data.
Now that we have an impulse with a trained model and we've tested its functionality, we can deploy the model back to our device. This means the impulse can run locally without an internet connection to perform inference!
Edge Impulse can package up the entire impulse (preprocessing block, neural network, and classification code) into a single library that you can include in your embedded software.
Click on Deployment in the menu. Select the library that you would like to create, and click Build at the bottom of the page.
Running your impulse locally
Well done! You've trained a neural network to determine the location of a development board based on a fusion of several sensors working in tandem. Note that this demo is fairly limited--as the daylight or temperature changes, the model will no longer be valid. However, it hopefully gives you some ideas about how you can mix and match sensors to achieve your machine learning goals.
We can't wait to see what you'll build! 🚀
Sensor fusion is about combining data from various sensors to gain a more comprehensive understanding of your environment. In this tutorial, we will demonstrate sensor fusion by bringing together high-dimensional audio or image data with time-series sensor data. This combination allows you to extract deeper insights from your sensor data.
This is an advanced tutorial where you will need to parse your dataset to create multi-sensor data samples, train several Edge Impulse project in order to extract the embeddings from the tflite
models, create and, finally, modify the C++ inferencing SDK.
If you are looking for a more beginner-level tutorial, please head to the tutorial.
Multi-impulse vs multi-model vs sensor fusion
Running multi-impulse refers to running two separate projects (different data, different DSP blocks and different models) on the same target. It will require modifying some files in the EI-generated SDKs.
Running multi-model refers to running two different models (same data, same DSP block but different tflite models) on the same target. See how to run a motion classifier model and an anomaly detection model on the same device in .
Sensor fusion refers to the process of combining data from different types of sensors to give more information to the neural network. To extract meaningful information from this data, you can use the same DSP block, multiples DSP blocks, or use neural networks embeddings like we will see in this tutorial.
Also, see this video (starting min 13):
When you have data coming from multiple sources, such as a microphone capturing audio, a camera capturing images, and sensors collecting time-series data. Integrating these diverse data types can be tricky and conventional methods fall short.
With the standard workflow, if you have data streams from various sources, you might want to create separate DSP blocks for each data type. For instance, if you're dealing with audio data from microphones, image data from cameras, and time-series sensor data from accelerometers, you could create separate DSP blocks for each. For example:
A spectrogram-based DSP block for audio data
An image DSP block for image data
A spectral analysis block for time-series sensor data
This approach initially seems logical but comes with limitations:
When using separate DSP blocks, you're constrained in your choice of neural networks. The features extracted from each data type are fundamentally different. For example, a pixel in an image or an image's spectrogram and a data point from an accelerometer's time-series data have distinct characteristics. This incompatibility makes it challenging to use a convolutional neural network (CNN) that is typically effective for image data or spectrogram. As a result, fully connected networks may be your only option, which are not ideal for audio or image data.
To bypass the limitation stated above, you may consider using neural networks embeddings. In essence, embeddings are compact, meaningful representations of your data, learned by a neural network.
While training the neural network, the model try to find the mathematical formula that best maps the input to the output. This is done by tweaking each neuron (each neuron is a parameter in our formula). The interesting part is that each layer of the neural network will start acting like a feature extracting step but highly tuned for your specific data.
Finally, instead of having a classifier for last layer (usually a softmax
layer), we cut the neural network somewhere at the end and we obtained the embeddings.
Thus, we can consider the embeddings as learnt features and we will pass these "features" to the final Impulse:
Here's how we approach advanced sensor fusion with Edge Impulse.
In this workflow, we will show how to perform sensor fusion using both audio data and accelerometer data to classify different stages of a grinding coffee machine (grind
, idle
, pump
and extract
). First, we are going to use a spectrogram DSP block and a NN classifier using two dense network. This first impulse will then be used to generate the embeddings and will be made available in a custom DSP block. Finally, we are going to train a fully connected layer using features coming from both the generated embeddings and a spectral feature DSP block.
We have develop two Edge Impulse public projects, one publicly available dataset and a Github repository containing the source code to help you follow the steps:
Please note that with a few changes, you will be able to change the sensor type (audio to images) or the first pre-processing method (spectrogram to MFE/MFCC).
The first step is to have input data samples that contain both sensors. In Edge Impulse studio, you can easily visualize time-series data, like audio and accelerometer data.
Note: it is not trivial to group together images and time-series. Our core-engineering team is working on improving this workflow. In the meantime, as a workaround, you can encode your image as time-series with one axis per channel (red, green, blue) plus the sensor:
Train separate projects for high dimensional data (audio or image data). Each project contains both a DSP block and a trained neural network.
Clone this repository:
Download the generated Impulse to extract the embeddings, which encapsulate distilled knowledge about their respective data types.
Download Test Data: From the same dashboard, download the test or train data NPY file (input.npy
). Place this numpy array file under the /input
repository. This will allow us to generate a quantized version of the tflite embeddings. Ideally choose the test data if you have some data available.
Generate the embeddings:
This will cut off the last layer of the neural network and convert it to TensorFlow Lite (TFLite) format. You can follow the process outlined in the saved_model_to_embeddings.py
script for this conversion for a better understanding.
To make sensor fusion work seamlessly, Edge Impulse enables you to create custom DSP blocks. These blocks combine the necessary spectrogram/image-processing and neural network components for each data type.
Custom DSP Block Configuration: In the DSP block, perform two key operations as specified in the dsp.py
script:
Run the DSP step with fixed parameters.
Run the neural network.
Replace this following lines in dsp-blocks/features-from-audio-embeddings/dsp.py
to match your DSP configuration:
Return Neural Network Embeddings: The DSP block should be configured to return the neural network embeddings, as opposed to the final classification result.
Implement get_tflite_implementation: Ensure that the get_tflite_implementation
function returns the TFLite model. Note that the on-device implementation will not be correct initially when generating the C++ library, as only the neural network part is compiled. We will fix this in the final exported C++ Library.
Now publish your new custom DSP block.
Fill the necessary information and push your block:
Multiple DSP Blocks: Create a new impulse with three DSP blocks and a classifier block. The routing should be as follows:
Audio data routed through the custom block.
Sensor data routed through spectral analysis.
Training the Model: Train the model within the new impulse, using a fully-connected network.
Export as a C++ Library:
In the Edge Impulse platform, export your project as a C++ library.
Choose the model type that suits your target device (quantized
vs. float32
).
Make sure to select EON compiler option
Copy the exported C++ library to the example-cpp
folder for easy access.
Add a Forward Declaration:
In the model-parameters/model_variables.h
file of the exported C++ library, add a forward declaration for the custom DSP block you created.
For example:
And change &extract_tflite_eon_features
into &custom_sensor_fusion_features
in the ei_dsp_blocks
object.
Implement the Custom DSP Block:
In the main.cpp
file of the C++ library, implement the custom_sensor_fusion_features block. This block should:
Call into the Edge Impulse SDK to generate features.
Execute the rest of the DSP block, including neural network inference.
Copy a test sample's raw features into the features[]
array in source/main.cpp
Enter make -j
in this directory to compile the project. If you encounter any OOM memory error try make -j4
(replace 4 with the number of cores available)
Enter ./build/app
to run the application
Compare the output predictions to the predictions of the test sample in the Edge Impulse Studio.
Note that if you are using the quantized version of the model, you may encounter a slight difference between the Studio Live Classification page and the above results, the float32 model however should give you the same results.
is a brand-new approach to run object detection models on constrained devices. FOMO is a ground-breaking algorithm that brings real-time object detection, tracking and counting to microcontrollers for the first time. FOMO is 30x faster than MobileNet SSD and can run in <200K of RAM.
In this tutorial, we will explain how to count cars to estimate parking occupancy using FOMO.
View the finished project, including all data, signal processing and machine learning blocks here: .
Limitations of FOMO
FOMO does not output bounding boxes but will give you the object's location using centroids. Hence the size of the object is not available.
FOMO works better if the objects have a similar size.
Objects shouldn’t be too close to each other, although this can be optimized when increasing the image input resolution.
If you need the size of the objects for your project, head to the default . tutorial.
For this tutorial, you'll need a .
If you don't have any of these devices, you can also upload an existing dataset through the or use your to connect your device to Edge Impulse. After this tutorial, you can then deploy your trained machine learning model as a C++ library or as a WebAssembly package and run it on your device.
You can collect data from the following devices:
- for the Raspberry Pi 4 and the Jetson Nano.
Collecting image data from any of the that have a camera.
With the data collected, we need to label this data. Go to Data acquisition, verify that you see your data, then click on the 'Labeling queue' to start labeling.
Why use bounding box inputs?
To keep the interoperability with other models, your training image input will use bounding boxes although we will output centroids in the inference process. As such FOMO will use in the background translation between bounding boxes and segmentation maps in various parts of the end-to-end flow. This includes comparing sets between the bounding boxes and the segmentation maps to run profiling and scoring.
Using your own trained model - Useful when you already have a trained model with classes similar to your new task.
Using Object tracking - Useful when you have objects that are similar in size and common between images/frames.
For our case, since the 'car' object is part of the COCO dataset, we will use the YoloV5 pre-trained model to accelerate this process. To enable this feature, we will first click the Label suggestions dropdown,then select “Classify using YOLOv5.”
From the image above, the YOLOV5 model can already help us annotate more than 90% of the cars without us having to do it manually by our hands.
To validate whether a model works well you want to keep some data (typically 20%) aside, and don't use it to build your model, but only to validate the model. This is called the 'test set'. You can switch between your training and test sets with the two buttons above the 'Data collected' widget. If you've collected data on your development board there might be no data in the testing set yet. You can fix this by going to Dashboard > Perform train/test split.
To configure this, go to Create impulse, set the image width and image height to 96, the 'resize mode' to Fit shortest axis and add the 'Images' and 'Object Detection (Images)' blocks. Then click Save Impulse.
To configure your processing block, click Images in the menu on the left. This will show you the raw data on top of the screen (you can select other files via the drop-down menu), and the results of the processing step on the right. You can use the options to switch between RGB
and Grayscale
modes. Finally, click on Save parameters.
This will send you to the 'Feature generation' screen. In here you'll:
Resize all the data.
Apply the processing block on all this data.
Create a 3D visualization of your complete dataset.
Click Generate features to start the process.
Afterward, the Feature explorer will load. This is a plot of all the data in your dataset. Because images have a lot of dimensions (here: 96x96x1=9216 features for grayscale) we run a process called 'dimensionality reduction' on the dataset before visualizing this. Here the 9216 features are compressed down to 2, and then clustered based on similarity as shown in the feature explorer below.
With all data processed it's time to start training our FOMO model. The model will take an image as input and output objects detected using centroids. For our case, it will show centroids of cars detected on the images.
FOMO is fully compatible with any MobileNetV2 model, and depending on where the model needs to run you can pick a model with a higher or lower alpha. Transfer learning also works (although you need to train your base models specifically with FOMO in mind). Another advantage of FOMO is that it has very few parameters to learn from compared to normal SSD networks making the network even much smaller and faster to train. Together this gives FOMO the capabilities to scale from the smallest microcontrollers to full gateways or GPUs.
To configure FOMO, head over to the ‘Object detection’ section, and select 'Choose a different model' then select one of the FOMO models as shown in the image below.
Make sure to start with a learning rate of 0.001 then click start training. After the model is done you'll see accuracy numbers below the training output. You have now trained your FOMO object detection model!
As you may have noticed from the training results above, FOMO uses F1 Score as its base evaluating metric as compared to SSD MobileNetV2 which uses Mean Average Precision (mAP). Using Mean Average Precision (mAP) as the sole evaluation metric can sometimes give limited insights into the model’s performance. This is particularly true when dealing with datasets with imbalanced classes as it only measures how accurate the predictions are without putting into account how good or bad the model is for each class. The combination between F1 score and a confusion matrix gives us both the balance between precision and recall of our model as well as how the model performs for each class.
With the model trained let's try it out on some test data. When collecting the data we split the data up between a training and a testing dataset. The model was trained only on the training data, and thus we can use the data in the testing dataset to validate how well the model will work in the real world. This will help us ensure the model has not learned to overfit the training data, which is a common occurrence. To validate our model, we will go to Model Testing and select Classify all.
Given the little training data we had and the few cycles we trained on, we got an accuracy of 84.62% which can be improved further. To see the classification in detail, we will head to Live Classification* and select one image from our test sample. Click the three dots next to an item, and select Show classification. We can also capture new data directly from your development board from here.
Live Classification Result
From the test image above, our model was able to detect 16 cars out of the actual possible 18 which is a good performance. This can be seen in side by side by default, but you can also switch to overlay mode to see the model's predictions against the actual image content.
Overlay Mode for the Live Classification Result
A display option where the original image and the model's detections overlap, providing a clear juxtaposition of the model's predictions against the actual image content.
Summary Table
The summary table for a FOMO classification result provides a concise overview of the model's performance on a specific sample file, such as 'Parking_data_2283.png.2tk8c1on'. This table is organized as follows:
CATEGORY: Metric, Object category, or class label, e.g., car. COUNT: Shows detection accuracy, frequency, e.g., car detected 7 times.
INFO: Provides performance metrics definitions, including F1 Score, Precision, and Recall, which offer insights into the model's accuracy and efficacy in detection:
Table Metrics F1 Score: (77.78%): Balances precision and recall. Precision: (100.00%): Accuracy of correct predictions. Recall: (63.64%): Proportion of actual objects detected.
Viewing Options
Bottom-right controls adjust the visibility of ground truth labels and model predictions, enhancing the analysis of the model's performance:
Prediction Controls: Customize the display of model predictions, including:
Show All: Show all detections and confidence scores.
Show Correct Only: Focus on accurate model predictions.
Show incorrect only: Pinpoint undetected objects in the ground truth.
Ground Truth Controls: Toggle the visibility of original labels for direct comparison with model predictions.
Show All: Display all ground truth labels.
Hide All: Conceal all ground truth labels.
Show detected only: Highlight ground truth labels detected by the model.
Show undetected only: Identify ground truth labels missed by the model.
With the impulse designed, trained and verified you can deploy this model back to your device. This makes the model run without an internet connection, minimizes latency, and runs with minimum power consumption. Edge Impulse can package up the complete impulse - including the preprocessing steps, neural network weights, and classification code - in a single C++ library or model file that you can include in your embedded software.
From the terminal just run edge-impulse-linux-runner
. This will build and download your model, and then run it on your development board. If you're on the same network you can get a view of the camera, and the classification results directly from your dev board. You'll see a line like:
Open this URL in a browser to see your impulse running!
Go to the Deployment tab, on Build firmware section and select the board-compatible firmware to download it.
Follow the instruction provided to flash the firmware to your board and head over to your terminal and run the edge-impulse-run-impulse --debug
command:
You'll also see a URL you can use to view the image stream and results in your browser:
We can't wait to see what you'll build! 🚀
You can now go back to the tutorial to build your machine learning model.
Alternatively you can also capture your dataset directly through a different app, and then upload the data directly to Edge Impulse There are both options to do this visually (click the 'Upload' icon on the data acquisition screen), or via the CLI. You can find instructions here: . In this case it's highly recommended to you use square images, as the transfer learning model expects these; and you probably want to resize these images before uploading them to make sure training remains fast.
You can view the finished project, including all data, signal processing and machine learning blocks here: .
We recently released a brand-new approach to perform object detection tasks on microcontrollers, , if you are using a constraint device that does not have as much compute, RAM, and flash as Linux platforms, please head to this end-to-end tutorial:
Alternatively, if you only need to recognize a single object, you can follow our tutorial on - which performs image classification, hence, limits you to a single object but can also fit on microcontrollers.
You can view the finished project, including all data, signal processing and machine learning blocks here: .
For this tutorial, you'll need a .
If you don't have any of these devices, you can also upload an existing dataset through the - including . After this tutorial you can then deploy your trained machine learning model as a C++ library and run it on your device.
- for the Raspberry Pi 4 and the Jetson Nano.
Use AI-Assisted Labeling for your object detection project! For more information, .
For this tutorial we'll use the 'Images' preprocessing block. This block takes in the color image, optionally makes the image grayscale, and then turns the data into a features array. If you want to do more interesting preprocessing steps - like finding faces in a photo before feeding the image into the network -, see the tutorial. Then we'll use a 'Transfer Learning' learning block, which takes all the images in and learns to distinguish between the two ('coffee', 'lamp') classes.
Congratulations! You've added object detection to your sensors. Now that you've trained your model you can integrate your impulse in the firmware of your own edge device, see the documentation for the Node.js, Python, Go and C++ SDKs that let you do this in a few lines of code and make this model run on any device. when an object is seen.
Or if you're interested in more, see our tutorials on or . If you have a great idea for a different project, that's fine too. Edge Impulse lets you capture data from any sensor, build to extract features, and you have full flexibility in your Machine Learning pipeline with the learning blocks.
An impulse is a combination of preprocessing (DSP) blocks followed by machine learning blocks. It will slice up our data into smaller windows, use signal processing to extract features, and then train a machine learning model. Because we are using environmental and light data, which are slow-moving averages, we will use the for preprocessing.
You can also look at the Feature importance section to get an idea of which features are the most important in determining class membership. You can read more about feature importance .
If you see a lot of confusion between classes, it means you need to gather more data, try different features, use a different model architecture, or train for a longer period of time (more epochs). See to learn about ways to increase model performance.
See to learn how to deploy your impulse to a variety of platforms.
If you're interested in more, see our tutorials on or . If you have a great idea for a different project, that's fine too. Edge Impulse lets you capture data from any sensor, build to extract features, and you have full flexibility in your Machine Learning pipeline with the learning blocks.
Embeddings are super powerful, we use them for various features of Edge Impulse, such as the , the or in this advanced sensor fusion tutorial.
Dataset:
Edge Impulse project 1 (used to generate the embeddings):
Edge Impulse project 2 (final impulse):
Github repository containing the source code:
See
Download Model: From the project , download the TensorFlow SavedModel (saved_model
). Extract the save_model directory and place it under the /input
repository.
If you want to use another DSP block than the spectrogram one, all the source code of the available DSP code can be found in this public repository:
During development, it might be easier to host the block locally so you can make changes, see
See
For example, see the main.cpp file in the
Congratulations on successfully completing this advanced tutorial. You have been through the complex process of integrating high-dimensional audio or image data with time-series sensor data, employing advanced techniques like custom DSP blocks, neural network embeddings, and modifications to the C++ inferencing SDK. Also, note that you can simplify this workflow using to generate the custom DSP block with the embeddings.
If you are interested in using it for an enterprise project, please sign up for our FREE and our solution engineers can work with you on the integration.
Alternatively, you can capture your images using another camera, and then upload them directly from the studio by going to Data acquisition and clicking the 'Upload' icon or using Edge Impulse CLI .
All our collected images will be staged for annotation at the "labeling queue". Labeling your objects is as easy as dragging a box around the object, and entering a label. However, when you have a lot of images, this manual annotation method can become tiresome and time consuming. To make this task even easier, Edge impulse provides methods that can help you save time and energy. The AI assisted labeling techniques include:
Using YoloV5 - Useful when your objects are part of the common objects in the .
One of the beauties of FOMO is its fully convolutional nature, which means that just the ratio is set. Thus, it gives you more flexibility in its usage compared to the classical . method. For this tutorial, we have been using 96x96 images but it will accept other resolutions as long as the images are square.
To run using an Arduino library, go to the studio Deployment tab on Create Library section and select Arduino Library to download your custom Arduino library. Go to your Arduino IDE, then click on Sketch >> Include Library >> Add .Zip ( Your downloaded Arduino library). Make sure to follow the instruction provided on . Open Examples >> Examples from custom library and select your library. Upload the ''Portenta_H7_camera'' sketch to your Portenta then open your serial monitor to view results.
Congratulations! You've added object detection using FOMO to your sensors. Now that you've trained your model you can integrate your impulse in the firmware of your own edge device, see or the documentation for the Node.js, Python, Go and C++ SDKs that let you do this in a few lines of code and make this model run on any device.
when an object is seen.
Or if you're interested in more, see our tutorials on or . If you have a great idea for a different project, that's fine too. Edge Impulse lets you capture data from any sensor, build to extract features, and you have full flexibility in your Machine Learning pipeline with the learning blocks.