Some of the officially supported Edge Impulse boards require specific tutorial steps and thus separate documentation pages for their hardware; these tutorials are listed below:
Responding to your voice
This tutorial is for the Syntiant TinyML, the Arduino Nicla Voice, and the Avnet RASynBoard (Renesas RA6 and Syntiant NDP120). For other development boards, you can follow the standard Responding to your voice tutorial.
In this tutorial, you'll use machine learning to build a system that can recognize audible events, in particular your voice, through audio classification. The system you create will recognize audio commands for controlling a remote control (RC) car, such as 'go' and 'stop', even in the presence of background noise or chatter. For the best user experience, please use the Chrome browser.
You'll learn how to collect audio data from the microphone, use signal processing to extract the most important information, and train a deep neural network that can tell you whether your keyword was heard in a given clip of audio. Finally, you'll deploy the system to your Syntiant TinyML Board and evaluate how well it works.
At the end of this tutorial, you'll have a firm understanding of how to classify audio using Edge Impulse for Syntiant hardware.
Before starting the tutorial
After signing up for a free Edge Impulse account, clone the finished project, including all training data, signal processing and machine learning blocks here: Tutorial: Syntiant-RC. At the end of the process you will have the full project that comes pre-loaded with approx. 2.5 hours of training data.
The features have been generated for the Syntiant TinyML board. To use the project with the Arduino Nicla Voice, set the Features extractor to log-bin in the Syntiant DSP block and retrain your impulse.
Production ready systems
While the example Syntiant-RC tutorial project provides good performance, it is not intended to be production ready; it is designed to maximize the out-of-box user experience and is in essence a training vehicle for users of both Edge Impulse and Syntiant to understand the entire workflow. Production-ready systems need a much more robust training dataset to further reduce the likelihood of false positives and negatives.
For this tutorial you'll need one of the following boards:
The Syntiant TinyML Board shows up as a USB microphone once plugged in, and Edge Impulse can use this interface to record audio directly. For the Arduino Nicla Voice, run the edge-impulse-daemon CLI command to start collecting data.
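A minimal example, assuming the Edge Impulse CLI is already installed (the first run prompts for your Edge Impulse credentials and project):

```
# starts the serial daemon and connects the board to your Edge Impulse project
edge-impulse-daemon

# add --clean if you need to log in again or switch to a different project
edge-impulse-daemon --clean
```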
Device compatibility
Edge Impulse can ingest data from any device - including embedded devices that you already have in production. See the documentation for the Ingestion service for more information.
In this tutorial we want to build a system that recognizes keywords that resemble giving commands to a remote control car such as 'go' and 'stop'. Although the aforementioned public project comes pre-loaded with approx. 2.5 hours of training data, in order to add additional audio samples including your own, we'll show you how you can record audio samples directly from the board.
To collect your own voice samples, ensure the board's USB microphone interface (it shows up as "Arduino MKRZero") is selected as your system's microphone. Then go to Devices -> Connect a new device, choose the option "Use Your Computer", and allow access to your microphone.
Set your keyword as the label and set your sample length to 10s. Then click Start Recording and start saying your keyword over and over again (with a slight pause in between each utterance).
Note: You can use your Mobile phone as a sensor as well.
Afterwards you have a file like this, clearly showing your keywords, separated by some noise. The new data sample will show up in the appropriate Training or Test data bucket.
This data is not suitable for Machine Learning yet though. You will need to cut out the parts where you say your keyword. This is important because you only want the actual keyword to be labeled as such, and not accidentally labeled as noise, or incomplete parts of the utterance. Fortunately the Edge Impulse Studio can do this for you. Click ⋮ next to your sample, and select Split sample.
If you have a short keyword, enable Shift samples to randomly shift the sample around in the window, and then click Split. You now have individual 1s. long samples in your dataset. Perfect!
Run the edge-impulse-daemon CLI command and select the project to connect to. Once your board is connected, you can start collecting data from the Data Acquisition page:
About this section
This section goes through general guidance for collecting your audio data from scratch and not all steps are required for the out-of-box experience workflow, other than adding your own voice samples. Read on to understand more detail about how the dataset was constructed.
Now that you know how to collect keyword data, let's consider what other data we need. In addition to your keyword we'll also need audio that is not your keyword: background noise, the TV playing, and humans saying other words. All of this goes into the openset class, which is labeled 'z_openset' and will be referred to as such from here on out. This is required because a machine learning model has no idea about right and wrong (unless those are your keywords), but only learns from the data you feed into it. The more varied your data is, the better your model will work.
For each of your classes (in this case 'go', 'stop', and 'z_openset') you want to capture an even amount of data (balanced datasets work better) - and for a decent keyword spotting model you'll want at the VERY minimum 10 minutes in each class if you're building your dataset from scratch. For the Syntiant-RC project we've used a subset of the 'Google 30' speech command set for 'go' and 'stop'.
To make this model more responsive to your own voice, collect additional samples in the same manner as above. One way of doing this is to record 10-second clips and then automatically split them. Make sure to capture wide variations of the keyword and cover both high and low pitches.
To add additional data to the 'z_openset' label you can either collect it yourself, or make your life a bit easier by using the datasets of 'noise' (all kinds of background noise) and 'unknown' (random words) data that we built for you here: Pre-built datasets > Keyword spotting. In the Syntiant-RC project we used short 1-second clips of NPR radio recordings. We consider 'z_openset' a 'negative' class since it does not contain any of the keywords of interest, such as 'go' and 'stop' (which are considered 'positive' classes).
This is entirely optional, but to import this data, go to Data acquisition, click the Upload icon, select a number of 'noise' or 'unknown' samples (there's 25 minutes of each class, but you can select fewer files if you want), and click Begin upload. The data is automatically labeled and added to your project. Make sure you label them 'z_openset' before uploading so that they go into the pre-existing 'z_openset' category.
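If you prefer the command line over the Upload widget, the Edge Impulse CLI uploader can do the same thing. A sketch, assuming the downloaded samples sit in a local folder (run edge-impulse-uploader --help to confirm the exact options):

```
# upload local WAV files and label them 'z_openset' so they land in the existing category
edge-impulse-uploader --label z_openset --category training path/to/noise/*.wav
```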
If you've collected all your training data through the 'Record new data' widget you'll have all your keywords in the 'Training' dataset. This is not great, because you want to keep 20% of your data separate to validate the machine learning model. To mitigate this you can go to Dashboard and select Rebalance dataset. This will automatically split your data between a training class (80%) and a testing class (20%). Afterwards you should see something like this:
Next steps
In the next steps we walk you through in detail how the Syntiant signal processing and neural network blocks were configured. If you imported the project, these are already pre-configured for you so you can just read on to understand more details.
With the data set in place you can design an impulse. An impulse takes the raw data, slices it up in smaller windows, uses signal processing blocks to extract features, and then uses a learning block to classify new data. Signal processing blocks always return the same values for the same input and are used to make raw data easier to process, while learning blocks learn from past experiences.
For this tutorial we'll use the "Syntiant" signal processing block. The Syntiant processing block is similar to the Audio MFE block, but using a log-MEL scale plus other transformations specific to the Syntiant audio front-end.
We'll then pass this simplified audio data into a Neural Network block, which will learn to distinguish between the three classes of audio.
In the Studio, go to the Create impulse tab, add a Time series data, an Audio (Syntiant) and a Neural Network (Keras) block. Set the window size to 968 ms and the window increase to 484 ms (we'll explain why later), verify that sampling frequency is set to 16000 Hz and click Save Impulse.
Now that we've assembled the building blocks of our Impulse, we can configure each individual part. Click on the Syntiant tab in the left hand navigation menu. You'll see a page that looks like this:
This page allows you to configure some parameters of the Syntiant block, and lets you preview how the data will be transformed. The right of the page shows a visualization of the Syntiant's output for a piece of audio. The Syntiant processing block extracts speech features using filterbanks. To get a better understanding of parameters, have a look at the Audio MFE documentation.
Features generated by the Syntiant DSP block differ slightly between the Syntiant TinyML and the Nicla Voice. Set the Features extractor to:
gpu/NDP101 for the Syntiant TinyML
log-bin/NDP120 for the Nicla Voice
Syntiant block parameters
The number of generated features has to be 1,600, which corresponds to the Syntiant Neural Network input layer. To generate 1,600 features, you have to verify the following equation: window size = 1000 x (39 x frame stride + frame length). For our example: window size = 968 ms = 1000 x (39 x 0.024 + 0.032).
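As a sanity check with the example values above (the 40 coefficients per frame is inferred from 1,600 features spread over 40 frames, not a setting quoted here):

$$
\frac{0.968 - 0.032}{0.024} + 1 = 40 \text{ frames}, \qquad 40 \text{ frames} \times 40 \text{ coefficients} = 1600 \text{ features}
$$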
In the spectrogram the vertical axis represents the frequencies (the number of frequency bands is controlled by 'Number of coefficients' parameter), and the horizontal axis represents time (controlled by 'frame stride' and 'frame length'). The patterns visible in a spectrogram contain information about what type of sound it represents. For example, the spectrogram in this image shows "Go":
These differences are not necessarily easy for a person to describe, but fortunately they are enough for a neural network to learn to identify.
It's interesting to explore your data and look at the types of spectrograms it results in. You can use the dropdown box near the top right of the page to choose between different audio samples to visualize, or play with the parameters to see how the spectrogram changes.
The spectrograms generated by the Syntiant block will be passed into a neural network architecture that is particularly good at learning to recognize patterns in this type of tabular data. Before training our neural network, we'll need to generate Syntiant blocks for all of our windows of audio. To do this, click the Generate features button at the top of the page, then click the green Generate features button. This will take a minute or so to complete.
Afterwards you're presented with one of the most useful features in Edge Impulse: the feature explorer. This is a 3D representation showing your complete dataset, with each data-item color-coded to its respective label. You can zoom in to every item, find anomalies (an item that's in a wrong cluster), and click on items to listen to the sample. This is a great way to check whether your dataset contains wrong items, and to validate whether your dataset is suitable for ML (it should separate nicely).
With all data processed it's time to start training a neural network. Neural networks are algorithms, modeled loosely after the human brain, that can learn to recognize patterns that appear in their training data. The network that we're training here will take the processing block features as an input, and try to map this to one of the three classes — 'go', 'stop', or 'z_openset'.
Click on NN Classifier in the left hand menu. You'll see the following page:
Neural Network architectures
The Syntiant TinyML only supports a Dense architecture. The Arduino Nicla Voice also supports Convolutional models, which are selected by default.
With everything in place, click Start training. You'll see a lot of text flying past in the Training output panel, which you can ignore for now. Training will take a few minutes. When it's complete, you'll see the Last training performance panel appear at the bottom of the page:
Congratulations, you've trained a neural network with Edge Impulse and it's ready to deploy on your hardware! But what do all these numbers mean?
At the start of training, 20% of the training data is set aside for validation. This means that instead of being used to train the model, it is used to evaluate how the model is performing. The Last training performance panel displays the results of this validation, providing some vital information about your model and how well it is working. Bear in mind that your exact numbers may differ from the ones in this tutorial.
On the left hand side of the panel, Accuracy refers to the percentage of windows of audio that were correctly classified. The higher number the better, although an accuracy approaching 100% is unlikely, and is often a sign that your model has overfit the training data. You will find out whether this is true in the next stage, during model testing. For many applications, an accuracy above 85% can be considered very good.
The Confusion matrix is a table showing the balance of correctly versus incorrectly classified windows. To understand it, compare the values in each row. For example, in the above screenshot, 98.8% of the 'go' audio windows were classified as 'go', while 1.8% of them were classified as 'stop'. This appears to be a great result.
The performance numbers in the previous step show that our model is working well on its training data, but it's extremely important that we test the model on new, unseen data before deploying it in the real world. This will help us ensure the model has not learned to overfit the training data, which is a common occurrence.
Fortunately we've put aside 20% of our data already in the 'Test set' (see Data acquisition). This is data that the model has never seen before, and we can use this to validate whether our model actually works on unseen data. To run your model against the test set, head to Model testing, and click Classify all.
To drill down into a misclassified sample, click the three dots (⋮) next to a sample and select Show classification. You're then transported to the classification view, which lets you inspect the sample, and compare the sample to your training data. This way you can inspect whether this was actually a classification failure, or whether your data was incorrectly labeled. From here you can either update the label (when the label was wrong), or move the item to the training set to refine your model.
Misclassifications and uncertain results
It's inevitable that even a well-trained machine learning model will sometimes misclassify its inputs. When you integrate a model into your application, you should take into account that it will not always give you the correct answer.
For example, if you are classifying audio, you might want to classify several windows of data and average the results. This will give you better overall accuracy than assuming that every individual result is correct.
With the impulse designed, trained and verified you can deploy this model back to your device. This makes the model run without an internet connection, minimizes latency, and runs with minimum power consumption.
To export your model, click on Deployment in the menu. Then under 'Build firmware' select the Syntiant TinyML or Arduino Nicla Voice.
The final step before building the firmware is to configure the posterior handler parameters of the Syntiant chip.
Pre-configured posterior parameters
For the Syntiant-RC project, we've already pre-configured the posterior parameters so you can just go to the 'Build' output step. We recommend skipping to Step 9, but read on for more details about the process of posterior search.
Those parameters are used to tune the precision and recall of the neural network activations, to minimize False Rejection Rate and False Activation Rate. You can manually edit those parameters in JSON format or use the Find posterior parameters to search for the best values:
Select the classes you want to detect (the z_openset class should be omitted except for testing purposes)
Select a calibration dataset: either no calibration (recommended for TinyML/NDP101), the reference dataset with common English words from a radio program, or your own calibration dataset.
This calibration dataset is made of 2 files: an audio wav file and a csv file which contains the audio transcript of this program. This audio transcript should ideally have the classes you want to detect. For example you can imagine the following csv file: go,stop,this,is,an,example,transcript,for,optimizing,the,posterior,parameters,,,it,will,optimize,activations,for,the,go,stop,keywords,
You can also simplify the csv file and include only the keywords/classes you are interested in optimizing. For instance, if your audio wav files contains only 2 occurrences of 'go' and 'stop': go,stop,go,stop,
Tuning the audio gain for the Syntiant TinyML
After generating the posterior parameters, you can change the Syntiant TinyML audio gain by editing the "audio_pdm_gain" value at the end of the JSON file. Default value is set to 12 dB.
Once optimized parameters have been found, you can click Build. This will build a Syntiant package that will run on your development board. After building is completed you'll get prompted to download a zipfile. Save this on your computer. A pop-up video will show how the download process works.
After unzipping the downloaded file, run the appropriate flashing script for your operating system, to flash the Syntiant TinyML Board with the Syntiant-RC model and associated firmware. You might see a Microsoft Defender screen pop up when the script is run on Windows 10. It's safe to proceed so select 'More info' and continue.
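For reference, the flashing step on macOS or Linux typically looks like the sketch below. The archive and script names are placeholders; use whatever files your downloaded zip actually contains:

```
# unzip the firmware package downloaded from the Studio (filename is an example)
unzip syntiant-rc-firmware.zip -d syntiant-rc-firmware
cd syntiant-rc-firmware

# run the flashing script for your OS (script names may differ per package)
./flash_mac.command     # macOS
./flash_linux.sh        # Linux
flash_windows.bat       # Windows (run from a Command Prompt)
```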
Run the edge-impulse-run-impulse CLI command in your terminal:
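```
# runs the deployed impulse on the connected board and prints live classification results
edge-impulse-run-impulse
```

If the board is still connected through edge-impulse-daemon, stop the daemon first; only one application can talk to the board at a time.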
Alternatively, you can connect to the board's newly flashed firmware over serial: open a terminal and connect using the appropriate COM port with 115200 8-N-1 settings.
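On macOS or Linux you can use a serial terminal such as screen; the device path below is only an example, so check which port your board enumerates as (on Windows, a tool like PuTTY with the same settings works too):

```
# replace /dev/ttyACM0 with your board's serial port (e.g. /dev/tty.usbmodem* on macOS)
screen /dev/ttyACM0 115200
```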
Congratulations! Now that you've trained and deployed your model you can go further and build your own custom firmware, see Running your impulse locally on Syntiant TinyML Board.
Or if you're interested in more, see our tutorial on Motion Recognition - Syntiant TinyML.
We can't wait to see what you'll build! 🚀
In this tutorial, you'll use machine learning to build a system that can recognize audible events, particularly your voice through audio classification. The system you create will work similarly to "Hey Siri" or "OK, Google" and is able to recognize keywords or other audible events, even in the presence of other background noise or background chatter.
You'll learn how to collect audio data from microphones, use signal processing to extract the most important information, and train a deep neural network that can tell you whether your keyword was heard in a given clip of audio. Finally, you'll deploy the system to an embedded device and evaluate how well it works.
At the end of this tutorial, you'll have a firm understanding of how to classify audio using Edge Impulse.
You can view the finished project, including all data, signal processing and machine learning blocks here: .
Detect non-voice audio?
We have a tutorial for that too! See .
For this tutorial you'll need a supported device. Follow the steps to connect your development board to Edge Impulse.
- the easiest option if you don't have one of the above.
If your device is connected under Devices in the studio you can proceed:
Device compatibility
In this tutorial we want to build a system that recognizes keywords, so your first job is to think of a great one. It can be your name, an action, or even a growl - it's your party. Do keep in mind that some keywords are harder to distinguish from others, and especially keywords with only one syllable (like 'One') might lead to false-positives (e.g. when you say 'Gone'). This is the reason that Apple, Google and Amazon all use at least three-syllable keywords ('Hey Siri', 'OK, Google', 'Alexa'). A good one would be "Hello world".
To collect your first data, go to Data acquisition, set your keyword as the label, set your sample length to 10s., your sensor to 'microphone' and your frequency to 16KHz. Then click Start sampling and start saying your keyword over and over again (with some pause in between).
Afterwards you have a file like this, clearly showing your keywords, separated by some noise.
![10 seconds of "Hello world" data](/.gitbook/assets/68a7c9e-screenshot_2020-11-19_at_221857.png "Screenshot 2020-11-19 at 22.18.57.png")
This data is not suitable for Machine Learning yet though. You will need to cut out the parts where you say your keyword. This is important because you only want the actual keyword to be labeled as such, and not accidentally label noise, or incomplete sentences (e.g. only "Hello"). Fortunately the Edge Impulse Studio can do this for you. Click ⋮ next to your sample, and select Split sample.
If you have a short keyword, enable Shift samples to randomly shift the sample around in the window, and then click Split. You now have individual 1s. long samples in your dataset. Perfect!
Now that you know how to collect data we can consider other data we need to collect. In addition to your keyword we'll also need audio that is not your keyword, like background noise and the TV playing ('noise' class), and humans saying other words ('unknown' class). This is required because a machine learning model has no idea about right and wrong (unless those are your keywords), but only learns from the data you feed into it. The more varied your data is, the better your model will work.
For each of these three classes ('your keyword', 'noise', and 'unknown') you want to capture an even amount of data (balanced datasets work better) - and for a decent keyword spotting model you'll want at least 10 minutes in each class (but, the more the better).
Thus, collect 10 minutes of samples for your keyword - do this in the same manner as above. The fastest way is probably through your mobile phone, collecting 1 minute clips, then automatically splitting this data. Make sure to capture wide variations of the keyword: leverage your family and your colleagues to help you collect the data, make sure you cover high and low pitches, and slow and fast speakers.
To import this data, go to Data acquisition, click the Upload icon, select a number of 'noise' or 'unknown' samples (there's 25 minutes of each class, but you can select fewer files if you want), and click Begin upload. The data is automatically labeled and added to your project.
If you've collected all your training data through the 'Record new data' widget you'll have all your keywords in the 'Training' dataset. This is not great, because you want to keep 20% of your data separate to validate the machine learning model. To mitigate this you can go to Dashboard and select Perform train/test split. This will automatically split your data between a training class (80%) and a testing class (20%). Afterwards you should see something like this:
With the data set in place you can design an impulse. An impulse takes the raw data, slices it up in smaller windows, uses signal processing blocks to extract features, and then uses a learning block to classify new data. Signal processing blocks always return the same values for the same input and are used to make raw data easier to process, while learning blocks learn from past experiences.
For this tutorial we'll use the "MFCC" signal processing block. MFCC stands for Mel Frequency Cepstral Coefficients. This sounds scary, but it's basically just a way of turning raw audio—which contains a large amount of redundant information—into simplified form. Edge Impulse has many other processing blocks for audio, including "MFE" and the "Spectrogram" blocks for non-voice audio, but the "MFCC" block is great for dealing with human speech.
We'll then pass this simplified audio data into a Neural Network block, which will learn to distinguish between the three classes of audio.
In the Studio, go to the Create impulse tab, add a Time series data, an Audio (MFCC) and a Classification (Keras) block. Leave the window size to 1 second (as that's the length of our audio samples in the dataset) and click Save Impulse.
Now that we've assembled the building blocks of our Impulse, we can configure each individual part. Click on the MFCC tab in the left hand navigation menu. You'll see a page that looks like this:
In the spectrogram the vertical axis represents the frequencies (the number of frequency bands is controlled by 'Number of coefficients' parameter, try it out!), and the horizontal axis represents time (controlled by 'frame stride' and 'frame length'). The patterns visible in a spectrogram contain information about what type of sound it represents. For example, the spectrogram in this image shows "Hello world":
And the spectrogram in this image shows "On":
These differences are not necessarily easy for a person to describe, but fortunately they are enough for a neural network to learn to identify.
It's interesting to explore your data and look at the types of spectrograms it results in. You can use the dropdown box near the top right of the page to choose between different audio samples to visualize, or play with the parameters to see how the spectrogram changes.
In addition, you can see the performance of the MFCC block on your microcontroller below the spectrogram. This is the complete time that it takes on a low-power microcontroller (Cortex-M4F @ 80MHz) to analyze 1 second of data.
The spectrograms generated by the MFCC block will be passed into a neural network architecture that is particularly good at learning to recognize patterns in this type of tabular data. Before training our neural network, we'll need to generate MFCC blocks for all of our windows of audio. To do this, click the Generate features button at the top of the page, then click the green Generate features button. This will take a minute or so to complete.
Afterwards you're presented with one of the most useful features in Edge Impulse: the feature explorer. This is a 3D representation showing your complete dataset, with each data-item color-coded to its respective label. You can zoom in to every item, find anomalies (an item that's in a wrong cluster), and click on items to listen to the sample. This is a great way to check whether your dataset contains wrong items, and to validate whether your dataset is suitable for ML (it should separate nicely).
With all data processed it's time to start training a neural network. Neural networks are algorithms, modeled loosely after the human brain, that can learn to recognize patterns that appear in their training data. The network that we're training here will take the MFCC as an input, and try to map this to one of three classes—your keyword, noise or unknown.
Click on NN Classifier in the left hand menu. You'll see the following page:
A neural network is composed of layers of virtual "neurons", which you can see represented on the left hand side of the NN Classifier page. An input—in our case, an MFCC spectrogram—is fed into the first layer of neurons, which filters and transforms it based on each neuron's unique internal state. The first layer's output is then fed into the second layer, and so on, gradually transforming the original input into something radically different. In this case, the spectrogram input is transformed over four intermediate layers into just two numbers: the probability that the input represents your keyword, and the probability that the input represents 'noise' or 'unknown'.
During training, the internal state of the neurons is gradually tweaked and refined so that the network transforms its input in just the right ways to produce the correct output. This is done by feeding in a sample of training data, checking how far the network's output is from the correct answer, and adjusting the neurons' internal state to make it more likely that a correct answer is produced next time. When done thousands of times, this results in a trained network.
A particular arrangement of layers is referred to as an architecture, and different architectures are useful for different tasks. The default neural network architecture provided by Edge Impulse will work well for our current project, but you can also define your own architectures. You can even import custom neural network code from tools used by data scientists, such as TensorFlow and Keras (click the three dots at the top of the page).
Before you begin training, you should change some values in the configuration. Change the Minimum confidence rating to 0.6. This means that when the neural network makes a prediction (for example, that there is 0.8 probability that some audio contains "hello world") Edge Impulse will disregard it unless it is above the threshold of 0.6.
Next, enable 'Data augmentation'. When enabled your data is randomly mutated during training. For example, by adding noise, masking time or frequency bands, or warping your time axis. This is a very quick way to make your dataset work better in real life (with unpredictable sounds coming in), and prevents your neural network from overfitting (as the data samples are changed every training cycle).
With everything in place, click Start training. You'll see a lot of text flying past in the Training output panel, which you can ignore for now. Training will take a few minutes. When it's complete, you'll see the Last training performance panel appear at the bottom of the page:
Congratulations, you've trained a neural network with Edge Impulse! But what do all these numbers mean?
At the start of training, 20% of the training data is set aside for validation. This means that instead of being used to train the model, it is used to evaluate how the model is performing. The Last training performance panel displays the results of this validation, providing some vital information about your model and how well it is working. Bear in mind that your exact numbers may differ from the ones in this tutorial.
On the left hand side of the panel, Accuracy refers to the percentage of windows of audio that were correctly classified. The higher number the better, although an accuracy approaching 100% is unlikely, and is often a sign that your model has overfit the training data. You will find out whether this is true in the next stage, during model testing. For many applications, an accuracy above 85% can be considered very good.
The Confusion matrix is a table showing the balance of correctly versus incorrectly classified windows. To understand it, compare the values in each row. For example, in the above screenshot, 96 of the helloworld audio windows were classified as helloworld, while 10 of them were incorrectly classified as unknown or noise. This appears to be a great result.
The On-device performance region shows statistics about how the model is likely to run on-device. Inferencing time is an estimate of how long the model will take to analyze one second of data on a typical microcontroller (an Arm Cortex-M4F running at 80MHz). Peak RAM usage gives an idea of how much RAM will be required to run the model on-device.
The performance numbers in the previous step show that our model is working well on its training data, but it's extremely important that we test the model on new, unseen data before deploying it in the real world. This will help us ensure the model has not learned to overfit the training data, which is a common occurrence.
Fortunately we've put aside 20% of our data already in the 'Test set' (see Data acquisition). This is data that the model has never seen before, and we can use this to validate whether our model actually works on unseen data. To run your model against the test set, head to Model testing, select all items and click Classify selected.
To drill down into a misclassified sample, click the three dots (⋮) next to a sample and select Show classification. You're then transported to the classification view, which lets you inspect the sample, and compare the sample to your training data. This way you can inspect whether this was actually a classification failure, or whether your data was incorrectly labeled. From here you can either update the label (when the label was wrong), or move the item to the training set to refine your model.
Misclassifications and uncertain results
It's inevitable that even a well-trained machine learning model will sometimes misclassify its inputs. When you integrate a model into your application, you should take into account that it will not always give you the correct answer.
For example, if you are classifying audio, you might want to classify several windows of data and average the results. This will give you better overall accuracy than assuming that every individual result is correct.
With the impulse designed, trained and verified you can deploy this model back to your device. This makes the model run without an internet connection, minimizes latency, and runs with minimum power consumption. Edge Impulse can package up the complete impulse - including the MFCC algorithm, neural network weights, and classification code - in a single C++ library that you can include in your embedded software.
Mobile phone
To export your model, click on Deployment in the menu. Then under 'Build firmware' select your development board, and click Build. This will export the impulse, and build a binary that will run on your development board in a single step. After building is completed you'll get prompted to download a binary. Save this on your computer.
When you click the Build button, you'll see a pop-up with text and video instructions on how to deploy the binary to your particular device. Follow these instructions. Once you are done, we are ready to test your impulse out.
We can connect to the board's newly flashed firmware over serial. Open a terminal and run:
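```
# connects over serial and streams live classification results from the board
edge-impulse-run-impulse
```

The CLI also supports a continuous audio mode; run edge-impulse-run-impulse --help to see the available options.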
Serial daemon
If the device is not connected over WiFi, but instead connected via the Edge Impulse serial daemon, you'll need to stop the daemon first. Only one application can connect to the development board at a time.
This will capture audio from the microphone, run the MFCC code, and then classify the spectrogram:
Great work! You've captured data, trained a model, and deployed it to an embedded device. You can now control LEDs, activate actuators, or send a message to the cloud whenever you say a keyword!
Is your model working properly in the Studio, but does not recognize your keyword when running in continuous mode on your device? Then this is probably due to dataset imbalance (a lot more unknown / noise data compared to your keyword) in combination with our moving average code to reduce false positives.
When running in continuous mode we run a moving average over the predictions to prevent false positives. E.g. if we do 3 classifications per second you'll see your keyword potentially classified three times (once at the start of the audio file, once in the middle, once at the end). However, if your dataset is unbalanced (there's a lot more noise / unknown data than keyword data in your dataset) the ML model typically manages to only find your keyword in the 'center' window, and thus we filter it out as a false positive.
You can fix this by either:
Adding more data :-)
Or, by disabling the moving average filter by going into ei_run_classifier.h (in the edge-impulse-sdk directory) and removing:
Note that this might increase the number of false positives the model detects.
We can't wait to see what you'll build! 🚀
Motion recognition - Syntiant TinyML
This tutorial is for the Syntiant hardware only. For other development boards, you can follow the standard tutorial.
In this tutorial, you'll use machine learning to build a gesture recognition system that runs on the Syntiant TinyML board. This is a hard task to solve using rule-based programming, as people don't perform gestures in the exact same way every time. But machine learning can handle these variations with ease. You'll learn how to collect high-frequency data from an IMU, build a neural network classifier, and how to deploy your model back to a device. At the end of this tutorial, you'll have a firm understanding of applying machine learning on Syntiant TinyML board using Edge Impulse.
Before starting the tutorial
After signing up for a free Edge Impulse account, clone the finished project, including all training data, signal processing and machine learning blocks here: . At the end of the process you will have the full project that comes pre-loaded with training and test datasets.
For this tutorial you'll need the Syntiant TinyML Board and an SD card to perform IMU data acquisition.
Follow the steps to connect your development board to Edge Impulse.
If your device is connected under Devices in the studio you can proceed:
Device compatibility
With your device connected, we can collect some data. In the studio go to the Data acquisition tab. This is the place where all your raw data is stored, and - if your device is connected to the remote management API - where you can start sampling new data.
Under Record new data, select your Syntiant device, set the label to circular, the sample length to 2000, the sensor to Inertial and the frequency to 100 Hz. This indicates that you want to record data for 2 seconds, and label the recorded data as circular. You can later edit these labels if needed.
After you click Start sampling, move your device in a circular motion. Within a few seconds the device should complete sampling and upload the file back to Edge Impulse. You see a new line appear under 'Collected data' in the studio. When you click it you now see the raw data graphed out. As the accelerometer on the development board has three axes you'll notice three different lines, one for each axis.
Continuous movement
It's important to do continuous movements as we'll later slice up the data in smaller windows. Make sure also to perform variations on the motions. E.g. do both slow and fast movements and vary the orientation of the board. You'll never know how your user will use the device.
Machine learning works best with lots of data, so a single sample won't cut it. Now is the time to start building your own dataset. For example, use the following two classes, and record around 3 minutes of data per class:
Circular - circular movements
Z_Openset - random movements that are not circular
Negative Class
The Syntiant NDP chips require a negative class on which no predictions will occur, in our example this is the Z_Openset class. Make sure the class name is last in alphabetical order.
With the training set in place you can design an impulse. An impulse takes the raw data, slices it up in smaller windows, uses signal processing blocks to extract features, and then uses a learning block to classify new data. Signal processing blocks always return the same values for the same input and are used to make raw data easier to process, while learning blocks learn from past experiences.
For this tutorial we'll use the 'IMU Syntiant' signal processing block. This block rescales raw data to 8-bit values to match the NDP chip input requirements. Then we'll use a 'Neural Network' learning block, that takes these generated features and learns to distinguish between our different classes (circular or not).
In the studio go to Create impulse, set the window size to 1800 (you can click on the 1800 ms. text to enter an exact value), the window increase to 80, and add the 'IMU Syntiant' and 'Classification (Keras)' blocks. Then click Save impulse.
Window size
The Syntiant NDP101 chip requires the number of generated features to be divisible by 4. In our example, we have 6 axes sampled at 100 Hz with a window of 1800 ms, leading to 1080 (180x6) features, which is divisible by 4.
To configure your signal processing block, click Syntiant IMU in the menu on the left. This will show you the raw data on top of the screen (you can select other files via the drop down menu), and the processed features on the right.
The Scale 16 bits to 8 bits option converts your raw data to 8 bits and normalizes it to the range [-1, 1]. The circular motion public project's dataset is already rescaled, so you need to disable the option in this case.
Click Save parameters. This will send you to the 'Feature generation' screen.
Click Generate features to start the process.
Afterwards the 'Feature explorer' will load. This is a plot of all the extracted features against all the generated windows. You can use this graph to compare your complete data set. A good rule of thumb is that if you can visually separate the data on a number of axes, then the machine learning model will be able to do so as well.
With all data processed it's time to start training a neural network. Neural networks are algorithms, modeled loosely after the human brain, that can learn to recognize patterns that appear in their training data. The network that we're training here will take the processing block features as an input, and try to map this to one of the two classes — 'circular' or 'z_openset'.
Click on NN Classifier in the left hand menu. You'll see the following page:
With everything in place, click Start training. When it's complete, you'll see the Last training performance panel appear at the bottom of the page:
Congratulations, you've trained a neural network with Edge Impulse and it's ready to deploy on Syntiant hardware! But what do all these numbers mean?
At the start of training, 20% of the training data is set aside for validation. This means that instead of being used to train the model, it is used to evaluate how the model is performing. The Last training performance panel displays the results of this validation, providing some vital information about your model and how well it is working. Bear in mind that your exact numbers may differ from the ones in this tutorial.
On the left hand side of the panel, Accuracy refers to the percentage of windows of data that were correctly classified. The higher number the better, although an accuracy approaching 100% is unlikely, and is often a sign that your model has overfit the training data. You will find out whether this is true in the next stage, during model testing. For many applications, an accuracy above 85% can be considered very good.
The Confusion matrix is a table showing the balance of correctly versus incorrectly classified windows. To understand it, compare the values in each row. For example, in the above screenshot, 100% of the circular motion samples were classified correctly, and 99.6% for the openset samples.
From the statistics in the previous step we know that the model works against our training data, but how well would the network perform on new data? Click on Live classification in the menu to find out. Your device should (just like in step 2) show as online under 'Classify new data'. Set the 'Sample length' to 2000 (2 seconds), click Start sampling and start doing movements. Afterward, you'll get a full report on what the network thought that you did.
If the network performed great, fantastic! But what if it performed poorly? There could be a variety of reasons, but the most common ones are:
There is not enough data. Neural networks need to learn patterns in data sets, and the more data the better.
The data does not look like other data the network has seen before. This is common when someone uses the device in a way that you didn't add to the dataset. You can add the current file to the training set by clicking ⋮, then selecting Move to training set. Make sure to update the label under 'Data acquisition' before training.
The model has not been trained enough. Up the number of epochs to 50 and see if performance increases (the classified file is stored, and you can load it through 'Classify existing validation sample').
As you see there is still a lot of trial and error when building neural networks, but we hope the visualizations help a lot. You can also run the network against the complete validation set through 'Model validation'. Think of the model validation page as a set of unit tests for your model!
With a working model in place, we can look at places where our current impulse performs poorly.
With the impulse designed, trained and verified you can deploy this model back to your device. This makes the model run without an internet connection, minimizes latency, and runs with minimum power consumption.
To export your model, click on Deployment in the menu. Then under 'Build firmware' select the Syntiant development board.
The final step before building the firmware is to configure the posterior handler parameters of the Syntiant chip.
Pre-configured posterior parameters
For the Syntiant Circular Motion project, we've already pre-configured the posterior parameters so you can just go to the 'Build' output step.
Those parameters are used to tune the precision and recall of the neural network activations, to minimize False Rejection Rate and False Activation Rate. You can manually edit those parameters in JSON format or use the Find posterior parameters to search for the best values:
Select the classes you want to detect and make sure to uncheck the last class (Z_Openset in our example)
Select a calibration method: either no calibration (fastest, recommended for the Syntiant TinyML board), or FAR optimized (FAR is optimized for an FRR target < 0.2).
Once optimized parameters have been found, you can click Build. This will build a Syntiant package that will run on your development board. After building is completed you'll get prompted to download a zipfile. Save this on your computer. A pop-up video will show how the download process works.
After unzipping the downloaded file, run the appropriate flashing script for your platform (Linux, Mac, or Win 10) to flash the board with the Syntiant Circular Motion model and associated firmware. You might see a Microsoft Defender screen pop up when the script is run on Windows 10. It's safe to proceed so select 'More info' and continue.
We can connect to the board's newly flashed firmware over serial. Open a terminal and run:
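```
# streams live classification results from the flashed board over serial
edge-impulse-run-impulse
```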
Serial daemon
If the device is connected via the Edge Impulse serial daemon, you'll need to stop the daemon first. Only one application can connect to the development board at a time.
This will sample data from the sensor, run the signal processing code, and then classify the data:
Victory! You've now built your first on-device machine learning model.
We can't wait to see what you'll build! 🚀
Motion recognition - Avnet RASynBoard
This tutorial is for the Avnet RASynBoard hardware only. For other development boards, you can follow the standard tutorial.
In this tutorial, you'll use machine learning to build a gesture recognition system that runs on the RASynBoard. This is a hard task to solve using rule-based programming, as people don't perform gestures in the exact same way every time. But machine learning can handle these variations with ease. You'll learn how to collect high-frequency data from an IMU, build a neural network classifier, and how to deploy your model back to a device. At the end of this tutorial, you'll have a firm understanding of applying machine learning on RASynBoard using Edge Impulse.
Before starting the tutorial
After signing up for a free Edge Impulse account, clone the finished project, including all training data, signal processing and machine learning blocks here: . At the end of the process you will have the full project that comes pre-loaded with training and test datasets.
For this tutorial you'll need the Avnet RASynBoard and an SD card to perform IMU data acquisition.
Follow the steps to connect your development board to Edge Impulse.
If your device is connected under Devices in the studio you can proceed:
Device compatibility
With your device connected, we can collect some data. In the studio go to the Data acquisition tab. This is the place where all your raw data is stored, and - if your device is connected to the remote management API - where you can start sampling new data.
Under Record new data, select your device, set the label to updown, the sample length to 5000, the sensor to Inertial and the frequency to 200 Hz. This indicates that you want to record data for 5 seconds, and label the recorded data as updown. You can later edit these labels if needed. You may increase the sample length to 10 seconds or more to collect more data in one session.
After you click Start sampling move your device in an Up/Down motion. Within a few seconds the device should complete sampling and upload the file back to Edge Impulse. You see a new line appear under 'Collected data' in the studio. When you click it you now see the raw data graphed out.
Continuous movement
It's important to do continuous movements as we'll later slice up the data in smaller windows. Make sure also to perform variations on the motions. E.g. do both slow and fast movements and vary the orientation of the board. You'll never know how your user will use the device.
Machine learning works best with lots of data, so a single sample won't cut it. Now is the time to start building your own dataset. For example, use the following classes, and record around 3 minutes of data per class:
idle - just sitting on your desk while you're working.
snake - moving the device over your desk as a snake.
wave - waving the device from left to right.
updown - moving the device up and down.
Z_Openset - random movements that do not match any of the other classes
Negative Class
The Syntiant NDP chips require a negative class on which no predictions will occur; in our example this is the Z_Openset class. Make sure the negative class name is last in alphabetical order.
With the training set in place you can design an impulse. An impulse takes the raw data, slices it up in smaller windows, uses signal processing blocks to extract features, and then uses a learning block to classify new data. Signal processing blocks always return the same values for the same input and are used to make raw data easier to process, while learning blocks learn from past experiences.
For this tutorial we'll use the 'IMU Syntiant' signal processing block. This block rescales raw data to 8-bit values to match the NDP chip input requirements. Then we'll use a 'Neural Network' learning block, that takes these generated features and learns to distinguish between our different classes.
In the studio go to Create impulse, set the window size to 2000 (you can click on the 2000 ms. text to enter an exact value), the window increase to 240, the frequency to 200, and add the 'IMU Syntiant' and 'Classification (Keras)' blocks. Then click Save impulse.
To configure your signal processing block, click Syntiant IMU in the menu on the left. This will show you the raw data on top of the screen (you can select other files via the drop down menu), and the processed features on the right.
The Scale 16 bits to 8 bits (Raw) option converts your raw data to 8 bits and normalizes it to the range [-1, 1].
Click Save parameters. This will send you to the 'Feature generation' screen.
Click Generate features to start the process.
Afterwards the 'Feature explorer' will load. This is a plot of all the extracted features against all the generated windows. You can use this graph to compare your complete data set. A good rule of thumb is that if you can visually separate the data on a number of axes, then the machine learning model will be able to do so as well.
With all data processed it's time to start training a neural network. Neural networks are algorithms, modeled loosely after the human brain, that can learn to recognize patterns that appear in their training data. The network that we're training here will take the processing block features as an input, and try to map this to the classes — 'updown', 'wave', 'snake', 'idle', or 'z_openset'.
Click on NN Classifier in the left hand menu. You'll see the following page:
With everything in place, click Start training. When it's complete, you'll see the Last training performance panel appear at the bottom of the page:
Congratulations, you've trained a neural network with Edge Impulse and it's ready to deploy on Syntiant hardware! But what do all these numbers mean?
At the start of training, 20% of the training data is set aside for validation. This means that instead of being used to train the model, it is used to evaluate how the model is performing. The Last training performance panel displays the results of this validation, providing some vital information about your model and how well it is working. Bear in mind that your exact numbers may differ from the ones in this tutorial.
On the left hand side of the panel, Accuracy refers to the percentage of windows of data that were correctly classified. The higher number the better, although an accuracy approaching 100% is unlikely, and is often a sign that your model has overfit the training data. You will find out whether this is true in the next stage, during model testing. For many applications, an accuracy above 85% can be considered very good.
The Confusion matrix is a table showing the balance of correctly versus incorrectly classified windows. To understand it, compare the values in each row. For example, in the above screenshot, 96.6% of the snake motion samples were classified correctly.
From the statistics in the previous step we know that the model works against our training data, but how well would the network perform on new data? Click on Live classification in the menu to find out. Your device should (just like in step 2) show as online under 'Classify new data'. Set the 'Sample length' to 2000, click Start sampling and start doing movements. Afterward, you'll get a full report on what the network thought that you did.
If the network performed great, fantastic! But what if it performed poorly? There could be a variety of reasons, but the most common ones are:
There is not enough data. Neural networks need to learn patterns in data sets, and the more data the better.
The data does not look like other data the network has seen before. This is common when someone uses the device in a way that you didn't add to the dataset. You can add the current file to the training set by clicking ⋮, then selecting Move to training set. Make sure to update the label under 'Data acquisition' before training.
The model has not been trained enough. Up the number of epochs to 50 or 100 and see if performance increases (the classified file is stored, and you can load it through 'Classify existing validation sample').
As you see there is still a lot of trial and error when building neural networks, but we hope the visualizations help a lot. You can also run the network against the complete validation set through 'Model validation'. Think of the model validation page as a set of unit tests for your model!
With a working model in place, we can look at places where our current impulse performs poorly.
With the impulse designed, trained and verified you can deploy this model back to your device. This makes the model run without an internet connection, minimizes latency, and runs with minimum power consumption.
To export your model, click on Deployment in the menu. Then under 'Build firmware' select the RASynBoard development board.
The final step before building the firmware is to configure the posterior handler parameters of the Syntiant chip.
Pre-configured posterior parameters
For the public (clonable) project, we've already pre-configured the posterior parameters so you can just go to the 'Build' output step.
Those parameters are used to tune the precision and recall of the neural network activations, to minimize False Rejection Rate and False Activation Rate. You can manually edit those parameters in JSON format or use the Find posterior parameters to search for the best values:
Select the classes you want to detect and make sure to uncheck the last class (Z_Openset in our example)
Select a calibration method: no calibration
Once optimized parameters have been found, you can click Build. This will build a package that will run on your development board. After building is completed you'll get prompted to download a zipfile. Save this on your computer. A pop-up video will show how the download process works.
After unzipping the downloaded file, run the appropriate flashing script for your platform (Linux, Mac, or Win 10+) to flash the board with the model and associated firmware.
We can connect to the board's newly flashed firmware over serial. Open a terminal and run:
Serial daemon
If the device is connected via the Edge Impulse serial daemon, you'll need to stop the daemon first. Only one application can connect to the development board at a time.
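If you flashed the official Edge Impulse firmware built above, the standard CLI runner should work here; a minimal example (your board or firmware may expose additional options):

```
# start inference on the connected board and print the results over serial
edge-impulse-run-impulse
```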
This will sample data from the sensor, run the signal processing code, and then classify the data:
Victory! You've now built your first on-device machine learning model.
We can't wait to see what you'll build! 🚀
In this tutorial, you'll use machine learning to build a system that can recognize when a particular sound is happening—a task known as audio classification. The system you create will be able to recognize the sound of water running from a faucet, even in the presence of other background noise.
You'll learn how to collect audio data from microphones, use signal processing to extract the most important information, and train a deep neural network that can tell you whether the sound of running water can be heard in a given clip of audio. Finally, you'll deploy the system to an embedded device and evaluate how well it works.
At the end of this tutorial, you'll have a firm understanding of how to classify audio using Edge Impulse.
You can view the finished project, including all data, signal processing and machine learning blocks here: .
For this tutorial you'll need a supported device. Follow the steps to connect your development board to Edge Impulse.
- if you don't have a development kit
If your device is connected under Devices in the studio you can proceed:
Device compatibility
To build this project, you'll need to collect some audio data that will be used to train the machine learning model. Since the goal is to detect the sound of a running faucet, you'll need to collect some examples of that. You'll also need some examples of typical background noise that doesn't contain the sound of a faucet, so the model can learn to discriminate between the two. These two types of examples represent the two classes we'll be training our model to detect: background noise, or running faucet.
You can use your device to collect some data. In the studio, go to the Data acquisition tab. This is the place where all your raw data is stored, and - if your device is connected to the remote management API - where you can start sampling new data.
Let's start by recording an example of background noise that doesn't contain the sound of a running faucet. Under Record new data, select your device, set the label to noise, the sample length to 1000, and the sensor to Built-in microphone. This indicates that you want to record 1 second of audio, and label the recorded data as noise. You can later edit these labels if needed.
After you click Start sampling, the device will capture a second of audio and transmit it to Edge Impulse. The LED will light while recording is in progress, then light again during transmission.
When the data has been uploaded, you will see a new line appear under 'Collected data'. You will also see the waveform of the audio in the 'RAW DATA' box. You can use the controls underneath to listen to the audio that was captured.
Since you now know how to capture audio with Edge Impulse, it's time to start building a dataset. For a simple audio classification model like this one, we should aim to capture around 10 minutes of data. We have two classes, and it's ideal if our data is balanced equally between each of them. This means we should aim to capture the following data:
5 minutes of background noise, with the label "noise"
5 minutes of running faucet noise, with the label "faucet"
In the real world, there are usually additional sounds present alongside the sounds we care about. For example, a running faucet is often accompanied by the sound of dishes being washed, teeth being brushed, or a conversation in the kitchen. Background noise might also include the sounds of television, kids playing, or cars driving past outside.
It's important that your training data contains these types of real world sounds. If your model is not exposed to them during training, it will not learn to take them into account, and it will not perform well during real-world usage.
For this tutorial, you should try to capture the following:
Background noise
2 minutes of background noise without much additional activity
1 minute of background noise with a TV or music playing
1 minute of background noise featuring occasional talking or conversation
1 minute of background noise with the sounds of housework
Running faucet noise
1 minute of a faucet running
1 minute of a different faucet running
1 minute of a faucet running with a TV or music playing
1 minute of a faucet running with occasional talking or conversation
1 minute of a faucet running with the sounds of housework
It's okay if you can't get all of these, as long as you still obtain 5 minutes of data for each class. However, your model will perform better in the real world if it was trained on a representative dataset.
Dataset diversity
There's no guarantee your model will perform well in the presence of sounds that were not included in its training set, so it's important to make your dataset as diverse and representative of real-world conditions as possible.
The amount of audio that can be captured in one go varies depending on a device's memory. The ST B-L475E-IOT01A developer board has enough memory to capture 60 seconds of audio at a time, and the Arduino Nano 33 BLE Sense has enough memory for 16 seconds. To capture 60 seconds of audio, set the sample length to 60000. Because the board transmits data quite slowly, it will take around 7 minutes before a 60 second sample appears in Edge Impulse.
Once you've captured around 10 minutes of data, it's time to start designing an Impulse.
With the training set in place you can design an impulse. An impulse takes the raw data, slices it up in smaller windows, uses signal processing blocks to extract features, and then uses a learning block to classify new data. Signal processing blocks always return the same values for the same input and are used to make raw data easier to process, while learning blocks learn from past experiences.
For this tutorial we'll use the "MFE" signal processing block. MFE stands for Mel Frequency Energy. This sounds scary, but it's basically just a way of turning raw audio—which contains a large amount of redundant information—into a simplified form.
Spectrogram block
Edge Impulse supports three different blocks for audio classification: MFCC, MFE and spectrogram blocks. If your accuracy is not great using the MFE block you can switch to the spectrogram block, which is not tuned to frequencies for the human ear.
We'll then pass this simplified audio data into a Neural Network block, which will learn to distinguish between the two classes of audio (faucet and noise).
In the studio, go to the Create impulse tab. You'll see a Time Series Data block, like this one.
As mentioned above, Edge Impulse slices up the raw samples into windows that are fed into the machine learning model during training. The Window size field controls how long, in milliseconds, each window of data should be. A half second audio sample will be enough to determine whether a faucet is running or not, so you should make sure Window size is set to 500 ms. You can either drag the slider or type a new value directly.
Each raw sample is sliced into multiple windows, and the Window increase field controls the offset of each subsequent window from the previous one. For example, a Window increase value of 500 ms would result in each window starting 500 ms after the start of the previous one.
By setting a Window increase that is smaller than the Window size, we can create windows that overlap. This is actually a great idea. Although they may contain similar data, each overlapping window is still a unique example of audio that represents the sample's label. By using overlapping windows, we can make the most of our training data. For example, with a Window size of 500 ms and a Window increase of 100 ms, we can extract 16 unique windows from only 2 seconds of data (one window starting every 100 ms, from 0 ms up to 1,500 ms).
Make sure the Window increase field is set to 300 ms. The Time series data block should match the screenshot above.
Next, click Add a processing block and choose the 'MFE' block. Once you're done with that, click Add a learning block and select 'Classification (Keras)'. Finally, click Save impulse. Your impulse should now look like this:
Now that we've assembled the building blocks of our Impulse, we can configure each individual part. Click on the MFE tab in the left hand navigation menu. You'll see a page that looks like this:
The MFE block transforms a window of audio into a table of data where each row represents a range of frequencies and each column represents a span of time. The value contained within each cell reflects the amplitude of its associated range of frequencies during that span of time. The spectrogram shows each cell as a colored block, the intensity of which varies depending on the amplitude.
The patterns visible in a spectrogram contain information about what type of sound it represents. For example, the spectrogram in this image shows a pattern typical of background noise:
You can tell that it is slightly different from the following spectrogram, which shows a pattern typical of a running faucet:
These differences are not necessarily easy for a person to describe, but fortunately they are enough for a neural network to learn to identify.
It's interesting to explore your data and look at the types of spectrograms it results in. You can use the dropdown box near the top right of the page to choose between different audio samples to visualize, and drag the white window on the audio waveform to select different windows of data:
There are a lot of different ways to configure the MFE block, as shown in the Parameters box. Here, we are going to lower the Filter number to 32 in order to lower the overhead of running the MFE block. You can play around with other parameters such as the noise floor to see how they impact the spectrogram.
For the first run through this tutorial, set your MFE parameters to match the image below:
The spectrograms generated by the MFE block will be passed into a neural network architecture that is particularly good at learning to recognize patterns in this type of tabular data. Before training our neural network, we'll need to generate MFE features for all of our windows of audio. To do this, click the Generate features tab at the top of the page, then click the green Generate features button. If you have a full 10 minutes of data, the process will take a while to complete:
Next, we'll configure the neural network and begin training.
With all data processed it's time to start training a neural network. Neural networks are algorithms, modeled loosely after the human brain, that can learn to recognize patterns that appear in their training data. The network that we're training here will take the MFE as an input, and try to map this to one of two classes—noise, or faucet.
Click on NN Classifier in the left hand menu. You'll see the following page:
A neural network is composed of layers of virtual "neurons", which you can see represented on the left hand side of the NN Classifier page. An input—in our case, an MFE spectrogram—is fed into the first layer of neurons, which filters and transforms it based on each neuron's unique internal state. The first layer's output is then fed into the second layer, and so on, gradually transforming the original input into something radically different. In this case, the spectrogram input is transformed over four intermediate layers into just two numbers: the probability that the input represents noise, and the probability that the input represents a running faucet.
During training, the internal state of the neurons is gradually tweaked and refined so that the network transforms its input in just the right ways to produce the correct output. This is done by feeding in a sample of training data, checking how far the network's output is from the correct answer, and adjusting the neurons' internal state to make it more likely that a correct answer is produced next time. When done thousands of times, this results in a trained network.
A particular arrangement of layers is referred to as an architecture, and different architectures are useful for different tasks. The default neural network architecture provided by Edge Impulse will work well for our current project, but you can also define your own architectures. You can even import custom neural network code from tools used by data scientists, such as TensorFlow and Keras.
The default settings should work, and to begin training, click Start training. You'll see a lot of text flying past in the Training output panel, which you can ignore for now. Training will take a few minutes. When it's complete, you'll see the Model panel appear at the right side of the page:
Congratulations, you've trained a neural network with Edge Impulse! But what do all these numbers mean?
At the start of training, 20% of the training data is set aside for validation. This means that instead of being used to train the model, it is used to evaluate how the model is performing. The Last training performance panel displays the results of this validation, providing some vital information about your model and how well it is working. Bear in mind that your exact numbers may differ from the ones in this tutorial.
On the left hand side of the panel, Accuracy refers to the percentage of windows of audio that were correctly classified. The higher the number, the better, although an accuracy approaching 100% is unlikely, and is often a sign that your model has overfit the training data. You will find out whether this is true in the next stage, during model testing. For many applications, an accuracy above 80% can be considered very good.
The Confusion matrix is a table showing the balance of correctly versus incorrectly classified windows. To understand it, compare the values in each row. For example, in the above screenshot, all of the faucet audio windows were classified as faucet, while a few noise windows were misclassified. This is still a great result.
The On-device performance region shows statistics about how the model is likely to run on-device. Inferencing time is an estimate of how long the model will take to analyze one second of data on a typical microcontroller (here: an Arm Cortex-M4F running at 80MHz). Peak memory usage gives an idea of how much RAM will be required to run the model on-device.
The performance numbers in the previous step show that our model is working well on its training data, but it's extremely important that we test the model on new, unseen data before deploying it in the real world. This will help us ensure the model has not learned to overfit the training data, which is a common occurrence.
Edge Impulse provides some helpful tools for testing our model, including a way to capture live data from your device and immediately attempt to classify it. To try it out, click on Live classification in the left hand menu. Your device should show up in the 'Classify new data' panel. Capture 5 seconds of background noise by clicking Start sampling:
The sample will be captured, uploaded, and classified. Once this has happened, you'll see a breakdown of the results:
Once the sample is uploaded, it is split into windows–in this case, a total of 41. These windows are then classified. As you can see, our model classified all 41 windows of the captured audio as noise. This is a great result! Our model has correctly identified that the audio was background noise, even though this is new data that was not part of its training set.
Of course, it's possible some of the windows may be classified incorrectly. Since our model was 99% accurate based on its validation data, you can expect that at least 1% of windows will be classified wrongly—and likely much more than this, since our validation data doesn't represent every possible type of background or faucet noise. If your model didn't perform perfectly, don't worry. We'll get to troubleshooting later.
Misclassifications and uncertain results
It's inevitable that even a well-trained machine learning model will sometimes misclassify its inputs. When you integrate a model into your application, you should take into account that it will not always give you the correct answer.
For example, if you are classifying audio, you might want to classify several windows of data and average the results. This will give you better overall accuracy than assuming that every individual result is correct.
Using the Live classification tab, you can easily try out your model and get an idea of how it performs. But to be really sure that it is working well, we need to do some more rigorous testing. That's where the Model testing tab comes in. If you open it up, you'll see the sample we just captured listed in the Test data panel:
In addition to its training data, every Edge Impulse project also has a test dataset. Samples captured in Live classification are automatically saved to the test dataset, and the Model testing tab lists all of the test data.
To use the sample we've just captured for testing, we should correctly set its expected outcome. Click the ⋮ icon and select Edit expected outcome, then enter noise. Now, select the sample using the checkbox to the left of the table and click Classify selected:
You'll see that the model's accuracy has been rated based on the test data. Right now, this doesn't give us much more information than just classifying the same sample in the Live classification tab. But if you build up a big, comprehensive set of test samples, you can use the Model testing tab to measure how your model is performing on real data.
Ideally, you'll want to collect a test set that contains at least 25% of the amount of data in your training set. So, if you've collected 10 minutes of training data, you should collect at least 2.5 minutes of test data. You should make sure this test data represents a wide range of possible conditions, so that it evaluates how the model performs with many different types of inputs. For example, collecting test audio for several different faucets is a good idea.
You can use the Data acquisition tab to manage your test data. Open the tab, and then click Test data at the top. Then, use the Record new data panel to capture a few minutes of test data, including audio for both background noise and faucet. Make sure the samples are labelled correctly. Once you're done, head back to the Model testing tab, select all the samples, and click Classify selected:
The screenshot shows classification results from a large number of test samples (there are more on the page than would fit in the screenshot). The panel shows that our model is performing at 85% accuracy, which is 5% less than how it performed on validation data. It's normal for a model to perform less well on entirely fresh data, so this is a successful result. Our model is working well!
For each test sample, the panel shows a breakdown of its individual performance. For example, one of the samples was classified with only 62% accuracy. Samples that contain a lot of misclassifications are valuable, since they have examples of types of audio that our model does not currently fit. It's often worth adding these to your training data, which you can do by clicking the ⋮ icon and selecting Move to training set. If you do this, you should add some new test data to make up for the loss!
Testing your model helps confirm that it works in real life, and it's something you should do after every change. However, if you often make tweaks to your model to try to improve its performance on the test dataset, your model may gradually start to overfit to the test dataset, and it will lose its value as a metric. To avoid this, continually add fresh data to your test dataset.
Data hygiene
It's extremely important that data is never duplicated between your training and test datasets. Your model will naturally perform well on the data that it was trained on, so if there are duplicate samples then your test results will indicate better performance than your model will achieve in the real world.
If the network performed great, fantastic! But what if it performed poorly? There could be a variety of reasons, but the most common ones are:
The data does not look like other data the network has seen before. This is common when someone uses the device in a way that you didn't include in your training set. You can add the current file to the training set by setting the correct label in the 'Expected outcome' field, clicking ⋮, then selecting Move to training set.
The model has not been trained enough. Increase the number of epochs to 200 and see if performance increases (the classified file is stored, and you can load it through 'Classify existing validation sample').
The model is overfitting and thus performs poorly on new data. Try reducing the number of epochs, reducing the learning rate, or adding more data.
The neural network architecture is not a great fit for your data. Play with the number of layers and neurons and see if performance improves.
With the impulse designed, trained and verified you can deploy this model back to your device. This makes the model run without an internet connection, minimizes latency, and runs with minimum power consumption. Edge Impulse can package up the complete impulse - including the MFE algorithm, neural network weights, and classification code - in a single C++ library that you can include in your embedded software.
Mobile phone
To export your model, click on Deployment in the menu. Then under 'Build firmware' select your development board, and click Build. This will export the impulse, and build a binary that will run on your development board in a single step. After building is completed you'll get prompted to download a binary. Save this on your computer.
When you click the Build button, you'll see a pop-up with text and video instructions on how to deploy the binary to your particular device. Follow these instructions. Once you are done, we are ready to test your impulse out.
We can connect to the board's newly flashed firmware over serial. Open a terminal and run:
Serial daemon
If the device is not connected over WiFi, but instead connected via the Edge Impulse serial daemon, you'll need to stop the daemon first. Only one application can connect to the development board at a time.
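The command below is the standard Edge Impulse CLI runner; a minimal sketch, assuming you flashed the firmware built in the previous step:

```
# run the impulse on the connected board; --continuous keeps classifying
# audio without pauses, if your firmware supports it
edge-impulse-run-impulse --continuous
```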
This will capture audio from the microphone, run the MFE code, and then classify the spectrogram:
Great work! You've captured data, trained a model, and deployed it to an embedded device. It's time to celebrate—by pouring yourself a nice glass of water, and checking whether the sound is correctly classified by your model.
We can't wait to see what you'll build! 🚀
Edge Impulse can ingest data from any device - including embedded devices that you already have in production. See the documentation for the Ingestion service for more information.
Note: Data collection from a development board might be slow; you can use your mobile phone as a sensor to make this much faster.
For the noise and unknown datasets you can either collect this yourself, or make your life a bit easier by using a dataset of both 'noise' (all kinds of background noise) and 'unknown' (random words) data that we built for you here: .
This page allows you to configure the MFCC block, and lets you preview how the data will be transformed. The right of the page shows a visualization of the MFCC's output for a piece of audio, which is known as a spectrogram. An MFCC spectrogram is a specially tuned spectrogram which highlights frequencies that are common in human speech (Edge Impulse also has normal spectrograms if that's more your thing).
You might think based on this number that we can only classify 2 or 3 windows per second, but we continuously build up the spectrogram (as it has a time component), which takes less time, and we can thus continuously listen for events 5-6 times a second, even on a 40MHz processor. This is already implemented on all fully supported development boards, and on your own device.
Your mobile phone can build and download the compiled impulse directly from the mobile client. See 'Deploying back to device' on the page.
Congratulations! You've used Edge Impulse to train a neural network model capable of recognizing audible events. There are endless applications for this type of model, from monitoring industrial machinery to recognizing voice commands. Now that you've trained your model you can integrate your impulse in the firmware of your own embedded device, see . There are examples for Mbed OS, Arduino, STM32CubeIDE, Zephyr, and any other target that supports a C++ compiler.
Or if you're interested in more, see our tutorials on or . If you have a great idea for a different project, that's fine too. Edge Impulse lets you capture data from any sensor, build custom processing blocks to extract features, and you have full flexibility in your Machine Learning pipeline with the learning blocks.
Congratulations! Now that you've trained and deployed your model you can go further and build your own custom firmware, see .
Or if you're interested in audio projects, see our tutorial on .
You may also follow
Alternatively, you can load an example test set that has about ten minutes of data in these classes (but how much fun is that?). See the for more information.
This page allows you to configure the MFE block, and lets you preview how the data will be transformed. The right of the page shows a visualization of the MFE's output for a piece of audio, which is known as a spectrogram.
After testing out training with the parameters above, check out the tutorial to learn how to use Edge Impulse to automatically choose the best DSP parameters for your dataset.
Once this process is complete the feature explorer shows a visualization of your dataset. Here dimensionality reduction is used to map your features onto a 3D space, and you can use the feature explorer to see if the different classes separate well, or find mislabeled data (if it shows in a different cluster). You can find more information in .
As you see, there is still a lot of trial and error when building neural networks. One place to start improving the performance of your model is the tutorial. The tuner will automatically test different DSP and NN parameters to improve performance with your dataset.
Your mobile phone can build and download the compiled impulse directly from the mobile client. See 'Deploying back to device' on the page.
Congratulations! You've used Edge Impulse to train a neural network model capable of recognizing a particular sound. There are endless applications for this type of model, from monitoring industrial machinery to recognizing voice commands. Now that you've trained your model you can integrate your impulse in the firmware of your own embedded device, see . There are examples for Mbed OS, Arduino, STM32CubeIDE, Zephyr, and any other target that supports a C++ compiler.
Or if you're interested in more, see our tutorials on or . If you have a great idea for a different project, that's fine too. Edge Impulse lets you capture data from any sensor, build custom processing blocks to extract features, and you have full flexibility in your Machine Learning pipeline with the learning blocks.
In this tutorial, you'll use machine learning to build a system that can recognize objects in your house - a task known as image classification - using a camera connected to your Sony Spresense. Adding sight to your embedded devices can make them see the difference between poachers and elephants, do quality control on factory lines, or let your RC cars drive themselves. In this tutorial you'll learn how to collect images for a well-balanced dataset, how to apply transfer learning to train a neural network, and how to deploy the system to your Spresense.
At the end of this tutorial, you'll have a firm understanding of how to classify images using Edge Impulse.
You can view the finished project, including all data, signal processing and machine learning blocks here: Tutorial: adding sight to your sensors.
For this tutorial you'll need a Sony Spresense with the camera add-on.
If you don't have a device yet, you can also upload an existing dataset through the Uploader. After this tutorial you can then deploy your trained machine learning model as a C++ library and run it on your device by following the Running your impulse locally tutorial.
You may also want to create a new Edge Impulse project for this tutorial, or use an existing one. You can create a new project from your project dashboard.
In this tutorial we'll build a model that can distinguish between two objects in your house - we've used a plant and a lamp, but feel free to pick two other objects. To make your machine learning model see, it's important that you capture a lot of example images of these objects. When training the model, these example images are used to let the model distinguish between them. Because there are (hopefully) a lot more objects in your house than just lamps or plants, you also need to capture images that are neither a lamp nor a plant to make the model work well.
Capture the following amount of data - make sure you capture a wide variety of angles and zoom levels:
50 images of a lamp.
50 images of a plant.
50 images of neither a plant nor a lamp - make sure to capture a wide variation of random objects in the same room as your lamp or plant.
You can collect data from your Spresense using the Edge Impulse CLI. Make sure you followed the Getting Started guide for the Spresense, then run the Edge Impulse daemon.
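If the CLI is installed, starting the daemon is a single command (you can pass --clean if you need to switch to a different project):

```
# connect the board to your Edge Impulse project over serial
edge-impulse-daemon
```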
Once connected follow the guide on Collecting Image Data from Studio to build your dataset. Alternatively, you can capture your images using another camera, and then upload them by going to Data acquisition and clicking the 'Upload' icon.
Afterwards you should have a well-balanced dataset listed under Data acquisition in your Edge Impulse project. You can switch between your training and testing data with the two buttons above the 'Data collected' widget.
With the training set in place you can design an impulse. An impulse takes the raw data, adjusts the image size, uses a preprocessing block to manipulate the image, and then uses a learning block to classify new data. Preprocessing blocks always return the same values for the same input (e.g. convert a color image into a grayscale one), while learning blocks learn from past experiences.
For this tutorial we'll use the 'Images' preprocessing block. This block takes in the color image, optionally makes the image grayscale, and then turns the data into a features array. If you want to do more interesting preprocessing steps - like finding faces in a photo before feeding the image into the network - see the Building custom processing blocks tutorial. Then we'll use a 'Transfer Learning' learning block, which takes all the images in and learns to distinguish between the three classes ('plant', 'lamp', 'unknown').
In the studio go to Create impulse, set the image width and image height to 96, and add the 'Images' and 'Transfer Learning (Images)' blocks. Then click Save impulse.
To configure your processing block, click Images in the menu on the left. This will show you the raw data on top of the screen (you can select other files via the drop down menu), and the results of the processing step on the right. You can use the options to switch between 'RGB' and 'Grayscale' mode, but for now leave the color depth on 'RGB' and click Save parameters.
This will send you to the 'Feature generation' screen. In here you'll:
Resize all the data.
Apply the processing block on all this data.
Create a 3D visualization of your complete dataset.
Click Generate features to start the process.
Afterwards the 'Feature explorer' will load. This is a plot of all the data in your dataset. Because images have a lot of dimensions (here: 96x96x3=27,648 features) we run a process called 'dimensionality reduction' on the dataset before visualizing this. Here the 27,648 features are compressed down to just 3, and then clustered based on similarity. Even though we have little data you can already see some clusters forming (lamp images are all on the right), and can click on the dots to see which image belongs to which dot.
With all data processed it's time to start training a neural network. Neural networks are a set of algorithms, modeled loosely after the human brain, that are designed to recognize patterns. The network that we're training here will take the image data as an input, and try to map this to one of the three classes.
It's very hard to build a good working computer vision model from scratch, as you need a wide variety of input data to make the model generalize well, and training such models can take days on a GPU. To make this easier and faster we are using transfer learning. This lets you piggyback on a well-trained model, only retraining the upper layers of a neural network, leading to much more reliable models that train in a fraction of the time and work with substantially smaller datasets.
To configure the transfer learning model, click Transfer learning in the menu on the left. Here you can select the base model (the one selected by default will work, but you can change this based on your size requirements), optionally enable data augmentation (images are randomly manipulated to make the model perform better in the real world), and the rate at which the network learns.
Set:
Transfer learning model to MobileNetV1 96x96 0.25.
Number of training cycles to 20.
Learning rate to 0.0005.
Data augmentation: enabled.
Minimum confidence rating: 0.7.
And click Start training. After the model is done you'll see accuracy numbers, a confusion matrix and some predicted on-device performance on the bottom. You have now trained your model!
Important: The full example firmware for the Sony Spresense supports a variety of sensor interfaces, data ingestion, and other capabilities. As a result, this firmware supports image transfer learning networks up to MobileNetV1 96x96 0.25. Custom firmware may allow larger models to run on the device. To learn more, see Running your impulse locally to deploy trained Edge Impulse models with your firmware, and check out the EON Tuner documentation to learn how to optimize your model's performance and memory usage.
With the model trained let's try it out on some test data. When collecting the data we split the data up between a training and a testing dataset. The model was trained only on the training data, and thus we can use the data in the testing dataset to validate how well the model will work in the real world. This will help us ensure the model has not learned to overfit the training data, which is a common occurrence.
To validate your model, go to Model testing, select the checkbox next to 'Sample name' and click Classify selected. Here we hit 89% accuracy, which is great for a model with so little data.
To see a classification in detail, click the three dots next to an item, and select Show classification. This brings you to the Live classification screen with much more details on the file (if you collected data with your mobile phone you can also capture new testing data directly from here). This screen can help you determine why items were misclassified.
With the impulse designed, trained and verified you can deploy this model back to your device. This makes the model run without an internet connection, minimizes latency, and runs with minimum power consumption. Edge Impulse can package up the complete impulse - including the preprocessing steps, neural network weights, and classification code - in a single C++ library that you can include in your embedded software.
To run your impulse click on Deployment in the menu. Then under 'Build firmware' select the Spresense development board, and click Build. This will export the impulse, and build a binary that will run on your development board in a single step. After building is completed you'll get prompted to download a binary. Save this on your computer.
When you click the Build button, you'll see a pop-up with text and video instructions on how to deploy the binary to your particular device. Follow these instructions. Once you are done, we are ready to test your impulse out.
If you run into issues flashing your device, follow the steps and troubleshooting information in the Getting Started guide.
We can connect to the board's newly flashed firmware over serial. Open a terminal and run:
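Assuming the Edge Impulse CLI is installed, the standard runner will start inference and print the classification results; a minimal sketch:

```
# start classifying camera frames on the Spresense and print results
# add --debug to view a live camera feed in your browser, if supported
edge-impulse-run-impulse
```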
Congratulations! You've added sight to your sensors. Now that you've trained your model you can integrate your impulse in the firmware of your own embedded device, see Running your impulse locally. There are examples for Mbed OS, Arduino, STM32CubeIDE, Zephyr, and any other target that supports a C++ compiler.
Or if you're interested in more, see our tutorials on Continuous motion recognition or Adding sight to your sensors. If you have a great idea for a different project, that's fine too. Edge Impulse lets you capture data from any sensor, build custom processing blocks to extract features, and you have full flexibility in your Machine Learning pipeline with the learning blocks.
We can't wait to see what you'll build! 🚀
This tutorial will guide you through a people counting reference design built using the Silabs xG24 dev kit and the Arducam Mini 2MP Plus. The design showcases:
The Silabs xG24 Dev Kit featuring the EFR32 chipset with AI/ML accelerator providing:
Up to 3x speed increases in image-based ML processing (when compared to running a non-accelerated model),
An extremely low AI/ML and BT stack footprint allowing for concurrent inference and BT communication, and
Edge Impulse’s own FOMO algorithm providing:
image-based object detection at the lower end (ARM Cortex®-M33, 256kB RAM) of TinyML compute,
the ability to train object detection models using only ~100 images instead of thousands, and
the ability to detect objects at extremely low resolutions (64x64 pixels)
The diagram below depicts the ML lifecycle architecture defined for our people counting reference design. We used a single xG24 Dev Kit to implement either a collection or an inference flow, as required, during the development process.
This guide assumes that you have already completed the getting started guide for the Silabs xG24 Dev Kit and have trained a model.
In order to replicate this reference design, you will also need:
An xG24 Dev Kit from Silabs
An Arducam Mini 2MP Plus
An Edge Impulse Studio account with a clone of the people counting project.
A development computer with the Edge Impulse CLI installed
For this project, we attached an Arducam Mini 2MP Plus to the xG24 Dev Kit in order to capture low-resolution images of people flow in a real environment. This can be achieved by connecting the two devices as specified in the table below:
Head over to your cloned Edge Impulse project, and go to Deployment. From here you can create the full firmware package built with all required libraries and dependencies. This includes the Silabs' Bluetooth stack which can broadcast inference results to nearby devices. Select Silabs xG24 Dev Kit and click Build to build the firmware. Then download and extract the .zip file.
You can use your cloned project and xG24 Dev Kit camera assembly as a starting point to develop your own object detection project by following our FOMO guide.
In this tutorial, you will learn how to run multiple cameras with multiple models simultaneously on the Renesas RZ/V2L microprocessor. The models used for this demo are:
YOLOv5 - Face detection using DRP-AI acceleration
ResNet - Number recognition on AArch64 (no acceleration)
To follow this tutorial, you will need a combination of hardware and software tools. You will need an Edge Impulse compatible Renesas RZ board, either the Renesas RZ/V2L smarc board or the Avnet RZBoard. In addition to the board, the following additional items are needed:
Two webcams - the webcams used for this tutorial are:
Logitech C920 HD Pro Webcam
Logitech C922 Pro Stream 1080p Webcam
USB hub (to connect the cameras to the Renesas RZ/V2L board)
Ethernet cable or USB serial cable
Note that any webcam will suffice for this application. Some details will have to be modified depending on the cameras; this is explained in detail further along in the tutorial.
In order to successfully install and run the application, you need access to the target's Linux terminal. You may require other hardware, like an ethernet cable or a USB-to-serial cable, to achieve this. Detailed instructions on how to set the device up and gain access to the Linux terminal are not provided here. For the Renesas RZ/V2L board, please refer to this page, and for the Avnet RZBoard, please refer to this page.
The models used for this tutorial can be downloaded as a zip file from this link. The file that you download (models-20230921T125858Z-001.zip in our case) may be named differently for you. Once the file has been downloaded, go ahead and unzip it, either in the GUI or as follows:
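For example, from the directory containing the download (substitute your own file name if it differs):

```
# extract the cpu/ and drpai/ model folders from the archive
unzip models-20230921T125858Z-001.zip
```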
There are two folders in this zip file. The cpu directory contains a ResNet model that identifies numbers on a Linux AArch64 target without DRP-AI acceleration. The drpai directory contains a YOLOv5 face detection model for the RZ/V2L microprocessor with DRP-AI acceleration. Once the files have been extracted, their permissions need to be changed to make them executable. This can be done as follows:
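A sketch of the permission change, assuming the extracted models are Edge Impulse .eim files inside the cpu and drpai folders (adjust the paths to whatever file names you actually extracted):

```
# make the downloaded model files executable
chmod +x cpu/*.eim drpai/*.eim
```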
The application required to run the two models simultaneously must be downloaded from the Edge Impulse CLI repository. The simplest way to do this is to run the following command in your terminal/command line application:
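One way to do this is to fetch the branch archive directly from GitHub. The URL below is an assumption based on the repository and branch names used in this tutorial, so double-check it against the repository before relying on it:

```
# download the classify-camera-finer-control branch of the Linux CLI as a zip archive
wget https://github.com/edgeimpulse/edge-impulse-linux-cli/archive/refs/heads/classify-camera-finer-control.zip
```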
If you would like to do it manually, it can be downloaded here from the branch classify-camera-finer-control, at commit 1fc1639acdfb9e7af3fc8794619334b28dffac00. Once the application has been downloaded, unzip the contents, navigate to the directory, and list the contents to ensure the operation was successful. This can be done as follows:
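For example, assuming the archive was saved as classify-camera-finer-control.zip:

```
# unpack the application and check its contents
unzip classify-camera-finer-control.zip
cd edge-impulse-linux-cli-classify-camera-finer-control
ls
```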
While staying at the same directory, install the dependencies and build the application as follows:
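The Edge Impulse Linux CLI is a Node.js project, so installing the dependencies and building typically looks like the following; the exact build script on this branch may differ:

```
# install dependencies and compile the sources
npm install
npm run build
```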
Please ensure that you are inside the directory edge-impulse-linux-cli-classify-camera-finer-control when installing and running the application.
Before we can run the application, please ensure that the directory structure is as follows:
In the previous step, we installed an application called classify-camera. To run this app with two cameras and two separate models, the same app needs to be executed twice in two separate terminal instances with different cameras. The arguments required by the app are:
model - the name of the model to be executed in the current instance (required)
camera - the name of the camera to be used for the instance as detected by the system (required)
fps - frames per second of the input stream. This will affect the speed of the app (default: 5)
width - the width of the input image directed to the respective model (default: 640)
height - the height of the input image directed to the respective model (default: 480)
The syntax of the command to be executed is as follows:
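A sketch of the invocation, built from the argument names listed above. The entry point of the classify-camera app depends on the contents of this branch, so treat the path (and the exact flag spelling) as placeholders:

```
# placeholder path and file names - adjust to your setup
node <path-to-classify-camera> --model <model-file> --camera "<camera name>" --fps 5 --width 640 --height 480
```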
Before running the first instance, we first need to determine the names of the webcam(s) that are detected by the system. The correct name is required to direct the stream from each camera to the respective model. The easiest way to do this is by running the command with only the model as an argument. In this case:
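For example (same placeholders as above):

```
# intentionally omit --camera so the app reports the cameras it can see
node <path-to-classify-camera> --model drpai/<model-file>.eim
```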
The command will fail and result in an error, with the error message containing the names of the connected cameras. In this case, the output might be as follows:
From this output, we can determine that the names of the cameras detected by the system are "HD Pro Webcam C920" and "C922 Pro Stream Webcam". These names can now be used to run the application. Before running the application, ensure that your current working directory is the edge-impulse-linux-cli-classify-camera-finer-control folder; if it already is, the cd commands in the following instructions can be omitted.
You will need to open two terminal instances.
In terminal 1 run the following command:
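For example, pairing the DRP-AI face detection model with the C920 (which model goes with which camera is up to you; the path and file names are placeholders):

```
cd edge-impulse-linux-cli-classify-camera-finer-control
node <path-to-classify-camera> --model drpai/<face-detection-model>.eim --camera "HD Pro Webcam C920"
```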
And in terminal 2 run the following command:
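And, continuing the example, the CPU-only number recognition model with the C922:

```
cd edge-impulse-linux-cli-classify-camera-finer-control
node <path-to-classify-camera> --model cpu/<number-recognition-model>.eim --camera "C922 Pro Stream Webcam"
```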
Once both these commands have been executed, you will be able to see the results of the models in the terminal similar to the following in terminal 1:
and the following in terminal 2:
Running the application in this way will not provide a visual output. To see a visual result of this inference, please see the following section.
The above steps are enough to execute both models and see their outputs in the terminal. However, it is always better to see a visual output of the image detection models that are being executed. With a slight modification of the execution commands, it is possible to run the application using WebServer and see the visual output.
The arguments to run the application using WebServer are as follows:
model - the name of the model to be executed in the current instance (required)
camera - the name of the camera to be used for the instance as detected by the system (required)
fps - frames per second of the input stream. This will affect the speed of the app (default: 5)
width - the width of the input image directed to the respective model (default: 640)
height - the height of the input image directed to the respective model (default: 480)
port - this is the port to which the output from the cameras and application is directed (default: 4912)
The syntax of the command to be executed is then changed to the following:
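The only change is the extra port argument; a sketch with the same placeholders as before (remember that each instance needs its own port):

```
# placeholder path and file names - adjust to your setup; use a different --port per instance
node <path-to-classify-camera> --model <model-file> --camera "<camera name>" --fps 5 --width 640 --height 480 --port 4912
```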
Similar to running the application before, you will need to open two terminal instances.
In terminal 1 run the following command:
The terminal output from this command will be:
Navigating to the address mentioned in the output (http://192.168.178.30:4912 in this case), you will see an output similar to this:
In terminal 2 run the following command:
The terminal output from this command will be:
Note that the port numbers passed as arguments must be different. This allows for the streams to be directed to different ports to be viewed in the browser.
Navigating to the address mentioned in the output (the same IP address, but with the second port you specified), you will see an output similar to this:
Now you have two models running simultaneously. You will notice that the model running on the CPU has a significantly higher inference time (>1200 ms) than the model running on the DRP-AI (<10 ms). You can see a side-by-side view of this parallelism:
You can see the two models running simultaneously in the video below:
Congratulations! You have successfully run two Edge Impulse generated models simultaneously on the Renesas RZ/V2L device. You will notice that the model run using the DRP-AI accelerator is orders of magnitude faster than the one running simply on the CPU.
Using this application, you can run any two image-based models simultaneously on the RZV2L device. If you have a great idea for a different project, that's fine too. Edge Impulse lets you capture data from any sensor, build custom processing blocks to extract features, and you have full flexibility in your Machine Learning pipeline with the learning blocks.
The possibilities are endless, and we can't wait to see what you'll build! 🚀
In this tutorial, you'll use machine learning to build a model that recognizes when a doorbell rings. You'll use the Particle Photon 2, a microphone, a cell phone, and Edge Impulse to build this project. You can add the ability to send an SMS message when the doorbell rings by following the tutorial on Particle's website.
Audio event detection is a common machine learning task, and is a hard task to solve using rule-based programming. You'll learn how to collect audio data, build a neural network classifier, and how to deploy your model back to a device. At the end of this tutorial, you'll have a firm understanding of applying machine learning on Particle hardware using Edge Impulse.
Before starting the tutorial
After signing up for a free Edge Impulse account, clone the finished project, including all training data, signal processing and machine learning blocks here: Particle - Doorbell. At the end of the process you will have the full project that comes pre-loaded with training and test datasets.
For this tutorial you'll need the:
Particle Photon 2 board with the Edge ML Kit.
Before starting ingestion, create an Edge Impulse account if you haven't already. To collect audio data you can use any mobile phone. Follow the instructions on that page to connect your phone to your Edge Impulse project.
Once your device is connected under Devices in the studio you can proceed:
To build this project, you'll need to collect some audio data that will be used to train the machine learning model. Since the goal is to detect the sound of a doorbell, you'll need to collect some examples of that. You'll also need some examples of typical background noise that doesn't contain the sound of a doorbell, so the model can learn to discriminate between the two. These two types of examples represent the two classes we'll be training our model to detect: unknown noise, or doorbell.
Your phone will show up like any other device in Edge Impulse, and will automatically ask permission to use its sensors. Let's start by recording an example of background noise that doesn't contain the sound of a doorbell. On your phone, set the label to unknown and the sample length to 2 seconds. This indicates that you want to record 2 seconds of audio and label the recorded data as unknown. You can later edit these labels if needed.
After you click Start recording, the device will capture two seconds of audio and transmit it to Edge Impulse.
When the data has been uploaded, you will see a new line appear under 'Collected data' in the Data acquisition tab of your Edge Impulse project. You will also see the waveform of the audio in the 'RAW DATA' box. You can use the controls underneath to listen to the audio that was captured.
Machine learning works best with lots of data, so a single sample won't cut it. Now is the time to start building your own dataset. For example, use the following two classes, and record around 3 minutes of data per class:
doorbell - Record yourself ringing the doorbell in the same room
unknown - Record the background noise in your house
With the training set in place you can design an impulse. An impulse takes the raw data, slices it up in smaller windows, uses signal processing blocks to extract features, and then uses a learning block to classify new data. Signal processing blocks always return the same values for the same input and are used to make raw data easier to process, while learning blocks learn from past experiences.
For this tutorial we'll use the MFCC signal processing block. Then we'll use a 'Neural Network' learning block, which takes these generated features and learns to distinguish between our different classes (doorbell or unknown).
In the studio go to Create impulse, set the window size to 1000 (you can click on the 1000 ms. text to enter an exact value), the window increase to 500, and add the 'Audio MFCC' and 'Classification (Keras)' blocks. Then click Save impulse.
Window size
Now that we've assembled the building blocks of our Impulse, we can configure each individual part. Click on the MFCC tab in the left hand navigation menu. You'll see a page that looks like this:
This page allows you to configure the MFCC block, and lets you preview how the data will be transformed. The right of the page shows a visualization of the MFCC's output for a piece of audio, which is known as a spectrogram.
The MFCC block transforms a window of audio into a table of data where each row represents a range of frequencies and each column represents a span of time. The value contained within each cell reflects the amplitude of its associated range of frequencies during that span of time. The spectrogram shows each cell as a colored block, the intensity of which varies depending on the amplitude.
The patterns visible in a spectrogram contain information about what type of sound it represents. For example, the spectrogram in this image shows a pattern typical of background noise:
It's interesting to explore your data and look at the types of spectrograms it results in. You can use the dropdown box near the top right of the page to choose between different audio samples to visualize, and drag the white window on the audio waveform to select different windows of data:
There are a lot of different ways to configure the MFCC block, as shown in the Parameters box:
Handily, Edge Impulse provides sensible defaults that will work well for many use cases, so we can leave these values unchanged. You can play around with the noise floor to quickly see the effect it has on the spectrogram.
The spectrograms generated by the MFCC block will be passed into a neural network architecture that is particularly good at learning to recognize patterns in this type of tabular data. Before training our neural network, we'll need to generate MFCC features for all of our windows of audio. To do this, click the Generate features tab at the top of the page, then click the green Generate features button. If you have several minutes of data, the process will take a while to complete:
Once this process is complete the feature explorer shows a visualization of your dataset. Here dimensionality reduction is used to map your features onto a 3D space, and you can use the feature explorer to see if the different classes separate well, or find mislabeled data (if it shows in a different cluster). You can find more information in visualizing complex datasets.
Next, we'll configure the neural network and begin training.
With all data processed it's time to start training a neural network. Neural networks are algorithms, modeled loosely after the human brain, that can learn to recognize patterns that appear in their training data. The network that we're training here will take the MFCC as an input, and try to map this to one of two classes—doorbell or unknown.
Click on NN Classifier in the left hand menu. You'll see the following page:
A neural network is composed of layers of virtual "neurons", which you can see represented on the left hand side of the NN Classifier page. An input—in our case, an MFCC spectrogram—is fed into the first layer of neurons, which filters and transforms it based on each neuron's unique internal state. The first layer's output is then fed into the second layer, and so on, gradually transforming the original input into something radically different. In this case, the spectrogram input is transformed over four intermediate layers into just two numbers: the probability that the input represents an unknown sound, and the probability that the input represents a doorbell.
During training, the internal state of the neurons is gradually tweaked and refined so that the network transforms its input in just the right ways to produce the correct output. This is done by feeding in a sample of training data, checking how far the network's output is from the correct answer, and adjusting the neurons' internal state to make it more likely that a correct answer is produced next time. When done thousands of times, this results in a trained network.
A particular arrangement of layers is referred to as an architecture, and different architectures are useful for different tasks. The default neural network architecture provided by Edge Impulse will work well for our current project, but you can also define your own architectures. You can even import custom neural network code from tools used by data scientists, such as TensorFlow and Keras.
The default settings should work, and to begin training, click Start training. You'll see a lot of text flying past in the Training output panel, which you can ignore for now. Training will take a few minutes. When it's complete, you'll see the Model panel appear at the right side of the page.
Congratulations, you've trained a neural network with Edge Impulse! But what do all these numbers mean?
At the start of training, 20% of the training data is set aside for validation. This means that instead of being used to train the model, it is used to evaluate how the model is performing. The Last training performance panel displays the results of this validation, providing some vital information about your model and how well it is working. Bear in mind that your exact numbers may differ from the ones in this tutorial.
On the left hand side of the panel, Accuracy refers to the percentage of windows of audio that were correctly classified. The higher the number, the better, although an accuracy approaching 100% is unlikely and is often a sign that your model has overfit the training data. You will find out whether this is true in the next stage, during model testing. For many applications, an accuracy above 80% can be considered very good.
The Confusion matrix is a table showing the balance of correctly versus incorrectly classified windows. To understand it, compare the values in each row.
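For example (hypothetical numbers): if 95 of 100 noise windows and 85 of 100 doorbell windows in the validation set are classified correctly, the noise row reads 95 versus 5, the doorbell row reads 85 versus 15, and the overall accuracy is (95 + 85) / 200 = 90%.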
The On-device performance region shows statistics about how the model is likely to run on-device. Inferencing time is an estimate of how long the model will take to analyze one second of data on a typical microcontroller (for example, an Arm Cortex-M4F running at 80 MHz). Peak memory usage gives an idea of how much RAM will be required to run the model on-device.
The performance numbers in the previous step show that our model is working well on its training data, but it's extremely important that we test the model on new, unseen data before deploying it in the real world. This will help us ensure the model has not learned to overfit the training data, which is a common occurrence.
Edge Impulse provides some helpful tools for testing our model, including a way to capture live data from your device and immediately attempt to classify it. To try it out, click on Live classification in the left hand menu. Your device should show up in the 'Classify new data' panel. Capture 5 seconds of background noise by clicking Start sampling.
The sample will be captured, uploaded, and classified. Once this has happened, you'll see a breakdown of the results.
Once the sample is uploaded, it is split into windows–in this case, a total of 41. These windows are then classified. As you can see, our model classified all 41 windows of the captured audio as noise. This is a great result! Our model has correctly identified that the audio was background noise, even though this is new data that was not part of its training set.
Of course, it's possible some of the windows may be classified incorrectly. If your model didn't perform perfectly, don't worry. We'll get to troubleshooting later.
Misclassifications and uncertain results
It's inevitable that even a well-trained machine learning model will sometimes misclassify its inputs. When you integrate a model into your application, you should take into account that it will not always give you the correct answer.
For example, if you are classifying audio, you might want to classify several windows of data and average the results. This will give you better overall accuracy than assuming that every individual result is correct.
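As a rough sketch of that idea (this is not code generated by Edge Impulse; the class count, window count, and function name below are placeholders), you could keep the scores of the last few windows and act only on their average:

```cpp
#include <array>
#include <cstddef>

// Placeholders: two classes (e.g. noise, doorbell), averaged over 4 windows.
constexpr std::size_t kNumClasses = 2;
constexpr std::size_t kNumWindows = 4;

std::array<std::array<float, kNumClasses>, kNumWindows> history{};
std::size_t next_slot = 0;

// Store the scores of the latest classified window and return the average
// score per class across the stored windows.
std::array<float, kNumClasses> add_window_and_average(
        const std::array<float, kNumClasses> &scores) {
    history[next_slot] = scores;
    next_slot = (next_slot + 1) % kNumWindows;

    std::array<float, kNumClasses> avg{};
    for (const auto &window : history) {
        for (std::size_t c = 0; c < kNumClasses; ++c) {
            avg[c] += window[c] / static_cast<float>(kNumWindows);
        }
    }
    return avg;  // e.g. only trigger when the averaged score stays above a threshold
}
```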
Using the Live classification tab, you can easily try out your model and get an idea of how it performs. But to be really sure that it is working well, we need to do some more rigorous testing. That's where the Model testing tab comes in. If you open it up, you'll see the sample we just captured listed in the Test data panel.
In addition to its training data, every Edge Impulse project also has a test dataset. Samples captured in Live classification are automatically saved to the test dataset, and the Model testing tab lists all of the test data.
To use the sample we've just captured for testing, we should correctly set its expected outcome. Click the ⋮ icon and select Edit expected outcome, then enter noise. Now, select the sample using the checkbox to the left of the table and click Classify selected.
You'll see that the model's accuracy has been rated based on the test data. Right now, this doesn't give us much more information than just classifying the same sample in the Live classification tab. But if you build up a big, comprehensive set of test samples, you can use the Model testing tab to measure how your model is performing on real data.
Ideally, you'll want to collect a test set that contains at least 25% of the amount of data in your training set. So, if you've collected 10 minutes of training data, you should collect at least 2.5 minutes of test data. You should make sure this test data represents a wide range of possible conditions, so that it evaluates how the model performs with many different types of inputs. For example, collecting test audio for several different doorbells, perhaps recording the audio in a different room, is a good idea.
You can use the Data acquisition tab to manage your test data. Open the tab, and then click Test data at the top. Then, use the Record new data panel to capture a few minutes of test data, including audio for both background noise and doorbells. Make sure the samples are labelled correctly. Once you're done, head back to the Model testing tab, select all the samples, and click Classify selected.
The screenshot shows classification results from a large number of test samples (there are more on the page than would fit in the screenshot). It's normal for a model to perform less well on entirely fresh data.
For each test sample, the panel shows a breakdown of its individual performance. Samples that contain a lot of misclassifications are valuable, since they contain examples of types of audio that our model does not currently handle well. It's often worth adding these to your training data, which you can do by clicking the ⋮ icon and selecting Move to training set. If you do this, you should add some new test data to make up for the loss!
Testing your model helps confirm that it works in real life, and it's something you should do after every change. However, if you often make tweaks to your model to try to improve its performance on the test dataset, your model may gradually start to overfit to the test dataset, and it will lose its value as a metric. To avoid this, continually add fresh data to your test dataset.
Data hygiene
It's extremely important that data is never duplicated between your training and test datasets. Your model will naturally perform well on the data that it was trained on, so if there are duplicate samples then your test results will indicate better performance than your model will achieve in the real world.
If the network performed great, fantastic! But what if it performed poorly? There could be a variety of reasons, but the most common ones are:
The data does not look like other data the network has seen before. This is common when someone uses the device in a way that you didn't cover in your training set. You can add the current file to the training set by setting the correct label in the 'Expected outcome' field, clicking ⋮, then selecting Move to training set.
The model has not been trained enough. Increase the number of epochs to 200 and see if performance increases (the classified file is stored, and you can load it through 'Classify existing validation sample').
The model is overfitting and thus performs poorly on new data. Try reducing the number of epochs, reducing the learning rate, or adding more data.
The neural network architecture is not a great fit for your data. Play with the number of layers and neurons and see if performance improves.
As you see, there is still a lot of trial and error when building neural networks. Edge Impulse is continually adding features that will make it easier to train an effective model.
With the impulse designed, trained and verified you can deploy this model back to your device. This makes the model run without an internet connection, minimizes latency, and runs with minimum power consumption.
To export your model, click on Deployment in the menu. Then under 'Build firmware' select the Particle Library.
Once optimized parameters have been found, you can click Build. This will build a Particle Workbench-compatible package that will run on your development board. After the build completes you'll be prompted to download a zip file. Save it on your computer. A pop-up video will show how the download process works.
After unzipping the downloaded file, you can open the project in Particle Workbench.
The development board does not come with the Edge Impulse firmware. To flash the Edge Impulse firmware:
Open a new VS Code window, ensure that Particle Workbench has been installed (see above)
Use VS Code Command Palette and type in Particle: Import Project
Select the project.properties file in the directory that you just downloaded and extracted in the section above.
Use VS Code Command Palette and type in Particle: Configure Project for Device
Select deviceOS@5.3.2 (or a later version)
Choose a target (e.g. P2; this option is also used for the Photon 2).
It is sometimes necessary to manually put your device into DFU mode. You may proceed to the next step, but if you get an error indicating that "No DFU capable USB device available", please follow these steps:
Hold down both the RESET and MODE buttons.
Release only the RESET button, while holding the MODE button.
Wait for the LED to start flashing yellow.
Release the MODE button.
Compile and Flash in one command with: Particle: Flash application & DeviceOS (local)
Local Compile Only! At this time you cannot use the Particle: Cloud Compile or Particle: Cloud Flash options; local compilation is required.
Serial Connection to Computer
The Particle libraries generated by Edge Impulse output serial data over a virtual COM port on the USB cable. The serial settings are 115200, 8, N, 1. Particle Workbench contains a serial monitor, accessible through the VS Code Command Palette via Particle: Serial Monitor.
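For reference, a minimal Particle application sketch that matches those settings might look like the following (this is an illustration using the standard Wiring-style API, not code shipped with the generated library):

```cpp
#include "Particle.h"

void setup() {
    // 115200 baud; 8 data bits, no parity, 1 stop bit is the default framing.
    Serial.begin(115200);
}

void loop() {
    Serial.println("hello from the virtual COM port");  // placeholder output
    delay(1000);
}
```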
Device compatibility
The initial release of the Particle integration relies on the particle-ingestion project and the data forwarder being installed on the device and the computer, respectively. This is currently only supported on the Particle Photon 2, but it should run on other devices as well.
This standalone example project contains the minimal code required to run the imported impulse on the device. This code is located in main.cpp. In this minimal code example, inference is run from a static buffer of input feature data. To verify that our embedded model achieves the exact same results as the model trained in Studio, we want to copy the same input features from Studio into the static buffer in main.cpp.
To do this, first head back to Edge Impulse Studio and click on the Live classification tab. Follow this video for instructions.
In main.cpp, paste the raw features inside the static const float features[] definition, for example:
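(The block below only illustrates the shape of that edit; the numbers are placeholders, not real Studio output. Copy your own raw features from the Live classification tab instead.)

```cpp
// Placeholder values for illustration only. Replace the contents of this
// array with the raw features copied from the Live classification tab.
static const float features[] = {
    -0.0107f, 0.0136f, 0.0023f, -0.0081f, 0.0152f, 0.0044f
    // ...continue with the full feature list from Studio
};
```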
Once built, the project will repeatedly run inference on this buffer of raw features, demonstrating that the on-device inference result is identical to the Live classification tab in Studio. To classify live data instead, fill a buffer with new sensor data collected in real time on the device, then follow the same code used in main.cpp to run classification on it.
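A condensed sketch of that pattern is shown below. It assumes the Edge Impulse C++ SDK's run_classifier() API as used in main.cpp, plus a hypothetical read_sensor_sample() helper that fills the buffer with live readings:

```cpp
#include "edge-impulse-sdk/classifier/ei_run_classifier.h"

// Hypothetical helper: fill `buf` with `len` values from your live sensor.
extern void read_sensor_sample(float *buf, size_t len);

static float live_features[EI_CLASSIFIER_DSP_INPUT_FRAME_SIZE];

void classify_live_data() {
    read_sensor_sample(live_features, EI_CLASSIFIER_DSP_INPUT_FRAME_SIZE);

    // Wrap the buffer in a signal and run the impulse on it.
    signal_t signal;
    numpy::signal_from_buffer(live_features,
                              EI_CLASSIFIER_DSP_INPUT_FRAME_SIZE, &signal);

    ei_impulse_result_t result = { 0 };
    if (run_classifier(&signal, &result, false) == EI_IMPULSE_OK) {
        for (size_t i = 0; i < EI_CLASSIFIER_LABEL_COUNT; i++) {
            ei_printf("%s: %.5f\n", result.classification[i].label,
                      result.classification[i].value);
        }
    }
}
```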
Follow the flashing instructions above to run on device.
Victory! You've now built your first on-device machine learning model.
Congratulations! Now that you've trained and deployed your model you can go further and build your own custom firmware. We can't wait to see what you'll build! 🚀
This tutorial is for Particle hardware only, and just the Photon 2 & Boron devices.
In this tutorial, you'll use machine learning to build a gesture recognition system that runs on the Photon 2 or Boron. This is a hard task to solve using rule-based programming, as people don't perform gestures in the exact same way every time. But machine learning can handle these variations with ease. You'll learn how to collect high-frequency data from an IMU, build a neural network classifier, and how to deploy your model back to a device. At the end of this tutorial, you'll have a firm understanding of applying machine learning with Particle Photon 2 or Boron using Edge Impulse.
Before starting the tutorial
After signing up for a free Edge Impulse account, clone the finished project, including all training data, signal processing and machine learning blocks here: . At the end of the process you will have the full project that comes pre-loaded with training and test datasets.
For this tutorial you'll need:
OR
Each device page above includes instructions for collecting data using the appropriate path, since the Photon 2 can be flashed with a binary and connect directly to Studio, while the Boron cannot and needs to use the data forwarder.
To collect data from the Photon 2 please follow these steps:
Create a new Edge Impulse Studio project and remember the name you give it.
Connect your device to Edge Impulse Studio by running the Edge Impulse CLI daemon (edge-impulse-daemon) in a terminal.
After connecting, the Edge Impulse Daemon will ask you to log in to your account and select the project. Alternatively, you can copy the API key from the Keys section of your project and use the --api-key flag instead of --clean.
Start gathering data by clicking on Data acquisition, then click Start sampling.
With the training set in place you can design an impulse. An impulse takes the raw data, slices it up into smaller windows, uses signal processing blocks to extract features, and then uses a learning block to classify new data. Signal processing blocks always return the same values for the same input and are used to make raw data easier to process, while learning blocks learn from past experiences.
For this tutorial we'll use the 'Spectral Analysis' signal processing block, which turns raw data into a set of important features. Then we'll use a 'Neural Network' learning block that takes these generated features and learns to distinguish between our different classes of motion.
In Studio go to Create impulse, then click on Add a processing block and choose Spectral Analysis. Next click on Add a learning block and choose Classification and then finally click Save impulse.
To configure your signal processing block, click Spectral features in the menu on the left. This will show you the raw data on top of the screen (you can select other files via the drop down menu), and the processed features on the right.
Click Save parameters. This will send you to the 'Feature generation' screen.
Click Generate features to start the process.
Afterwards the 'Feature explorer' will load. This is a plot of all the extracted features against all the generated windows. You can use this graph to compare your complete data set. A good rule of thumb is that if you can visually separate the data on a number of axes, then the machine learning model will be able to do so as well.
With all data processed it's time to start training a neural network. Neural networks are algorithms, modeled loosely after the human brain, that can learn to recognize patterns that appear in their training data. The network that we're training here will take the processing block features as an input, and try to map this to one of the possible classes.
Click on Classifier in the left hand menu.
With everything in place, click Start training.
Congratulations, you've trained a neural network with Edge Impulse that is ready to deploy to your Particle device! But what do all these numbers mean?
At the start of training, 20% of the training data is set aside for validation. This means that instead of being used to train the model, it is used to evaluate how the model is performing. The Last training performance panel displays the results of this validation, providing some vital information about your model and how well it is working. Bear in mind that your exact numbers may differ from the ones in this tutorial.
On the left hand side of the panel, Accuracy refers to the percentage of windows that were correctly classified. The higher the number, the better, although an accuracy approaching 100% is unlikely and is often a sign that your model has overfit the training data. You will find out whether this is true in the next stage, during model testing. For many applications, an accuracy above 85% can be considered very good.
The Confusion matrix is a table showing the balance of correctly versus incorrectly classified windows. To understand it, compare the values in each row.
With the impulse designed, trained and verified you can deploy this model back to your device. This makes the model run without an internet connection, minimizes latency, and runs with minimum power consumption.
To export your model, click on Deployment in the menu. Then under 'Build firmware' select the Particle Library option and click Build.
Follow the instructions from the device's documentation page for how to use the Particle Workbench extension in VS Code and deploy on your device.
Congratulations! Now that you've trained and deployed your model you can go further and build your own custom firmware!
We can't wait to see what you'll build! 🚀
The Nvidia TAO model zoo offers a large variety of powerful pre-trained models that can help you get started and make good progress on your projects in a very short period of time. Through Edge Impulse's partnership with Nvidia, these models are now available for you to use in your projects. The Renesas RA8D1 evaluation kit, equipped with the advanced Cortex-M85 microcontroller and Arm Helium technology, is a powerhouse for edge AI applications. This tutorial walks you through the steps of harnessing the combined power of Nvidia TAO and the Cortex-M85 by running your TAO-based projects on the Renesas RA8D1.
To follow this tutorial you will need:
a Renesas EK-RA8D1 evaluation kit. The kit can be purchased .
the Renesas e2studio IDE and coding tool. This can be downloaded .
an Edge Impulse project using an Nvidia TAO training block. This step is detailed in this document.
To follow this tutorial, you will need to build an object detection or image classification project. We will create an object detection project using the following steps:
Create a new project in Edge Impulse.
Make sure to set your labelling method to 'Bounding boxes (object detection)'.
Collect and prepare your dataset as detailed in . In this tutorial, we will be creating a dataset to detect bottles vs. cans.
Create your impulse and add an 'NVIDIA TAO ...' block to it. Some blocks require the input image dimensions to be a multiple of 32. If that's the case for the block you chose, resize the input image to a suitable size, e.g. 96x96 or 160x160.
In the NVIDIA TAO... block, you can select between various parameters. Each TAO pre-trained block supports several backbones trained on subsets of the Google OpenImages dataset; in total there are 88 object detection architectures. For this tutorial, we will choose the MobileNet v2 3x224x224 backbone and leave the remaining settings at their default values.
Click on 'Start training' to train the model.
Open this firmware in an IDE to access the build settings. We recommend using the Renesas e2studio IDE mentioned in the prerequisites. Simply open the IDE and choose the option to import a pre-existing project from a folder or archive. Once imported, navigate to Project > Settings > C/C++ Build > Settings > Tool Settings.
In the 'Tool Settings' tab, navigate to GNU Arm Cross C++ Compiler > Preprocessor. Under 'Defined symbols (-D)', add EI_CLASSIFIER_ALLOCATION_STATIC as an additional defined symbol. This is also shown below:
Hit 'Apply and Close' and build the project.
After downloading the project as a C++ library, navigate to tflite-model/tflite_learn_##_compiled.cpp. The '##' is a number that will vary for each download.
Within this file, find the following block of code (usually around line 74):
and replace it with the following:
This forces the tensor arena, the memory region where the model's tensors are stored, to be created in the external storage, which has 64 MB of SDRAM capacity, far more than the internal 2 MB of code flash and 1 MB of SRAM.
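Because the generated file differs per project, the snippet below is only a sketch of the kind of change involved; the section name .sdram and the array name are assumptions and must match your project's generated code and the region defined in your e2studio linker script:

```cpp
#include <cstdint>

// Placeholder size; the generated file defines the real arena size.
constexpr int kTensorArenaSize = 512 * 1024;

// Before: the arena lands in the default RAM region (internal SRAM).
// uint8_t tensor_arena[kTensorArenaSize] __attribute__((aligned(16)));

// After: tag the arena with a section that the linker maps to external SDRAM.
uint8_t tensor_arena[kTensorArenaSize]
    __attribute__((section(".sdram"), aligned(16)));
```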
After making these changes, navigate back to the firmware downloaded in step 1. The file structure in this repository is as follows:
Here, replace the folders model/model-parameters and model/tflite-model with the same folders downloaded from Studio as a C++ library. This will overwrite the standard firmware's model with the one we modified to accommodate the Nvidia TAO model.
Now you are ready to use your Nvidia TAO based model on the Renesas RA8 device!
Sometimes after installing the e2studio application on your MacBook, you might see the error "E2studio.app" is damaged and cannot be opened when trying to open the application. You can resolve this problem by using the following terminal command to mark it as “valid” after extracting it from the archive file:
making sure to replace /path/to/E2studio.app with the actual location of your e2studio application. One easy way of doing this is to drag the E2studio.app icon from your Finder window into the Terminal window, right after typing the initial part of the command. After this, you should be able to use the application without any issues.
Open your Edge Impulse Studio Project and click on Devices. Verify that your device is listed here.
Navigate to the project on GitHub, clone the repo, and follow the README.
Connect your Boron device (with the ADXL345 attached) and start the data forwarder, being sure to follow the forwarder's instructions to connect it to your Studio project.
There are several pre-trained 3x224x224 backbones from the , and others trained by Edge Impulse on ImageNet.
Once the model has finished training, you will be ready to deploy it to your device. For non-Nvidia TAO models, the process to deploy on the RA8 is the standard method described . However, applications using TAO models require more space to store the model and application, which is why we need to manually allocate the storage location in the RA8 firmware before flashing. This is a multi-step process described below.
Download the RA8 firmware from the repository .
Now go back to the project created in studio in the section . Download the project as a C++ library instead of an RA8 deployment.
Follow the instructions from the firmware repository to build and flash the firmware to your RA8.
xG24 Dev Kit pin | Arducam Mini 2MP Plus |
---|---|
1 | GND |
2 | |
3 | SDA |
4 | MOSI |
5 | |
6 | MISO |
7 | |
8 | SCK |
9 | |
10 | |
11 | SCL |
12 | |
13 | CS |
14 | |
15 | |
16 | |
17 | |
18 | VSS |
19 | |
20 | |