Audio event detection with Particle boards
Last updated
Last updated
In this tutorial, you'll use machine learning to build a model that recognizes when a doorbell rings. You'll use the Particle Photon 2, a microphone, a cell phone, and Edge Impulse to build this project. You can add the ability to send an SMS message when the doorbell rings by following the tutorial on Particle's website.
Audio event detection is a common machine learning task, and is a hard task to solve using rule-based programming. You'll learn how to collect audio data, build a neural network classifier, and how to deploy your model back to a device. At the end of this tutorial, you'll have a firm understanding of applying machine learning on Particle hardware using Edge Impulse.
Before starting the tutorial
After signing up for a free Edge Impulse account, clone the finished project, including all training data, signal processing and machine learning blocks here: Particle - Doorbell. At the end of the process you will have the full project that comes pre-loaded with training and test datasets.
For this tutorial you'll need the:
Particle Photon2 Board with the Edge ML Kit.
Before starting ingestion create an Edge Impulse account if you haven't already. To collect audio data you can use any mobile phone. Follow in the instructions on that page to connect your phone to your Edge Impulse project.
Once your device is connected under Devices in the studio you can proceed:
To build this project, you'll need to collect some audio data that will be used to train the machine learning model. Since the goal is to detect the sound of a doorbell, you'll need to collect some examples of that. You'll also need some examples of typical background noise that doesn't contain the sound of a doorbell, so the model can learn to discriminate between the two. These two types of examples represent the two classes we'll be training our model to detect: unknown noise, or doorbell.
Your phone will show up like any other device in Edge Impulse, and will automatically ask permission to use sensors. Let's start by recording an example of background noise that doesn't contain the sound of a doorbell. On your cell set the label to unknown
, the sample length to 2 seconds. This indicates that you want to record 1 second of audio, and label the recorded data as unknown
. You can later edit these labels if needed.
After you click Start recording, the device will capture a second of audio and transmit it to Edge Impulse.
When the data has been uploaded, you will see a new line appear under 'Collected data' in the Data acquisition tab of your Edge Impulse project. You will also see the waveform of the audio in the 'RAW DATA' box. You can use the controls underneath to listen to the audio that was captured.
Machine learning works best with lots of data, so a single sample won't cut it. Now is the time to start building your own dataset. For example, use the following two classes, and record around 3 minutes of data per class:
doorbell - Record yourself ringing the doorbell in the same room
unknown - Record the background noise in your house
With the training set in place you can design an impulse. An impulse takes the raw data, slices it up in smaller windows, uses signal processing blocks to extract features, and then uses a learning block to classify new data. Signal processing blocks always return the same values for the same input and are used to make raw data easier to process, while learning blocks learn from past experiences.
For this tutorial we'll use the signal processing block. Then we'll use a 'Neural Network' learning block, that takes these generated features and learns to distinguish between our different classes (circular or not).
In the studio go to Create impulse, set the window size to 1000
(you can click on the 1000 ms.
text to enter an exact value), the window increase to 500
, and add the 'Audio MFCC' and 'Classification (Keras)' blocks. Then click Save impulse.
Window size
Now that we've assembled the building blocks of our Impulse, we can configure each individual part. Click on the MFCC tab in the left hand navigation menu. You'll see a page that looks like this:
This page allows you to configure the MFCC block, and lets you preview how the data will be transformed. The right of the page shows a visualization of the MFCC's output for a piece of audio, which is known as a spectrogram.
The MFCC block transforms a window of audio into a table of data where each row represents a range of frequencies and each column represents a span of time. The value contained within each cell reflects the amplitude of its associated range of frequencies during that span of time. The spectrogram shows each cell as a colored block, the intensity which varies depends on the amplitude.
The patterns visible in a spectrogram contain information about what type of sound it represents. For example, the spectrogram in this image shows a pattern typical of background noise:
It's interesting to explore your data and look at the types of spectrograms it results in. You can use the dropdown box near the top right of the page to choose between different audio samples to visualize, and drag the white window on the audio waveform to select different windows of data:
There are a lot of different ways to configure the MFCC block, as shown in the Parameters box:
Handily, Edge Impulse provides sensible defaults that will work well for many use cases, so we can leave these values unchanged. You can play around with the noise floor to quickly see the effect it has on the spectrogram.
The spectrograms generated by the MFCC block will be passed into a neural network architecture that is particularly good at learning to recognize patterns in this type of tabular data. Before training our neural network, we'll need to generate MFCC blocks for all of our windows of audio. To do this, click the Generate features button at the top of the page, then click the green Generate features button. If you have a full 10 minutes of data, the process will take a while to complete:
Once this process is complete the feature explorer shows a visualization of your dataset. Here dimensionality reduction is used to map your features onto a 3D space, and you can use the feature explorer to see if the different classes separate well, or find mislabeled data (if it shows in a different cluster). You can find more information in visualizing complex datasets.
Next, we'll configure the neural network and begin training.
With all data processed it's time to start training a neural network. Neural networks are algorithms, modeled loosely after the human brain, that can learn to recognize patterns that appear in their training data. The network that we're training here will take the MFCC as an input, and try to map this to one of two classes—doorbell or unknown.
Click on NN Classifier in the left hand menu. You'll see the following page:
A neural network is composed of layers of virtual "neurons", which you can see represented on the left hand side of the NN Classifier page. An input—in our case, an MFCC spectrogram—is fed into the first layer of neurons, which filters and transforms it based on each neuron's unique internal state. The first layer's output is then fed into the second layer, and so on, gradually transforming the original input into something radically different. In this case, the spectrogram input is transformed over four intermediate layers into just two numbers: the probability that the input represents noise, and the probability that the input represents a doorbell.
During training, the internal state of the neurons is gradually tweaked and refined so that the network transforms its input in just the right ways to produce the correct output. This is done by feeding in a sample of training data, checking how far the network's output is from the correct answer, and adjusting the neurons' internal state to make it more likely that a correct answer is produced next time. When done thousands of times, this results in a trained network.
A particular arrangement of layers is referred to as an architecture, and different architectures are useful for different tasks. The default neural network architecture provided by Edge Impulse will work well for our current project, but you can also define your own architectures. You can even import custom neural network code from tools used by data scientists, such as TensorFlow and Keras.
The default settings should work, and to begin training, click Start training. You'll see a lot of text flying past in the Training output panel, which you can ignore for now. Training will take a few minutes. When it's complete, you'll see the Model panel appear at the right side of the page.
Congratulations, you've trained a neural network with Edge Impulse! But what do all these numbers mean?
At the start of training, 20% of the training data is set aside for validation. This means that instead of being used to train the model, it is used to evaluate how the model is performing. The Last training performance panel displays the results of this validation, providing some vital information about your model and how well it is working. Bear in mind that your exact numbers may differ from the ones in this tutorial.
On the left hand side of the panel, Accuracy refers to the percentage of windows of audio that were correctly classified. The higher number the better, although an accuracy approaching 100% is unlikely, and is often a sign that your model has overfit the training data. You will find out whether this is true in the next stage, during model testing. For many applications, an accuracy above 80% can be considered very good.
The Confusion matrix is a table showing the balance of correctly versus incorrectly classified windows. To understand it, compare the values in each row.
The On-device performance region shows statistics about how the model is likely to run on-device. Inferencing time is an estimate of how long the model will take to analyze one second of data on a typical microcontroller (here: an Arm Cortex-A33 running at 200MHz). Peak memory usage gives an idea of how much RAM will be required to run the model on-device.
The performance numbers in the previous step show that our model is working well on its training data, but it's extremely important that we test the model on new, unseen data before deploying it in the real world. This will help us ensure the model has not learned to overfit the training data, which is a common occurrence.
Edge Impulse provides some helpful tools for testing our model, including a way to capture live data from your device and immediately attempt to classify it. To try it out, click on Live classification in the left hand menu. Your device should show up in the 'Classify new data' panel. Capture 5 seconds of background noise by clicking Start sampling.
The sample will be captured, uploaded, and classified. Once this has happened, you'll see a breakdown of the results.
Once the sample is uploaded, it is split into windows–in this case, a total of 41. These windows are then classified. As you can see, our model classified all 41 windows of the captured audio as noise. This is a great result! Our model has correctly identified that the audio was background noise, even though this is new data that was not part of its training set.
Of course, it's possible some of the windows may be classified incorrectly. If your model didn't perform perfectly, don't worry. We'll get to troubleshooting later.
Misclassifications and uncertain results
It's inevitable that even a well-trained machine learning model will sometimes misclassify its inputs. When you integrate a model into your application, you should take into account that it will not always give you the correct answer.
For example, if you are classifying audio, you might want to classify several windows of data and average the results. This will give you better overall accuracy than assuming that every individual result is correct.
Using the Live classification tab, you can easily try out your model and get an idea of how it performs. But to be really sure that it is working well, we need to do some more rigorous testing. That's where the Model testing tab comes in. If you open it up, you'll see the sample we just captured listed in the Test data panel.
In addition to its training data, every Edge Impulse project also has a test dataset. Samples captured in Live classification are automatically saved to the test dataset, and the Model testing tab lists all of the test data.
To use the sample we've just captured for testing, we should correctly set its expected outcome. Click the ⋮
icon and select Edit expected outcome, then enter noise
. Now, select the sample using the checkbox to the left of the table and click Classify selected.
You'll see that the model's accuracy has been rated based on the test data. Right now, this doesn't give us much more information that just classifying the same sample in the Live classification tab. But if you build up a big, comprehensive set of test samples, you can use the Model testing tab to measure how your model is performing on real data.
Ideally, you'll want to collect a test set that contains a minimum of 25% the amount of data of your training set. So, if you've collected 10 minutes of training data, you should collect at least 2.5 minutes of test data. You should make sure this test data represents a wide range of possible conditions, so that it evaluates how the model performs with many different types of inputs. For example, collecting test audio for several different doorbells, perhaps moving collecting the audio in a different room, is a good idea.
You can use the Data acquisition tab to manage your test data. Open the tab, and then click Test data at the top. Then, use the Record new data panel to capture a few minutes of test data, including audio for both background noise and doorbells. Make sure the samples are labelled correctly. Once you're done, head back to the Model testing tab, select all the samples, and click Classify selected.
The screenshot shows classification results from a large number of test samples (there are more on the page than would fit in the screenshot). It's normal for a model to perform less well on entirely fresh data.
For each test sample, the panel shows a breakdown of its individual performance. Samples that contain a lot of misclassifications are valuable, since they have examples of types of audio that our model does not currently fit. It's often worth adding these to your training data, which you can do by clicking the ⋮
icon and selecting Move to training set. If you do this, you should add some new test data to make up for the loss!
Testing your model helps confirm that it works in real life, and it's something you should do after every change. However, if you often make tweaks to your model to try to improve its performance on the test dataset, your model may gradually start to overfit to the test dataset, and it will lose its value as a metric. To avoid this, continually add fresh data to your test dataset.
Data hygiene
It's extremely important that data is never duplicated between your training and test datasets. Your model will naturally perform well on the data that it was trained on, so if there are duplicate samples then your test results will indicate better performance than your model will achieve in the real world.
If the network performed great, fantastic! But what if it performed poorly? There could be a variety of reasons, but the most common ones are:
The data does not look like other data the network has seen before. This is common when someone uses the device in a way that you didn't add to the test set. You can add the current file to the test set by adding the correct label in the 'Expected outcome' field, clicking ⋮
, then selecting Move to training set.
The model has not been trained enough. Increase number of epochs to 200
and see if performance increases (the classified file is stored, and you can load it through 'Classify existing validation sample').
The model is overfitting and thus performs poorly on new data. Try reducing the number of epochs, reducing the learning rate, or adding more data.
The neural network architecture is not a great fit for your data. Play with the number of layers and neurons and see if performance improves.
As you see, there is still a lot of trial and error when building neural networks. Edge Impulse is continually adding features that will make it easier to train an effective model.
With the impulse designed, trained and verified you can deploy this model back to your device. This makes the model run without an internet connection, minimizes latency, and runs with minimum power consumption.
To export your model, click on Deployment in the menu. Then under 'Build firmware' select the Particle Library.
Once optimized parameters have been found, you can click Build. This will build a particle workbench compatible package that will run on your development board. After building is completed you'll get prompted to download a zipfile. Save this on your computer. A pop-up video will show how the download process works.
After unzipping the downloaded file, you can open the project in Particle Workbench.
The development board does not come with the Edge Impulse firmware. To flash the Edge Impulse firmware:
Open a new VS Code window, ensure that Particle Workbench has been installed (see above)
Use VS Code Command Palette and type in Particle: Import Project
Select the project.properties
file in the directory that you just downloaded and extracted from the section above.
Use VS Code Command Palette and type in Particle: Configure Project for Device
Select deviceOS@5.3.2
(or a later version)
Choose a target. (e.g. P2 , this option is also used for the Photon 2).
It is sometimes needed to manually put your Device into DFU Mode. You may proceed to the next step, but if you get an error indicating that "No DFU capable USB device available" then please follow these step.
Hold down both the RESET and MODE buttons.
Release only the RESET button, while holding the MODE button.
Wait for the LED to start flashing yellow.
Release the MODE button.
Compile and Flash in one command with: Particle: Flash application & DeviceOS (local)
Local Compile Only! At this time you cannot use the Particle: Cloud Compile or Particle: Cloud Flash options; local compilation is required.
Serial Connection to Computer
The Particle libraries generated by Edge Impulse have serial output through a virtual COM port on the serial cable. The serial settings are 115200, 8, N, 1. Particle Workbench contains a serial monitor accessible via Particle: Serial Monitor via the VS Code Command Palette.
Device compatibility
Initial release of the particle integration will rely on the particle-ingestion project and the data forwarder to be installed on the device and computer. This is currently only supported on the Particle Photon 2, but should run on other devices as well.
This standalone example project contains minimal code required to run the imported impulse on the device. This code is located in main.cpp
. In this minimal code example, inference is run from a static buffer of input feature data. To verify that our embedded model achieves the exact same results as the model trained in Studio, we want to copy the same input features from Studio into the static buffer in main.cpp
.
To do this, first head back to Edge Impulse Studio and click on the Live classification tab. Follow this video for instructions.
In main.cpp
paste the raw features inside the static const float features[]
definition, for example:
The project will repeatedly run inference on this buffer of raw features once built. This will show that the inference result is identical to the Live classification tab in Studio. Use new sensor data collected in real time on the device to fill a buffer. From there, follow the same code used in main.cpp
to run classification on live data.
Follow the flashing instructions above to run on device.
Victory! You've now built your first on-device machine learning model.
Congratulations! Now that you've trained and deployed your model you can go further and build your own custom firmware. We can't wait to see what you'll build! 🚀