Build a machine learning model and deploy it to a Nordic Thingy:53 to detect the sound of breaking glass.
Created By: Zalmotek
Public Project Link:
https://studio.edgeimpulse.com/studio/139844
Glass/window breaking detection systems are used in office buildings for safety purposes. They can be used to detect when a window is broken and trigger an alarm. These systems can also be used to collect data about the event, such as the time, location, and type of break, thus generating data that can be used to further bolster the safety of office buildings in the future.
There are many different types of glass/window breaking detection systems available on the market, but they fall into two broad categories:
Systems that use vibration and audio sensors to detect the sound of breaking glass.
Computer vision based systems used to detect signs of damage in the windows.
The biggest challenge with any detection system is to minimize false positives - that is, to avoid triggering an alarm when there is no actual danger. This is especially important in the case of glass/window breaking detection systems, as a false positive can cause significant disruption and even panic.
There are many factors that can cause a false positive with these types of systems, such as:
Background noise: office buildings are typically full of ambient noise (e.g. people talking, computers humming, etc.) which can make it difficult for sensors to accurately identify the sound of breaking glass.
Weather: windy conditions can also create background noise that can interfere with sensor accuracy.
Sound Volume: if the sound of breaking glass is not loud enough, it may not be picked up by sensors.
Our approach to these challenges is to create an IoT system based on the Nordic Thingy:53™ development board that runs a machine learning model trained using the Edge Impulse platform, detects the sound of breaking glass, and sends a notification via Bluetooth when this event occurs. We have narrowed our hardware selection to the Nordic Thingy:53™ as it integrates multiple sensors (including an accelerometer, gyroscope, microphone, and temperature sensor) onto a single board, which will simplify our data collection process. In addition, the Nordic Thingy:53™ has built-in Bluetooth Low Energy (BLE) connectivity, which will allow us to easily send notifications to nearby smartphones or other devices when our glass/window breaking detection system is triggered. The Nordic Thingy:53 is powered by the nRF5340 SoC, Nordic Semiconductor’s flagship dual-core wireless SoC that combines an Arm® Cortex®-M33 CPU with a state-of-the-art floating point unit (FPU) and Machine Learning (ML) accelerator. This will enable us to run our machine learning model locally on the Thingy:53, without needing to send data to the cloud for processing.
To build our machine learning model, we will be using the Edge Impulse platform. Edge Impulse is a Machine Learning platform that enables you to build custom models that can run on embedded devices, such as the Nordic Thingy:53™. With Edge Impulse, you can collect data from sensors, process this data using various types of Machine Learning algorithms (such as classification or regression), and then deploy your trained model onto your target device.
Edge Impulse has many benefits, the most useful being that you don't need extensive data to train a high-functioning AI model. You can also easily adjust the models based on various needs like processing power or energy consumption.
Android/iOS device
nRF Programmer Android/iOS app
Edge Impulse account
Git
Because the Nordic Thingy:53 comes with a high-quality MEMS microphone on board, no wiring is required. Simply connect the development board to a power supply and move on to the next step.
Let's start by creating an Edge Impulse project. Select Developer as your project type, click Create a new project, and give it a memorable name.
New Thingy:53 devices will function with the Nordic nRF Edge Impulse iPhone and Android apps as well as the Edge Impulse Studio right out of the box.
Before connecting it to the Edge Impulse project, the firmware of the Thingy:53 must be updated. Download the nRF Programmer mobile application for Android or iOS and launch it. You will be prompted with a number of available samples.
Select the Edge Impulse application, select the version of the sample from the drop-down menu and tap Download.
When that is done, tap Install. A list with the nearby devices will appear and you must select your development board from the list. Once that is done, the upload process will begin.
With the firmware updated, connect the Thingy:53 board to a computer that has the edge-impulse-cli suite installed, turn it on, launch a terminal and run:
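Assuming the standard Edge Impulse CLI tools are installed, the command that starts the serial daemon is:

```
edge-impulse-daemon
```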
You will be required to provide your username and password before choosing the project to which you want to attach the device.
Once you select the project and the connection is successful, the board will show up in the Devices tab of your project.
For this particular use case, recording data containing glass breaking sounds is challenging. For such situations, Edge Impulse offers its users the possibility of uploading publicly available recordings of various phenomena that can be post-processed in the Data Acquisition tab.
We have gathered over 15 minutes of glass shattering sounds from various license-free SFX sound effects websites and uploaded them to our training data pool, using GlassBreaking as their label. This can be done by navigating to the Upload data tab.
We also need audio for this application that doesn't contain the sound events that we want to identify. We must add sounds that belong to the "background sounds" category to the data pool, such as honks, people talking loudly, doors closing and other various background sounds that the system might be exposed to during normal use. The name of this class should be "BACKGROUND." When populating your dataset, keep in mind that the most crucial component of machine learning is data, and the richer and more varied your data set is, the better your model will perform.
Now that the data is available, it’s time to create the Impulse. The functional Block of the Edge Impulse ecosystem is called “Impulse” and it fundamentally describes a collection of blocks through which data flows, starting from the ingestion phase and up to outputting the features.
The setup is rather straightforward for this use case. We will be using a 1000ms window size, with a window increase of 200ms, at an acquisition frequency of 100Hz. For the processing block we will be using Audio (MFE), and for the learning block we will be employing a basic Classification (Keras).
When navigating to this menu, you will notice that in the top part of the screen you can explore the time domain representation of the data you have gathered.
Underneath, various parameters of the processing block may be modified. For the moment, we will be moving forward with the default values.
And finally, on the right side of the window you can observe the results of the digital signal processing and a spectrogram of the raw signal.
A good rule of thumb when tweaking the DSP block parameters is that similar signals should yield similar results.
Once you are happy with the results, click on Save parameters. After the "Generate features" page loads, click on Generate Features.
In the Generate features tab, you can observe the Feature explorer. It enables visual, intuitive data exploration. Before beginning to train the model, you can rapidly verify whether your data separates neatly. If you're looking to identify the outliers in your dataset, this feature is fantastic because it color-codes comparable data and enables you to track it back to the sample it originated from by just clicking on the data item.
The next step in developing our machine learning algorithm is configuring the NN Classifier block. There are various parameters that can be changed: the Number of training cycles, the Learning rate, the Validation set size, and the option to enable the Auto-balance dataset function. These control the number of epochs to train the NN for, how fast it learns, and the percentage of samples from the training dataset used for validation. Underneath, the architecture of the NN is described. For the moment, leave everything as is and press Start training.
The training will be assigned to a cluster and, when the process ends, the training performance tab will be displayed. Here you can evaluate the Accuracy and Loss of your model, along with the correct and incorrect responses it produced when fed the previously acquired dataset, presented in tabular form.
Moreover, you can see the Data explorer that offers an intuitive representation of the classification and underneath it, the predicted on-device performance of the NN.
In the Deployment tab you will notice another menu that lets you opt in to the EON Compiler. We will get back to this later; for now, click Build and wait for the process to end. Once it’s done, download the .hex file and follow the steps in the video that shows up to upload it to the Thingy:53 board.
With the impulse uploaded, connect the board to your computer, launch a terminal and issue the following command to see the results of the inferencing:
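Assuming the standard Edge Impulse CLI, the runner that prints live inferencing results is:

```
edge-impulse-run-impulse
```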
Another way of deploying the model on the edge is using the Nordic nRF Edge Impulse app for iPhone or Android:
Download and install the app for your Android/iOS device.
Launch it and login with your edgeimpulse.com credentials.
Select your project from the list.
Navigate to the Devices tab and connect to the Thingy:53.
Navigate to the Data tab and press Connect. You will see the status on the button change from Connect to Disconnect.
Navigate to the Deployment tab and press Deploy.
In the inferencing tab, you will see the results of the Edge Impulse model you have flashed on the device:
In this article, we have described how to create a glass/window breaking detection system using the Nordic Thingy:53™ development board and Edge Impulse Machine Learning platform. This system can be used in office buildings or other commercial settings to help improve safety and security. We believe that this approach has several advantages over existing solutions, including its low cost, ease of use, and accuracy. With further development, this system could be expanded to include other types of sensors (e.g. cameras) to improve detection accuracy or be used in other applications such as door/window opening detection or intruder detection.
Using a Nordic Thingy:53 with Keyword Spotting to turn an ordinary device into a smart appliance.
Created By: Zalmotek
Public Project Link:
https://studio.edgeimpulse.com/studio/145818
GitHub Repository:
https://github.com/Zalmotek/edge-impulse-appliance-control-voice-nordic-thingy53
In today's world, voice commands are becoming a popular user input method for various devices, including smart home appliances. While classical UI methods like physical buttons or a remote control will not be soon displaced, the convenience of using voice commands to control an appliance when multitasking or when having your hands busy with something else, like cooking, cannot be denied.
While very convenient in day to day use, using human speech as user input comes with a number of challenges that must be addressed.
First and foremost, using human speech as user input for smart appliances requires the recognition and understanding of natural language. This means that some sort of keyword detection or voice recognition technology must be involved. As each user may have their own unique way of phrasing a request, like "turn on the fan" or "start a 5 minute timer", the voice recognition algorithms must be fine-tuned to obtain the best accuracy.
Another big challenge when implementing such technologies is the security of the gathered data. Privacy concerns regarding speech recognition require cutting-edge encryption techniques to protect voice data and ensure the privacy of any sensitive personal or corporate information transmitted. An easy way to circumvent this is by employing IoT devices that run a machine learning algorithm on the edge and which do not store any data while running the detection algorithm.
We will be demonstrating the design and build process of a system dedicated to integrating basic voice control functionality in any device by using Nordic Thingy:53 dedicated hardware and an audio categorization model developed and optimized using the Edge Impulse platform.
The Nordic Thingy:53™ is an IoT prototyping platform that enables users to create prototypes and proofs of concept without the need for custom hardware. The Thingy:53 is built around the nRF5340 SoC, Nordic Semiconductor’s flagship dual-core wireless SoC. Its dual Arm Cortex-M33 processors provide ample processing power and memory size to run embedded machine learning (ML) models directly on the device with no constraints.
To build the machine learning model responsible for speech recognition we will be using the Edge Impulse platform. Among the many advantages of this platform, it’s worth mentioning that it does not require a lot of data to train a performant AI model and that it provides a great number of processing-power and energy-consumption optimization tools, allowing users to build models that can run even on resource-constrained devices.
For this use case, we will be using the Nordic Thingy:53 as an advertising Bluetooth peripheral that listens for user input and then sends a message via BLE to an ESP32 connected to a relay that can switch an appliance on or off. This system architecture lets users control multiple appliances distributed around the house with only one "gateway" that runs the machine learning model on the edge.
ESP32 DevKit
Android/iOS device
220V to 5V power regulator
Plug-in plastic enclosure
Edge Impulse account
Edge Impulse CLI
Arduino IDE
Arduino CLI
Git
A working Zephyr environment
To control appliances you will either have to work with AC mains or integrate with some functionality that the appliance might have, such as IR control or switches.
We chose to connect the Adafruit Non-Latching Mini Relay FeatherWing to the ESP32 development board as presented in the following schematic. The Signal pin of the relay is connected to the GPIO32 pin of the ESP32 and the ground is common between the ESP32 and the relay.
You can use this circuit to control AC powered household devices, such as kettles, lights, or stove smoke extractors.
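As a quick sanity check of the wiring, a minimal ESP32 test sketch (a sketch for bench testing only, not part of the final firmware) can toggle the relay signal pin on GPIO32:

```cpp
// Minimal relay test for the wiring above.
// RELAY_PIN matches the schematic (relay signal on GPIO32 of the ESP32).
#define RELAY_PIN 32

void setup() {
  pinMode(RELAY_PIN, OUTPUT);
  digitalWrite(RELAY_PIN, LOW);   // start with the appliance switched off
}

void loop() {
  digitalWrite(RELAY_PIN, HIGH);  // energize the relay coil
  delay(2000);
  digitalWrite(RELAY_PIN, LOW);   // release the relay
  delay(2000);
}
```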
We have chosen an enclosure that will safely protect users from the AC mains and the rest of the electronics. We soldered the circuit onto a test board using the following schematic, which we first tested on a breadboard. We kept the test board neatly separated, with low-voltage DC routed in the top part and high-voltage AC routed in the lower part.
Warning: Working with AC mains is dangerous if you have never done it before. Please ask a senior electronics engineer for help or document yourself thoroughly before building this circuit.
The enclosure provides an AC In and an AC Out socket, allowing us to place our electronics in between: the test board stays continuously supplied, while enabling or disabling the output turns the appliance's supply on or off.
Use the appropriate wire gauge when working with AC so that it can handle the current draw of the appliance. We used standard 16A wire with the proper colors (blue for neutral, brown for line, and yellow/green for ground, according to EU standards).
The gateway can be placed anywhere in the house and does not need to be connected to a power outlet as it is battery powered, making it much more convenient for users.
Building and flashing custom applications on the Nordic Thingy:53 board requires a working Zephyr environment. To create it, follow the steps in the Getting Started guide from the official Zephyr documentation. Afterwards, follow the steps presented in the Developing with Thingy:53 guide from the official Nordic Semiconductor documentation. While this might not be mentioned in either of the documents, you must also install the J-Link Software and Documentation Pack and the nRF Command Line Tools (ver 10.15.4) to be able to flash the board.
After following the steps in the guides presented above, you should have a working Zephyr environment. Remember to always work in the virtual environment created during the Getting Started guide when developing applications for this platform.
Let's start by creating an Edge Impulse project. Select Developer as your project type, click Create a new project, and give it a meaningful name.
Thingy:53 devices will work with the Nordic nRF Edge Impulse iPhone and Android apps or with the Edge Impulse Studio right away.
First of all, the firmware of the Thingy:53 device must be updated. Download the nRF Programmer mobile application and launch it. You will be prompted with a number of available samples.
Select the Edge Impulse application, select the version of the sample from the drop-down menu and tap Download.
Once that is done, tap Install and a list with nearby devices will appear. You have to select your development board from the list and the upload process will begin.
With the firmware updated, connect the Thingy:53 board to a computer that has the edge-impulse-cli suite installed, turn it on, launch a terminal and run:
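With the standard Edge Impulse CLI installed, the command to start the serial daemon is:

```
edge-impulse-daemon
```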
You will be required to provide your username and password before choosing the project to which you want to attach the device.
Once you select the project and the connection is successful, the board will show up in the Devices tab of your project.
Considering the context, the best way to gather relevant data for your Thingy:53 is to record yourself. To achieve the best performance of the speech recognition model, you can add samples of your own voice to increase the specificity of the detection algorithm.
Go to Data Acquisition in your Edge Impulse project and you can start gathering the data set.
For this particular use case, we will be using the "Light", "Kettle" and "Extractor" keywords to turn on different items in the kitchen. Start recording 5-10 second segments of yourself saying "Light", "Kettle" and "Extractor". You will notice that they appear in the Collected data tab. Click on the menu symbolized by three dots and press Split Sample. Edge Impulse automatically splits the sample into 1-second windows, but you can adjust those manually. When you are happy with the windows, press Split.
We will also require audio that doesn't contain the keywords we wish to detect. We must gather a data set with sounds like ambient noise, people speaking in the distance, or different sounds from the kitchen, all of which fall within the "background" category. This class is labeled "Background". Keep in mind that data is the most important part of machine learning, and your model will perform better the more diverse and abundant your data set is.
From now on, because "Light", "Kettle" and "Extractor" are the classes we wish to detect, we will be referring to them as "positive classes" and to "Background" as the negative class. For the background noise, we will use the Unknown and Noise samples from the keywords dataset.
Download the dataset and navigate to the Upload data menu. We will be uploading all the samples to the Training category and we will label the "Noise" and "Unknown" samples with the label "Background".
Let's begin developing our Impulse now that the data are available. An Impulse describes a group of blocks through which data flows and can be viewed as the functional Block of the Edge Impulse ecosystem.
The input level, the signal processing level, the learning level, and the output level are the four levels that make up an impulse.
For the input block, we will leave the setting as default. As for the processing and learning block, we have opted for an Audio (MFCC) block and a basic Classification (Keras).
Do the setup just like in the image above, click on Save Impulse, and move on to configure the blocks one by one.
The Audio MFCC (Mel Frequency Cepstral Coefficients) block extracts coefficients from an audio signal using Mel-scale, a non-linear scale. This block is used for human voice recognition, but can also perform well for some non-voice audio use cases. You can read more about how this block works here.
You can use the default values for configuring the MFCC block and click on Save parameters. You’ll be taken to the feature generation page. Click on Generate features and you will be able to visualize the results in the Feature explorer.
The next step in developing our machine learning algorithm is configuring the NN Classifier block. Before training the model, you have the opportunity to modify the Number of training cycles, the Learning rate, the Validation set size, and whether to enable the Auto-balance dataset function. The learning rate defines how quickly the NN learns, the number of training cycles determines the number of epochs to train the NN for, and the validation set size sets the percentage of samples from the training data pool used for validation. You can leave everything as it is for now and press Start training.
The training will be assigned to a cluster and, when the process ends, the training performance tab will be displayed. It shows the Accuracy and Loss of the model, as well as the correct and incorrect responses provided by the model. You can also see an intuitive representation of the classification and, underneath it, the predicted on-device performance of the NN.
One way of deploying the model on the edge is using the Nordic nRF Edge Impulse app for iPhone or Android:
Download and install the app for your Android/iOS device.
Launch it and log in with your edgeimpulse.com credentials.
Select your Smart Appliance Control Using Voice Commands project from the list
Navigate to the Devices tab and connect to the Thingy:53:
Navigate to the Data tab and press Connect. You will see the status on the button changing from Connect to Disconnect.
Navigate to the Deployment tab and press Deploy.
In the Inferencing tab, you will see the results of the Edge Impulse model you have flashed on the device:
To showcase the process of creating a custom application, we have decided to create a basic Bluetooth application. In this application, the Thingy:53 functions as a Bluetooth peripheral device that advertises itself. The ESP32 functions as a Bluetooth client that scans for available devices to connect to. When it detects the Thingy:53, it pairs with it and awaits a command.
After the devices are paired, pressing the central button of the Thingy:53 sends a message to the ESP32, which triggers a relay.
Here you can find the source code for the Thingy:53 and the ESP32.
To build and flash the application on the Nordic hardware, copy the folder named Thingy53_Peripheral from the repository to the ncs/nrf folder and then run:
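A typical invocation, run from the ncs/nrf folder and assuming the default Thingy:53 application-core board target, would be:

```
west build -b thingy53_nrf5340_cpuapp Thingy53_Peripheral
```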
Make sure you have the board powered on and connected via the J-Link mini Edu to your computer, and run:
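With the J-Link attached, flashing uses the standard west runner:

```
west flash
```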
To flash the ESP32, follow the steps provided here to set up the build environment, then simply copy the code from the ESP32_Client.ino file into a new sketch and press Upload.
For this project we have decided to deploy the machine learning algorithm on the Bluetooth peripheral device. This makes it possible to use multiple central devices dedicated to switching appliances on and off, while the processing load of running the machine learning algorithm is handled by the Thingy:53.
Another great addition to this project would be the implementation of the Matter protocol.
Matter is a royalty free standard and was created to encourage interoperability between different devices and platforms.
If the appliances already have an IoT layer, the Thingy:53 is fully compatible with Matter, and instead of using relays it could be interfaced directly with the smart appliances.
In this article, we have presented a very basic implementation of Edge Impulse on the Thingy:53 to control an appliance using voice commands. The use of Edge Impulse and the integration with Nordic Semiconductor's IoT platform opens up endless possibilities for creating intelligent and user-friendly appliances.
The ability to quickly gather data, create, train and deploy machine learning algorithms greatly simplifies the process for developers, making it easier for them to incorporate these technologies into their projects.
Ultimately, this system provides a convenient and cost-effective way to control multiple appliances in the home.
We hope that this article will inspire you to try out Edge Impulse and Nordic Semi's Thingy:53 in your own smart appliance projects.
Use TinyML to listen for the sound of illegal logging, and send SMS notifications if a chainsaw is heard.
Created By: Zalmotek
Public Project Link:
https://studio.edgeimpulse.com/studio/126774
Illegal logging is a major environmental issue worldwide. It not only destroys forests but also decreases the amount of available timber for legal purposes. In addition, illegal logging often takes place in protected areas, which can damage ecosystems and jeopardize the safety of people and wildlife.
One way to combat this problem is through the use of machine learning algorithms that can detect chainsaw noise and be deployed on battery-powered devices, such as sensors in the forest. This allows for real-time monitoring of illegal logging activity, which can then be quickly addressed.
Detecting illegal logging is therefore essential for both environmental and economic reasons. However, it is difficult to detect illegal logging activity due to the vastness of forested areas, as it often takes place in remote and hard-to-reach areas. Traditional methods such as ground patrols are often ineffective, and satellite imagery can be expensive and time-consuming to analyze.
When combined with satellite data, ML can be used to quickly and cost-effectively identify areas where illegal logging is taking place. Sensors deployed in the forest can detect illegal logging activity in real time and, by providing data on where it is happening, help to better target enforcement efforts. This information can then be used to deploy ground patrols or take other actions to stop the illegal activity. In this way, ML can play a vital role in protecting forests and ensuring that timber is harvested legally.
Our approach to this problem is to create an IoT system based on the TinyML Syntiant development board that will run a machine learning model trained using the Edge Impulse platform that can detect the sound of chainsaws and send a notification via SMS when this event is detected.
We have picked the Syntiant TinyML board as our specialized silicon that can easily run the machine learning algorithm thanks to its Neural Decision Processors (NDPs). Moreover, having an onboard SPH0641LM4H microphone makes it great for quickly prototyping audio-based projects. This board is ideal for building and deploying embedded Machine Learning models, as it is fully integrated with Edge Impulse and has very low power consumption, advantageous for this use case as we do not want to charge the battery often.
Because this device must be deployed in remote and difficult-to-reach locations, a fitting power source must be used and so, we have opted to power it from a 2500 mAh rechargeable Li-Ion battery.
We will use a publicly available chainsaw noise dataset and the Edge Impulse platform to train and deploy a model that can distinguish between this sound and other similar sounds.
Micro USB cable
SIM800L GSM module
ESP32
ESP32 programmer
3.3V power supply (LD1117S33)
Power bank: rechargeable Li-Ion battery (INR18650-29E6 SAMSUNG SDI) + charger
Consumables (wires, prototyping board, LED)
Enclosure (carved stone)
Edge Impulse account
Edge Impulse CLI
Arduino IDE
Arduino CLI
Git
As stated before, our choice of Edge computing hardware for this use case is the Syntiant TinyML board, designed for working with Syntiant’s Neural Decision Processors (NDPs). This development board is a powerful tool for developing and deploying Machine Learning algorithms, as it is equipped with an ARM Cortex-M4 processor, which allows for real-time inference of the trained Machine Learning model. Another advantage of this Syntiant board is that it is equipped with 5 GPIOs that enable it to interact with other external circuitry and trigger an external output.
On top of the ML detection layer of our system, we will be adding a communication layer. In this case, we are using an external circuit based on the ESP32 MCU that can read a trigger from the Syntiant TinyML board, which is further interfaced with a SIM800L module responsible for sending an SMS message to the user when chainsaw noise is detected.
Be advised that the SIM800L uses 2G networks. In the EU there are still many providers offering 2G, and its coverage is much better than that of the faster networks, so since we are deploying the device in a forest it's actually a good option for us. In the US, 2G networks are currently being phased out, so be sure to check which GPRS module works best in your case.
In order to camouflage the device, we can fit all of the electronics into a fake stone, making it less likely to be detected. This will make it difficult for illegal loggers to find and destroy the device, as they will not be able to see it.
Since we want to fit all the modules inside the rock, we’re using the smaller SMD ESP32-WROOM-32UE SoC instead of the ESP32 development board, so we also have to use a breakout board to program the board.
To power the system we’re using a power bank based on the rechargeable INR18650-29E6 Li-Ion battery, from which we can obtain both 5V for the Syntiant TinyML board and 3.8V for the SIM800L GSM module. We also have to use a 5V to 3.3V voltage regulator to power the ESP32. You can find all the details in the full wiring schematic presented below.
NOTE: The Syntiant TinyML board has no power voltage input (the 5V pin does not power up the board; looking at the schematic, it sits before a voltage regulator that draws from the USB power supply). We had to resort to soldering a small wire to a 5V test pad on the bottom of the board, which luckily powers the board.
Because of the space constraints, we opted for a quick point-to-point wiring solution by directly soldering the components for our proof of concept. Of course, after further testing, a printed circuit board could be designed to accommodate all modules and provide a direct plug for the battery making it easier to produce in larger numbers in an automated fashion.
To connect the Syntiant TinyML to Edge Impulse, download the Audio firmware archive. Put the board in boot mode by double-clicking on the reset button when connecting the board to your computer, while the orange LED is blinking. In boot mode, you should see the red LED fading on and off. Then flash the firmware by running the script for your OS from the archive:
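The firmware archive follows the usual Edge Impulse layout, so the flashing scripts should be named along these lines (check the archive contents):

```
./flash_linux.sh       # Linux
./flash_mac.command    # macOS
flash_windows.bat      # Windows
```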
Let's start by creating an Edge Impulse project. Register an account with Edge Impulse, click Create new project, give it a memorable name, and select Developer as your project type. Afterward, select Audio as the type of data you will be dealing with.
To collect audio data from the microphone, connect the board to your computer. Once plugged in, the Syntiant TinyML Board shows up as a USB microphone and Edge Impulse can use this interface to record audio directly. Make sure the board is selected instead of your default microphone in your audio input settings.
Then, go to Devices -> Connect a new device in your Edge Impulse project, choose Use Your Computer, and allow access to your microphone.
Sometimes, gathering relevant data for your TinyML model might be a complicated endeavor, just like in this case. This is a situation in which the Upload data function under the Data acquisition tab saves the day. What it allows you to do is use publicly available recordings of the phenomena you wish to identify and use them as training data for your model.
In this case, what you must do is find publicly available recordings of chainsaws (in WAV format) and download them. Navigate to the Upload data menu, click on Choose Files and select your files, select Training as the upload category and manually enter the "Chainsaw" Label and click Begin upload.
It’s important to upload data that corresponds to background or unknown sounds, because the NDP101 chip expects exactly one negative class and it should be the last in the list; we will name it Z_Background.
For this, we will be using the Noise and Unknown dataset that can be found in the Keyword Dataset, which we will label Z_Background.
The next thing on the list is data formatting. The data uploaded from the Keyword Dataset is split into 1-second recordings, while the chainsaw sounds are usually longer, often over 30 seconds. To make things uniform, select one Chainsaw data entry, click on the menu marked by the 3 vertical dots, and press Split Sample. Leave the segment length at 1000ms and add additional segments if needed. Click on Split, and then repeat this for every chainsaw sample uploaded.
Once this is done, it’s time to do the Train/Test Split. Just click on the triangle with an exclamation mark on it, in the Train/Test Split section and Edge Impulse will do the job for you. An ideal ratio would be 80%-20%.
Once the Data acquisition is over, it’s time to build the Impulse.
You will notice the odd window size. Make sure to leave it at the default 968ms because this is a hardware constraint of the NDP101 chip.
For the processing block, we will be using Audio (Syntiant). This will compute log Mel-filterbank energy features from the audio signal that is fed in it and as for the learning block, we will be employing a Classification(Keras) block.
Once you are happy with the set-up, click on Save Impulse.
What this submenu does is let you explore the raw data, and see the signal processing block’s results.
Navigate to the "Syntiant" submenu under Impulse Design, leave the parameters on their default value and click on Save parameters, and then on Generate Features.
One of the most effective tools provided by Edge Impulse is the Feature Explorer. It enables visual, intuitive data exploration that enables you to rapidly verify whether your data separates neatly, even before starting to train your model. If you're looking to identify the outliers in your dataset, this feature is fantastic because it color-codes comparable data and enables you to track it back to the sample it originated from by just clicking on the data item.
Under the NN Classifier tab in the Impulse Design menu, we can set a number of parameters that affect how the neural network trains. The training settings can be kept at their defaults for now. When you click the Start training button, notice how a processing cluster is chosen to host the training process.
What is worth noting here is the particular structure of the neural network. To be more precise, we need to use a 3-layer NN architecture, every layer consisting of 256 neurons to be able to run our model on the NDP101 chip.
Once training is finished, the training output will be displayed. We aim for an accuracy of more than 95%. The correct and incorrect responses our model provided after being fed the previously gathered dataset are shown in tabular form in the Confusion matrix just below it. In this case, the dataset was clean enough that the model exhibits a 100% accuracy rating.
Navigating to the Model Testing page is a wonderful method to start testing our model before deployment. The samples kept in the Testing data pool will be displayed to you. To run all of this data through your impulse, select Classify all.
Before making the effort to deploy the model back on the edge, the user has the opportunity to test and improve the model using the model testing tab. When building an edge computing application, the ability to go back and add training data, modify the DSP and Learning block, and fine-tune the model reduces development time significantly. We strongly recommend taking the time to go back, tweak and optimize your model before deploying it on the edge.
To check how the model behaves when deployed on the edge, navigate to the Deploy tab, click on Syntiant TinyML under the Build firmware section, select the posterior parameters as the audio events we wish to detect, and press build.
Edge Impulse will then compile a pre-built binary that can be deployed in the same manner as the firmware used in the data acquisition phase.
With the model uploaded on the board, you can use any serial monitor, like picocom or the serial monitor embedded in Arduino IDE to debug and evaluate the performance of the Impulse.
Once you are happy with your model, it's time to deploy it on the device.
For our use case, we will be using our custom machine learning model alongside the firmware provided by Edge Impulse to create a custom application that will trigger an output when Chainsaw sounds are detected.
First and foremost, navigate to the Deployment tab, select Syntiant NDP101 library, and then click on "Find posterior parameters". This is a particularity of the NDP101 architecture that allows the user to set the thresholds for words or other audio events at which a model activates.
Select "Chainsaw" as the audio event that you want to detect and press Find parameters. Once that is done, you are ready to press Build.
The task will be assigned to a computing cluster and when the job finishes, you will be able to download it locally.
Afterward, clone the Syntiant firmware repository and copy and paste the model files into the "src" folder.
Open the "firmware-syntiant-tinyml.ino" in Arduino IDE, without creating a folder for it, and modify the on_classification_changed function like so:
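A minimal sketch of that modification, assuming the callback keeps its stock signature and that one of the board's GPIO pins (a placeholder below) is wired to the ESP32 trigger input, could look like this:

```cpp
// Placeholder: whichever Syntiant TinyML GPIO you wired to the ESP32 trigger input.
// Remember to configure it with pinMode(TRIGGER_PIN, OUTPUT) in setup().
#define TRIGGER_PIN 1

void on_classification_changed(const char *event, float confidence, float anomaly_score)
{
    // LED_RED is assumed to be defined by the stock firmware; adjust if your
    // board support package names the RGB LED pins differently.
    if (strcmp(event, "Chainsaw") == 0) {
        // Chainsaw sound detected: light the red LED and raise the trigger line
        digitalWrite(LED_RED, HIGH);
        digitalWrite(TRIGGER_PIN, HIGH);
    }
    else {
        // Z_Background (or any other class): clear both outputs
        digitalWrite(LED_RED, LOW);
        digitalWrite(TRIGGER_PIN, LOW);
    }
}
```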
Save and close.
Next up, launch a terminal, navigate to the firmware-syntiant-tinyml folder and run:
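Assuming the repository still ships its stock build helper script (check the repository README for the exact name and flags):

```
./arduino-build.sh --build
```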
Once the firmware successfully builds, connect the Syntiant board to your computer, place it in Boot mode and run:
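Again assuming the stock helper script:

```
./arduino-build.sh --flash
```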
If everything goes smoothly, the firmware will be flashed on your Syntiant TinyML board and when your board detects Chainsaw sounds, it will light up the LED in the color red.
Now that you have a Syntiant TinyML board that is able to detect chainsaw noises, you can program the ESP32 board to send an SMS notification through the SIM800L GSM module whenever the sound is detected. The GSM module is connected to the UART2 of the ESP32 microcontroller. Open Arduino IDE and paste the code below, then adjust the phone number and upload it to your ESP32 board:
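A minimal sketch of this logic is shown below; the trigger pin, the SIM800L UART pins, and the phone number are placeholders that must be adjusted to your wiring and SIM card:

```cpp
#include <Arduino.h>
#include <HardwareSerial.h>

// Placeholders - adjust to your wiring and SIM card.
#define TRIGGER_PIN 34            // line driven high by the Syntiant TinyML board
#define SIM800_RX   16            // ESP32 UART2 RX  <- SIM800L TX
#define SIM800_TX   17            // ESP32 UART2 TX  -> SIM800L RX
const char PHONE_NUMBER[] = "+40XXXXXXXXX";

HardwareSerial sim800(2);         // UART2 of the ESP32

void sendSMS(const char *text) {
  sim800.println("AT+CMGF=1");    // put the module in SMS text mode
  delay(500);
  sim800.print("AT+CMGS=\"");
  sim800.print(PHONE_NUMBER);
  sim800.println("\"");
  delay(500);
  sim800.print(text);
  sim800.write(26);               // Ctrl+Z terminates and sends the message
  delay(5000);                    // give the module time to send
}

void setup() {
  pinMode(TRIGGER_PIN, INPUT);
  sim800.begin(9600, SERIAL_8N1, SIM800_RX, SIM800_TX);
  delay(3000);                    // let the SIM800L register on the network
}

void loop() {
  if (digitalRead(TRIGGER_PIN) == HIGH) {
    sendSMS("Chainsaw noise detected - possible illegal logging!");
    delay(60000);                 // rate-limit notifications to one per minute
  }
}
```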
In conclusion, illegal logging is a major environmental issue that can be effectively combated through the use of machine learning algorithms. The system described in this article provides a way to quickly and easily monitor large areas of forest for illegal logging activity, and take action to stop it. The Syntiant TinyML board is an efficient and robust platform for running machine learning models and can be used to quickly and easily detect illegal logging activity. With this system in place, we can help to preserve our forests and ensure that they are managed in a sustainable way.
If you need assistance in deploying your own solutions or more information about the tutorial above please reach out to us!
An audio classification project that can identify snoring, deployed to a smartphone.
Created By: Wamiq Raza
Public Project Link: https://studio.edgeimpulse.com/public/109559/latest
Snoring, a type of sleep-disordered breathing, disrupts sleep quality and quantity for both the snorer and, frequently, the person who sleeps next to them. Snoring-related sleep deprivation can cause serious physical, emotional, and financial difficulties. Snoring not only disrupts the snorer's sleep, but it can also cause anger between spouses! About 40% of adult men and 24% of adult women snore on a regular basis. Snoring begins when the muscles around the throat relax during sleep. This narrows the airway, causing vibrations that result in snoring. Snoring is more common when a person sleeps on their back, and sleeping on your side rather than your back is a simple and natural remedy. In this project, a deep learning model for snoring detection is designed to be implemented on a smartphone using the Edge Impulse API, and the model may be deployed on other embedded systems to detect snoring automatically. A smartphone is linked to the listener module over home Wi-Fi or mobile data to log snoring incidents with timestamps, and the data may be shared with a physician for treatment and monitoring of disorders such as sleep apnea.
This tutorial has the following requirements:
Basic understanding of software development
Edge Impulse account
Android or iOS mobile phone
We will use Edge Impulse, an online development platform for machine learning on edge devices. Create a free account by signing up here. Log into your account and give your new project a name by clicking on the title. To run the current project as-is, you can directly clone it to make your own copy, and start executing.
This project used a dataset of 1000 sound samples divided into two categories: snoring sounds and non-snoring sounds. There are 500 examples in each class; both the snoring and non-snoring sounds were gathered from several web sources. The files were then separated into equal-sized, one-second files after silences were removed, so each sample lasts one second. Among the snoring samples, 363 consist of snoring sounds of children, adult men and adult women without any background sound [1]; the remaining samples have non-snoring sounds in the background. The 500 non-snoring samples consist of background sounds from ten categories, such as a baby crying, a clock ticking, a door opening and closing, total silence with the minor sound of the gadget's vibration motor, a toilet flushing, sirens of emergency vehicles, rain and thunderstorms, streetcar sounds, people talking, and background television news. Figure 1 and Figure 2 illustrate the frequency of snoring and non-snoring sound, respectively. The dataset can be downloaded from [2].
Once the dataset is ready you can upload it into Edge Impulse. Figure 3 represents the Edge Impulse platform, and how to upload the data. If you don’t want to download or upload data, and just run the project, refer to the section Getting started for cloning the current project.
Next, we will select signal processing and machine learning blocks, on the Create impulse page. The impulse will start out blank, with Raw data and Output feature blocks. Leave the default settings of a 1000 ms Window size and 1000 ms Window increase as shown in Figure 4. This means our audio data will be processed 1 second at a time, starting each 1 second. Using a small window saves memory on the embedded device.
Click on ‘Add a processing block’ and select the Audio (MFE) block. Next click on ‘Add a learning block’ and select the Neural Network (Keras) block. Click on ‘Save Impulse’ as illustrated in Figure 5. The audio block will extract a spectrogram for each window of audio, and the neural network block will be trained to classify the spectrogram as either 'snoring' or 'no snoring' based on our training dataset. Your resulting impulse will look like this:
Next we will generate features from the training dataset on the MFE page, as shown in Figure 6. This page shows what the extracted spectrogram looks like for each 1 second window from any of the dataset samples. We can leave the parameters on their defaults.
Next, press the ‘Generate features’ button, which processes the entire training dataset with this processing block and creates the complete set of features that will be used to train our neural network in the next step. This will take a couple of minutes to complete, and the resulting features can be seen in Figure 7.
We can now proceed to setup and train our neural network on the NN Classifier page. The default neural network works well for continuous sound. Snoring detection is more complicated, so we will configure a richer network using 2D convolution across the spectrogram of each window. 2D convolution processes the audio spectrogram in a similar way to image classification. Refer to the "NN classifier" section in my project for the architecture structure.
To train the model, the number of epochs was set to 100, the learning rate assigned after several trials is 0.005, and the overall dataset was split into an 80% training and 20% validation set. The number of epochs is the number of times the entire dataset is passed through the neural network during training; there is no ideal number, as it depends on the data. In Figure 8 we can see the feature explorer for correct and incorrect classification of both classes.
The model confusion matrix and the on-device (mobile) performance can be seen in Figure 9. The overall accuracy of the quantized int8 model is 94.3%, with 93.5% of 'no snoring' and 95.1% of 'snoring' samples classified correctly; the rest are misclassified.
After training, we run the model on the test data. It gives us an accuracy of 97.42%, as presented in Figure 10, along with the feature exploration.
The Live classification page allows us to test the algorithm either with the existing testing data that came with the dataset, or by streaming audio data from your mobile phone or from any microcontroller capable of audio processing. We can start with a simple test by choosing any of the test samples and pressing ‘Load sample’. This will classify the test sample and show the results, shown in Figures 11 and 12 for both classes with their probability scores. We can also test the algorithm with live data. Start with your mobile phone by refreshing the browser page on your phone. Then select your device in the ‘Classify new data’ section and press ‘Start sampling’ as shown in Figure 15.
In order to deploy the model on a smartphone, go to the browser window on your phone and refresh, then press the ‘Switch to classification mode’ button as shown in Figure 13. This will automatically build the project into a Web Assembly package and execute it on your phone continuously (no cloud required after that, you can even go to airplane mode!). Once you scan the QR code on your mobile it will ask you to access the microphone, as in this project we are using audio data, as shown in Figure 14. After the access is granted, we can see the classification results.
Furthermore, if you would like to extend the project, you can run it on different microcontrollers. Of course, for this project, our aim was to provide the insight of TinyML development on a smartphone.
This project provides insights into TinyML deployment on a smartphone. A deep learning model for snoring detection was trained, validated, and tested, and a prototype system comprising a listener module for snoring detection was demonstrated. For future work, extend the default dataset with your own data and background sounds, remember to retrain periodically, and test. You can set up unit tests under the Testing page to ensure that the model is still working as it is extended.
[1] T. H. Khan, "A deep learning model for snoring detection and vibration notification using a smart wearable gadget," Electronics, vol. 8, no. 9, art. 987, 2019.
[2] Khan, T., "A Deep Learning Model for Snoring Detection and Vibration Notification Using a Smart Wearable Gadget," Electronics 2019, 8, 987. https://doi.org/10.3390/electronics8090987
Using an Arduino Portenta H7 to listen for and classify the flow of water in a pipe.
Created By: Manivannan Sivan
Public Project Link:
Water is the world's most precious resource, yet it is also the one that is almost universally mismanaged. As a result, water shortages are becoming ever more common. In the case of water supply and distribution networks, these manifest themselves in the intermittent operation of the system. Not only is this detrimental to the structural condition of the pipes, but it can also adversely affect the quality of the water delivered to the customer's taps. Further, leakage often exceeds 50% of production. Not only does this have a significant economic impact, but an environmental one too. But recovering leakage has a cost: undertaking a hydraulic study of the network, creating a permanent monitoring system, and eliminating the leaks. So how low should leakage go, and how can a lower leakage level be maintained over time? This was the objective of the very innovative EU-funded PALM project recently completed in central Italy.
There is an increased carbon footprint from having pumps constantly running to make up for the water lost due to leakage. It is the increased pump use and pump maintenance/replacement costs that increase the CO2 released from the fossil fuels burned to support it. According to a study done by Von Sacken in 2001, water utilities are the largest user of electricity, accounting for 3% of total electricity consumption in the US. In addition, it is estimated that 2-3 billion kWh of electricity is expended pumping water due to leakage.
Costs, health, the environment, and infrastructure are just a few things that can come into play when water system leakage goes uncorrected.
More than 2 billion people globally live in countries with high water stress, per the 2018 statistics provided by the United Nations (UN). In order to tackle this problem, it is necessary to conserve and utilize water safely. Installation of proper water pipeline leak detection systems assist in specifying the leakages in installed water pipes, which ultimately avoids wasting water through cracks and holes. Therefore, the increasing scarcity of water is propelling the demand for water leak solutions, which in turn drives the market.
The global water pipeline leak detection systems market size is expected to reach $2,349.6 million in 2027, from $1,748.6 million in 2019, growing at a CAGR of 6.8% from 2020 to 2027. Water pipeline leak detection systems are utilized to determine the location of the leak in water transmission pipelines. Around 30% to 50% of water is lost through aging pipelines, which also contributes toward loss of revenue. Water pipeline leak detection systems are available for both underground and overground water pipelines to precisely locate and check the severity of pipeline leaks.
On the contrary, in recent years, pipeline leak detection systems have undergone various technological advancements by adoption of computerized systems and digital survey systems. The traditional acoustic detection sensors are upgraded with more efficient sound detection functions which has increased their efficiency. Introduction and implementation of such advanced technologies are likely to create lucrative opportunities for the growth of the water pipeline leak detection systems market during the forecast period.
In recent years, the adoption of acoustic-based pipe leakage detection has started increasing due to investment in R&D.
In this fast growing sector, TinyML-based systems will play a major role due to low power consumption and developing EdgeML models with more accuracy in predicting leakage detection.
My prototype is based on acoustic data collected with an Arduino Portenta H7, and a model trained using Edge Impulse. The Arduino Portenta Vision Shield is used because it contains two microphones (MP34DT05) which run at 16 MHz. The Vision Shield is placed on top of the pipe for data acquisition, with the microphones facing the pipe. This helps capture the noise of the water flowing.
In the data acquisition stage, the pipe is first recorded in "Idle" mode, where the tap is fully closed so no water flows, then with the tap slightly opened to simulate "leakage" mode. Finally, it is fully opened to simulate "water flow" mode.
In the pre-processing stage, the Window size is set to 2000ms and the Window increase is set to 500ms.
For the Neural Network configuration, I have used a couple of 1D-Conv layers followed by DNN layers.
The number of training cycles is set to 100 and the learning rate is set to 0.005. The accuracy obtained was 99.1%, with a loss of only 0.02. As the model is performing well at classifying the data, we can move on.
In Model testing, the trained model is tested with data and it is able to predict all 3 conditions we trained on with 100% accuracy.
Then, in the Deployment section, select Arduino Portenta H7 and download the firmware files to your computer.
Press the Reset button twice on the Portenta to change it to Flash mode. Then run the .bat file if you are on Windows, or the Mac or Linux commands if you are on those platforms.
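The downloaded firmware archive follows the usual Edge Impulse naming convention, so the flashing scripts should look something like this (verify against the archive contents):

```
flash_windows.bat      # Windows
./flash_mac.command    # macOS
./flash_linux.sh       # Linux
```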
The prototype demonstrated an acoustic method to predict leakage in a pipe. The model was able to determine whether the pipe is in Idle (no water flowing), flowing normally, or if there is a small flow, representing leakage in this case. The use case is simple enough to apply to any industry to monitor the leakages in pipe, though this is of course only a prototype project.
By integrating well-designed enclosures with higher quality microphones, the Arduino Portenta H7 will be ideal for industrial use-cases for pipe leakage detection.
An exploration into using machine learning to better monitor a patient coughing, to improve medical outcomes.
Created By: Eivind Holt
Public Project Link:
GitHub Repository:
This wearable device detects and reports user's coughs. This can be useful in treatment of patients suffering from chronic obstructive pulmonary disease, COPD, a group of diseases that cause airflow blockage and breathing-related problems. The increase in number and intensity of coughs can indicate ineffective treatment. Real-time monitoring enables caregivers to intervene at an early stage.
Existing methods of analyzing audio recordings greatly invade the privacy of the patient, caregivers, and peers. This proof-of-concept does not store any audio for more than a fraction of a second. The audio buffer never leaves the device; it is constantly overwritten as soon as the application has determined whether the small fragment of audio contains a cough or not. In fact, the hardware used is not capable of streaming audio over the low-energy network in question.
Further, the application is hard-coded to detect coughs or noise. To be able to detect new keywords, for instance "bomb", or "shopping", the device would have to be physically reprogrammed. Firmware Over-the-Air is not currently supported in this project. Each keyword consumes already constrained memory, limiting the practical amount of different keywords to a handful.
Compared to commercial voice assistants, such as Google Nest, Amazon Alexa or Apple Siri on dedicated devices or on a smartphone, this device works a bit differently. The aforementioned products are split into two modes: activation and interpretation. Activation runs continuously and locally on the device and is limited to recognizing "Hey Google" etc. This puts the device in the next mode, interpretation. In this mode an audio recording is made and transmitted to servers to be processed, which allows for greatly improved speech recognition. It also opens up secondary use, better known as targeted advertising. The device in this project only works in the activation mode.
Arduino Nano 33 BLE Sense
LiPo battery
JST battery connectors
Edge Impulse Studio
VS Code/Arduino IDE
Nordic Semiconductor nRF Cloud
nRF Bluetooth Low Energy sniffer
Nordic nRF52840 Dongle
Fusion 360
3D printer
Qoitech Otii Arc
A model was trained using 394 labeled audio samples of intense coughs, a total of 2 minutes and 34 seconds. An almost equal amount of audio samples of less intense coughs, sneezes, clearing of throat, speech and general sounds was also labeled, 253 samples, 2 minutes and 38 seconds. All samples were captured using the Arduino Nano, positioned at the intended spot for wear.
My coughs last around 200 milliseconds. I sampled 10 seconds of repeated coughing with short pauses, then split and trimmed the samples to remove silence.
The model was tested using data set aside and yielded great results. I used EON Tuner in Edge Impulse Studio to find optimal parameters for accuracy, performance and memory consumption.
An Arduino compatible library was built and used to perform continuous inference on the Arduino Nano 33 BLE Sense's audio input.
I followed some samples on how to use the generated Arduino libraries from Edge Impulse and how to perform inference on the audio input. If attempting to build my source code, make sure to include the /lib folder. I had to experiment a bit with parameters for the length of the audio window and slices. As each audio sample might start and end in any number of places for a given cough, each piece of audio is analyzed several times, together with the preceding and following adjoining slices. The result of the inference, the classification, is checked and triggers a cough-count increment if the probability is above 50%. An LED is flashed as an indicator; a sketch of this check is shown below.
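As a rough illustration (not the exact firmware code), the check could look like the snippet below. The generated library header name and the "cough" label are assumptions, and check_for_cough() is a hypothetical helper called after run_classifier_continuous() has filled the result struct:

```cpp
#include <Arduino.h>
#include <string.h>
#include <cough_detection_inferencing.h>  // hypothetical name of the generated Edge Impulse library header

static uint32_t cough_count = 0;          // running total, later advertised over BLE

// Hypothetical helper: call this after run_classifier_continuous() has filled `result`.
void check_for_cough(const ei_impulse_result_t &result) {
    for (size_t ix = 0; ix < EI_CLASSIFIER_LABEL_COUNT; ix++) {
        // "cough" is the assumed label name of the intense-cough class
        if (strcmp(result.classification[ix].label, "cough") == 0 &&
            result.classification[ix].value > 0.5f) {
            cough_count++;                    // increment the cough counter
            digitalWrite(LED_BUILTIN, HIGH);  // flash the onboard LED as an indicator
            delay(50);
            digitalWrite(LED_BUILTIN, LOW);
        }
    }
}
```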
In short, my application defines a custom BLE service with a characteristic of type unsigned integer and the behaviors Notify, Read, and Broadcast. Not very sophisticated, but enough for a demonstration. Any connected device will be able to subscribe to updates of the value; a minimal sketch follows.
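A minimal sketch of such a service using the ArduinoBLE library might look like this; the UUIDs, device name, and characteristic name are placeholders rather than the ones used in my firmware:

```cpp
#include <ArduinoBLE.h>

// Placeholder UUIDs - substitute your own custom service/characteristic UUIDs
BLEService coughService("181C");
BLEUnsignedIntCharacteristic coughCountChar("2A9E", BLERead | BLENotify | BLEBroadcast);

void setup() {
  if (!BLE.begin()) {
    while (1);                        // BLE stack failed to start
  }
  BLE.setLocalName("CoughMonitor");   // placeholder device name
  BLE.setAdvertisedService(coughService);
  coughService.addCharacteristic(coughCountChar);
  BLE.addService(coughService);
  coughCountChar.writeValue(0);       // initial cough count
  BLE.advertise();                    // wait for a central to connect and subscribe
}

void loop() {
  BLE.poll();                         // service BLE events
  // After each detected cough: coughCountChar.writeValue(cough_count);
}
```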
Next I used the nRF Android app on my phone as a gateway between the device and nRF Cloud.
I used lithium polymer batteries for their compact size and ease of recharging. I only had spare 500 mAh batteries available, as shipping options for assorted batteries by air are limited. To extend battery life I connected two in parallel by soldering 3 JST female connectors. Warning: this wiring is susceptible to short circuits and is only connected under supervision. This gives twice the capacity while keeping the voltage at the same level.
I made the mistake of assuming I would have to connect more components to the Arduino Nano via a protoboard. On a whim I ordered the Nano with pre-soldered headers. This only took up space and I had to undergo the tedious work of removing the headers by hand using a regular soldering iron. Sacrificing the headers by snipping them every other pin greatly eased the required finger acrobatics.
The only other thing I did was solder a female JST battery connector to the VIN and GND pins. This would serve as my battery connection, and subsequently the device's on/off toggle.
I wanted to make a prototype for demonstrating the concept for clinicians. It needed to contain and protect the electronics and batteries, while allowing sound waves to reach the microphone. I realized this would complicate making the enclosure watertight and quickly crossed that off the list. I also wanted to make a practical mechanism for securing the device on the wearer.
I used Autodesk Fusion 360 as CAD to design the enclosure. I always start by making rough digital replications of the hardware using calipers to take measurements on a sketch.
This gives the driving constraints and allows me to experiment with different hardware layouts without having to totally scrap alternatives.
While designing I constantly need to take into consideration the manufacturing method, in this case a resin-based SLA 3D printer. When drawing I have to decide on the orientation of the model during printing to avoid complicated overhangs, cupping, and hard-to-reach surfaces for removing support material. I also want to reduce the number of parts, to avoid unnecessary post-print work and bad fits.
Completed prints undergo an IPA wash to remove excess resin and finally post cure in a UV-light chamber. What remains is to snip off support material, sand any uneven surfaces and glue together parts. Now the device could finally be assembled and tested.
I ended up with a sort of a badge with a clip and a friction fit lid. It reminds me of a 1960's Star Trek communicator, not the worst thing to be compared against.
Battery life is limited to a few days. I am in the process of reimplementing the device using a Neural Decision Processor, NDP, that is able to perform the inference with a fraction of the energy a conventional MCU requires.
I tried to limit audio inference to run only when the Arduino Nano's accelerometer triggers due to some amount of movement (chest movement during a cough). I was disappointed to discover that the interrupt pin on the LSM9DS1 IMU is not connected to the MCU.
You might also have realized that the device will pick up coughs by bystanders, something I discovered when demonstrating the device to a large audience during a conference! Limiting activation to both movement and audio will sort this out.
When demonstrating the device to doctors and nurses I received a great suggestion. A COPD patient that stops taking their daily walk is a great source for concern. My device could be extended to perform monitoring of physical activity using accelerometer data and report aggregated daily activity.
It might be useful to support simple keywords so a patient could log events such as blood in cough, types of pain, self-medication etc.
I plan to move from BLE to LoRaWAN or NB-IoT for transmissions. This way patients won't have to worry about IT administration or infrastructure, it will just work. Please see my other projects at Hackster and element14 for demonstrations of these LPWAN technologies.
I have had the opportunity to demonstrate the device to clinicians both in person and at expositions and it has received praise, suggestions for further features and use in additional conditions. This project has also spawned several other ideas for wearables in e-health.
Identify a location by using sound and audio analysis, on a Syntiant TinyML board.
Created By: Swapnil Verma
Public Project Link:
Many times in an application, we want to identify the environment of a user, in order to automatically carry out certain tasks. For example, automatically switching on/off ANC (Automatic Noise Cancellation) in a headphone based on the user's environment. Or, in some applications leaving GPS turned on might be overkill and drain a battery's charge.
In situations like this, can we detect a user's local environment without the use of GPS?
We, humans, are good at understanding our environment just by using auditory sensory skills. I can identify a multitude of sound sources and also guess an environment just by listening. In this project, I am trying to replicate the same behaviour using a TinyML board with a microphone, running a machine-learning model.
As you might have guessed already, I am using Edge Impulse for building a TinyML model. Let's explore our training pipeline.
A good machine learning model starts with a high-quality dataset. I am using the public ESC-50 dataset to prepare my own dataset.
I have prepared a total of 7 classes denoting various locations, by combining some of the classes from the ESC-50 dataset. The classes I have prepared are:
Airport (aeroplane and helicopter sound)
Bathroom (brushing and toilet flush sound)
Construction (jackhammer sound)
Home (washing machine and vacuum cleaner sound)
Road (siren, car horn and engine sound)
Work (mouse click and keyboard typing sound)
Anomaly (quiet environment sound)
I have used only the ESC-50 dataset to prepare a new dataset for this project. The sound samples contained within any of these classes are not the only representation of that class. They can be improved by adding more sounds from different sources.
The first thing to do for training an ML model is Impulse design. It can also be thought of as a pipeline design from preprocessing to training.
In the pre-processing block, start with the default parameters. If required then change the parameters to suit your needs. In this project, the default parameters worked perfectly, so I just used them.
After adjusting the parameters, click on Generate features.
My model, trained for 1000 epochs with a 0.0002 learning rate, has 89.2% accuracy, which is not bad. This tab also shows the confusion matrix, which is one of the most useful tools for evaluating a model. This confusion matrix shows that the work class is the worst-performing class in our dataset.
The pre-processing and NN classifier tabs automatically adjusted themselves to 6 classes after the dataset was modified.
This will re-run the pre-processing block and learning block in one click, with the latest parameters.
After retraining the model with these parameters, the training accuracy is now 92.1%.
The Road class has the worst performance. Let's investigate why. Scroll to a sample which was classified incorrectly, click on the 3 dots to get a menu, and then click on Show classification.
This will show the result of each window of that sample. In the Raw data section, we can also play the audio corresponding to that window, giving us more insight into why the sample was incorrectly classified.
Let's test the model after deploying it on the Syntiant TinyML board. You can check it out in the below video.
To download a protective 3D-printable case for the Syntiant TinyML board, please follow the below link:
To clone this Edge Impulse project please follow the below link:
Keyword recognition and notification for AI patient assistance with Edge Impulse and the Arduino Nano 33 BLE Sense.
Created By:
Public Project Link:
When hospitals are busy it may not always be possible for staff to be close when help is needed, especially if the hospital is short staffed. To ensure that patients are looked after promptly, hospital staff need a way to be alerted when a patient is in discomfort or needs attention from a doctor or nurse.
A well known field of Artificial Intelligence is voice recognition. These machine learning and deep learning models are trained to recognize phrases or keywords, and combined with the Internet of Things can create fully autonomous systems that require no human interaction to operate.
As technology has advanced, it is now possible to run voice recognition solutions on low cost, resource constrained devices. This not only reduces costs considerably, but also opens up more possibilities for innovation. The purpose of this project is to show how a machine learning model can be deployed to a low cost IoT device (Arduino Nano 33 BLE SENSE), and used to notify staff when a patient needs their help.
The device will be able to detect three keywords: Doctor, Nurse, and Help. The device also acts as a BLE peripheral; BLE centrals/masters, such as a central server for example, could connect and listen for data coming from the device. The server could then process the incoming data and send a message to hospital staff or sound an alarm.
Your first step is to create a new project. From the project selection/creation you can create a new project.
Enter a project name, select Developer and click Create new project.
We are going to be creating a voice recognition system, so now we need to select Audio as the project type.
Once the dependencies are installed, connect your device to your computer and press the RESET button twice to enter into bootloader mode, the yellow LED should now be pulsating.
Once the firmware has been flashed you should see the output above, hit enter to close command prompt/terminal.
Open a new command prompt/terminal, and enter the following command:
edge-impulse-daemon
If you are already connected to an Edge Impulse project, use the following command:
edge-impulse-daemon --clean
Follow the instructions to log in to your Edge Impulse account.
Once complete head over to the devices tab of your project and you should see the connected device.
We are going to create our own dataset, using the built in microphone on the Arduino Nano 33 BLE Sense. We are going to collect data that will allow us to train a machine learning model that can detect the words/phrases Doctor, Nurse, and Help.
We will use the Record new data feature on Edge Impulse to record 15 sets of 10 utterances of each of our keywords, and then we will split them into individual samples.
Ensuring your device is connected to the Edge Impulse platform, head over to the Data Acquisition tab to continue.
In the Record new data window, make sure you have selected your Arduino Nano 33 BLE Sense, then select Built in microphone, set the label as Doctor, change the sample length to 20000 (20 seconds), and leave all the other settings as they are.
Here we are going to record the data for the word Doctor. Make sure the microphone is close to you, click Start sampling and record yourself saying Doctor ten times.
You will now see the uploaded data in the Collected data window; next we need to split the data into ten individual samples.
Click on the dots to the right of the sample and click on Split sample, which will bring up the sample split tool. Here you can move the windows until each of your samples is safely in a window. You can fine tune the splits by dragging the windows until you are happy, then click on Split.
You will see all of your samples now populated in the Collected data window. Now you need to repeat this action 14 more times for the Doctor class, resulting in 150 samples for the Doctor class. Once you have finished, repeat this for the remaining classes: Nurse and Help. You will end up with a dataset of 450 samples, 150 per class.
Now we have all of our main classes complete, but we still need a little more data. We need a Noise class that will help our model determine when nothing is being said, and we need an Unknown class, for things that our model may come up against that are not in the dataset.
For the Noise class we will mix silent samples and some other general noise samples. First of all, record 100 samples with no speaking and store them in a Noise class.
We need to split the dataset into test and training samples. To do this, head to the dashboard and scroll to the bottom of the page, then click on the Perform train/test split button.
Once you have done this, head back to the data acquisition tab and you will see that your data has been split.
Now we are going to create our network and train our model.
Head to the Create Impulse tab and change the window size to 2000ms. Next click Add processing block and select Audio (MFCC), then click Add learning block and select Classification (Keras).
Now click Save impulse.
Head over to the MFCC tab and click on the Save parameters button to save the MFCC block parameters.
If you are not automatically redirected to the Generate features tab, click on the MFCC tab and then click on Generate features and finally click on the Generate features button.
Your data should be nicely clustered and there should be as little mixing of the classes as possible. You should inspect the clusters and look for any data that is clustered incorrectly (You don't need to worry so much about the noise and unknown classes being mixed). If you find any data out of place, you can relabel or remove it. If you make any changes click Generate features again.
Now we are going to train our model. Click on the NN Classifier tab, then click Auto-balance dataset, Data augmentation, and then Start training.
Once training has completed, you will see the results displayed at the bottom of the page. Here we see that we have 99.2% accuracy. Let's test our model and see how it works on our test data.
Head over to the Model testing tab where you will see all of the unseen test data available. Click on Classify all and sit back as we test our model.
You will see the output of the testing in the output window, and once testing is complete you will see the results. In our case we can see that we have achieved 96.62% accuracy on the unseen data, and a high F-Score on all classes.
Now we need to test how the model works on our device. Use the Live classification feature to record some samples for classification. Your model should correctly identify the class for each sample.
Edge Impulse has a great new feature called Performance Calibration, or PerfCal. This feature allows you to run a test on your model and see how well it will perform in the real world. The system will create a set of post-processing configurations for you to choose from. These configurations help to minimize either false activations or false rejections.
Once you turn on perfcal, you will see a new tab in the menu called Performance calibration. Navigate to the perfcal page and you will be met with some configuration options.
Select the Noise class from the drop down, and check the Unknown class in the list of classes below, then click Run test and wait for the test to complete.
The system will provide a number of configs for you to choose from. Choose the one that best suits your needs and click Save selected config. This config will be deployed to your device once you download and install the library on your device.
We can use the versioning feature to save a copy of the existing network. To do so head over to the Versioning tab and click on the Create first version button.
This will create a snapshot of your existing model that we can come back to at any time.
Now we will deploy an Arduino library to our device that will allow us to run the model directly on our Arduino Nano 33 BLE Sense.
Head to the deployment tab and select Arduino Library then scroll to the bottom and click Build.
Note that the EON Compiler is selected by default which will reduce the amount of memory required for our model.
Once the library is built, you will be able to download it to a location of your choice.
Once you have downloaded the library, open up the Arduino IDE, click Sketch -> Include Library -> Add .ZIP Library..., navigate to the location of your library, add it, and then restart the IDE.
Open the IDE again and go to File -> Examples, scroll to the bottom of the list, go to AI_Patient_Assistance_inferencing -> nano_ble33_sense -> nano_ble33_sense_microphone.
Once the script is uploaded, open up serial monitor and you will see the output from the program. The green LED on your device will turn on when it is recording, and off when recording has ended.
Now you can test your program by saying any of the keywords when the green light is on. If a keyword is detected the red LED will turn on.
Now open AI_Patient_Assistance_inferencing -> nano_ble33_sense -> nano_ble33_sense_microphone_continuous, copy the contents of libraries/ai_patient_assistance/ai_patient_assistance_continuous.ino into the file and upload to your board.
Once the script is uploaded, open up serial monitor and you will see the output from the program. The red LED will blink when a classification is made.
This program acts as a BLE peripheral which basically advertises itself and waits for a central to connect to it before pushing data to it. In this case our central/master is a smart phone, but in the real world this would be a BLE enabled server that would be able to interact with a database, send SMS, or forward messages to other devices/applications using a machine to machine communication protocol such as MQTT.
When your BLE app connects to the program, the LED light will turn blue, once the app disconnects the LED will turn off.
Here we have created a simple but effective solution for detecting specific keywords that can be part of a larger automated patient assistance system. Using a fairly small dataset we have shown how the Edge Impulse platform is a useful tool in quickly creating and deploying deep learning models on edge devices.
You can train a network with your own keywords, or build off the model and training data provided in this tutorial. Ways to further improve the existing model could be:
Record more samples for training
Record samples from multiple people
For initial setup of the Portenta, follow the steps outlined here:
Using cough frequency and intensity as an indicator of COPD condition has been difficult to achieve outside hospital wards using existing technology. The main traditional approach consists of audio recording a patient at the ward, then using manual or software-based cough counting. While developing this new proof-of-concept no similar approaches were found.
A model was trained to distinguish intense coughs from other sounds. An Arduino Nano 33 BLE Sense was programmed to continuously feed microphone audio into an application. The application then runs inference on small audio fragments to determine the probability of each fragment containing a cough. If it does, a counter is incremented and this is securely advertised using Bluetooth Low Energy, BLE. Another BLE device, such as a smartphone or a USB dongle, can be paired with the device and re-transmit the event to a service on the internet. I have used nRF Cloud for this purpose. nRF Cloud exposes several APIs (REST web services, MQTT brokers) that enable the events to be integrated with other systems. With this as a basis it's possible to transform the event into an internationally recognized clinical message that can be routed into an Electronic Medical Record system, EMR. Popular standards include openEHR and HL7 FHIR.
Edge Impulse is the leading development platform for machine learning on edge devices, and it's free for use by developers. The documentation is some of the best I have experienced in my two decades as a professional developer. I also wish to recommend the book as a practical, project-based introduction to TinyML and Edge Impulse. I will highlight some particulars of my application. You may explore my Edge Impulse project.
I am the only source of the coughs; if this is to be used by anyone else, a significantly larger and more diverse dataset is needed. I have found several crowd-sourced cough datasets, thanks to efforts during the COVID pandemic. I started this project by making Python scripts that would filter, massage, and convert these samples. The quality of many of the samples was not suitable for my project; many 10-second samples only contained a single, weak cough. Very few were accurately labeled. Due to the amount of work required to trim all of these and to manually label each sample, I decided to produce my own. Labeled datasets of other sounds are also readily available.
The samples were split for training (81%), and 19% were put aside for testing. The training samples were used to extract audio features and create a neural network classifier. The NN architecture and parameters were mainly the result of experimentation. Articles and books gave conflicting and outdated advice, but were still useful for understanding the different steps. Edge Impulse Studio is perfect for this type of iterative experimentation, as it replaces a lot of custom tooling with a beautiful UI.
The Arduino ecosystem is wonderful for this kind of explorative prototyping. Development using Visual Studio Code or the Arduino IDE/web IDE was a breeze, and access to e.g. BLE APIs was intuitive. You may explore the source code.
If you are used to RPC or even REST types of communication paradigms, BLE will require a bit of reading and experimentation. The official ArduinoBLE docs give a great explanation of key concepts and sample code to get started. Nordic also has excellent learning resources.
I used an nRF52840 Dongle in conjunction with the nRF Connect for Desktop Bluetooth Low Energy app for initial BLE sniffing and development.
I didn't spend a whole lot of time profiling and optimizing this project, as I would be moving to different hardware in the next iteration. Remember, the current implementation is simultaneously buffering audio from the microphone and performing inference. The key to long battery life is 1) energy-efficient hardware and 2) as much down time (deep sleep) as possible. I did however make sure it could perform continuous inference for a few days. The Qoitech Otii Arc is an excellent tool for profiling projects like this. Please see my other projects at Hackster and element14 for more in-depth tutorials.
The model was printed using a Formlabs Form 3 SLA 3D printer, with rigid Formlabs Gray v4 resin. The process starts by exporting the models from Fusion 360. These are imported and arranged for printing using the PreForm software. There are many considerations in arranging the models for an optimal print; carefully oriented parts and support material placement can drastically reduce post-print work and improve strength and surface finish.
I work with research and innovation. I am a member of the Edge Impulse Expert Network. This project was made of my own accord and the views are my own.
This is an interesting problem and I decided to use equally interesting hardware in this project: the Syntiant TinyML board. This board is designed for voice, acoustic event detection, and sensor machine learning applications.
It is equipped with an ultra-low-power Syntiant NDP101 Neural Decision Processor, a SAMD21 Cortex-M0+ host processor, an onboard microphone, and a 6-axis motion sensor in an SD-card-sized package. This board is extremely small and perfect for my application.
The cherry on top is that it is fully supported by Edge Impulse.
The Data acquisition tab of the Edge Impulse Studio provides the facility to upload, visualise, and edit the dataset. This tab provides simple functions such as splitting data into training and testing, and a lot of advanced functions such as filtering, cropping, splitting data into multiple data items, and many more.
The dataset is an audio type, which is time-series data, and therefore the Impulse design tab automatically added the Time series data block first. Next is the processing block. Syntiant has prepared a pre-processing block called Audio (Syntiant) specifically for the NDP101 chip, which is similar to Audio (MFE) but performs some additional processing. Next is the learning block - a Classification block is perfect for this use case. The last block shows the Output features, i.e. the classes.
After pre-processing the dataset, it's time to train a model. In the NN Classifier tab, adjust the training cycles, learning rate, etc. For the Syntiant TinyML board, the neural network architecture is fixed. I usually start with the default parameters, and based on the performance of the network I adjust the parameters.
At this moment, I realised that the work class contains sound samples of only keyboard and mouse operation, assuming people only work on computers. This is a form of bias, and I had inadvertently included it in my machine-learning model. To correct my mistake, I disabled the work class from the training and test sets using the filters provided by the Data acquisition tab.
After updating the dataset, go to the NN Classifier tab and click on the "Train model" button.
To test the model, jump over to the Model testing tab and click on Classify all. It will automatically pre-process and perform inferencing on the set-aside data, using the last trained model. The testing performance of the model in my case is 84.91%.
To deploy the model to the target hardware (Syntiant TinyML in this case), go to the Deployment tab. The Syntiant TinyML board requires finding posterior parameters before building firmware. For other hardware, this is not required. Select the deployment option (library or firmware) and click Build to generate the output. The firmware is built and downloaded to your computer, and then you can flash it to the board.
Arduino Nano 33 BLE Sense
Edge Impulse
Head over to Edge Impulse and create your account or log in. Once logged in you will be taken to the project selection/creation page.
You need to install the required dependencies that will allow you to connect your device to the Edge Impulse platform. This process is documented in the Edge Impulse documentation and includes installing the Edge Impulse CLI and the Arduino CLI.
Now download the latest Edge Impulse firmware for the Arduino Nano 33 BLE Sense and unzip it, then double-click on the relevant script for your OS: flash_windows.bat, flash_mac.command, or flash_linux.sh.
Next, download the Edge Impulse keyword spotting dataset and extract the data. Navigate to the Noise directory and copy 50 random samples. Next go to the Data Acquisition tab and upload the new data into the Noise class. Finally, copy 100 samples from the unknown class and upload them to the Edge Impulse platform as an Unknown class.
Download this project's code from its repository. Copy the contents of libraries/ai_patient_assistance/ai_patient_assistance.ino into the file and upload to your board. This may take some time.
You can use a free BLE app to connect to your device and read the data published by it.
Constructing a wearable device that includes a Syntiant TinyML board to alert the user with haptic feedback if emergency vehicles or car horns are detected.
Created By: Solomon Githu
Public Project Link: https://studio.edgeimpulse.com/public/171255/latest
In Part 1 of this project, I demonstrated why wearable technology is on the rise and how TinyML is helping to advance the industry. I used the Syntiant TinyML board to run a Machine Learning model that monitors environmental sounds, and predicts when vehicle sounds are detected. Once the device detects the sound of a vehicle or its horn, it generates a vibration, similar to smartphones, that can be perceived by the wearer.
This project can be used, for example, to help people with hearing impairments navigate the streets safely. At the same time, this wearable device is ideal for people strolling through the streets with headphones on, listening to music or podcasts, while inadvertently neglecting the surrounding traffic!
Based on the outcomes of the initial undertaking, I decided to create a wearable device intended for hand placement. This wearable incorporates a protective casing that encompasses all the electronic components, along with wrist straps that allow it to be comfortably worn around the wrist. For the software components, I was satisfied with the model results so I reused the previous Edge Impulse project. For a detailed description on how the model was trained and deployed, please check Part 1 of the project's documentation.
This "Part 2" documentation will outline the process of constructing a similar wearable device. It will provide detailed instructions on how to assemble the device using the publicly available components.
Software components:
Arduino IDE
Hardware components:
3.7V LiPo battery. I used one with 500mAh
Veroboard/stripboard
1x 220 Ω resistor
1x 2N3904 transistor
1x 5.7 kΩ resistor
Some jumper wires and male header pins
Soldering iron and soldering wire
Super glue. Always be careful when handling glues!
In the Edge Impulse Studio, the Impulse was deployed as an optimized Syntiant NDP 101/120 library. This bundles all the signal processing blocks, configuration, and learning blocks into a single package. Afterwards, custom Arduino code was created to analyze the model's predictions. This code can be obtained from this GitHub repository.
The Arduino code turns GPIO 1 HIGH when ambulance, firetruck, or car siren/horn sounds are detected. GPIO 1 is then used to trigger a motor control circuit that creates a vibration. If you want to turn GPIO 2 or 3 high or low, you can use the commands OUT_2_HIGH(), OUT_2_LOW(), OUT_3_HIGH(), and OUT_3_LOW() respectively. These functions can be found in the syntiant.h file.
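As a hypothetical illustration of how a detection could drive the motor circuit: the OUT_1_HIGH()/OUT_1_LOW() helpers are assumed to exist in syntiant.h alongside the OUT_2/OUT_3 variants above, and vibrate_on_detection() is an illustrative helper rather than part of the original firmware.

```cpp
// Hypothetical helper: call with the label reported by the classifier.
void vibrate_on_detection(const char *label) {
    // Any class other than "unknown" is treated as a siren/horn detection
    if (strcmp(label, "unknown") != 0) {
        OUT_1_HIGH();   // GPIO 1 high -> transistor switches on -> motor vibrates
        delay(1000);    // keep vibrating for roughly one second
        OUT_1_LOW();    // GPIO 1 low -> motor off
    }
}
```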
Once the code is uploaded to the Syntiant TinyML board, we can use the Serial Monitor (or any other similar software) to see the logs being generated by the board. Open a terminal and select the COM Port of the Syntiant TinyML board with 115200 8-N-1 settings (in Arduino IDE, that is 115200 baud, Carriage return). Sounds of ambulance sirens, firetruck sirens, and car horns will turn the RGB LED red. For the "unknown" sounds, the RGB LED is off.
The wearable's components can be categorized into two parts: the electronic components and the 3D printed components.
The 3D printed component files can be downloaded from printables.com or thingiverse.com. The wearable casing is made up of two components: one holds the electrical components while the other is a cover. I 3D printed these with PLA material.
The other 3D printed components are flexible wrist straps. These are similar to the ones found on watches. I achieved the flexibility by printing them with TPU material. Note that if you do not have a good 3D printer you may need to widen the strap's holes after printing.
I then used super glue to attach the wrist straps to the case. Always be careful when handling glues!
Finally, the last 3D printed component is the wearable's dock/stand. This component is not important to the wearable's functionality. A device's dock/stand is just always cool! It keeps your device in place, adds style to your space, and saves you from the existential dread of your device being tangled in cables.
The wearable's electronic components include:
Syntiant TinyML board
3.7V LiPo battery - the wearable's case can hold a LiPo battery which has a maximum dimension of 38mm x 30mm x 5mm
Vibration motor module
Circuit board for controlling the vibration motor module - the wearable's case can hold a circuit board that has a maximum dimension of 34mm x 28mm x 5mm
The Syntiant TinyML board has a LiPo battery connector and copper pads that we can use to connect our battery to the board. I chose to solder some wires on the copper pads for the battery connection. Note that the "VBAT" copper pad is for the battery's positive terminal.
The next task is to setup the circuit board for controlling the vibration motor. This control circuit receives a signal from the Syntiant TinyML board GPIO and generates a signal that turns the vibration motor on or off. It can be easily soldered on a veroboard/stripboard with the following components:
1x 220 Ω resistor – one end connects to the Syntiant GPIO 1, the other end connects to the base of the transistor
1x 2N3904 transistor - the emitter pin is connected to the negative terminal of the battery
1x 5.7 kΩ resistor – one end connects to the positive terminal of the battery, the other end connects to the collector of the transistor
Below is a circuit layout for the wearable. The motor's control circuit is represented by the transistor and resistors on the breadboard. The slide switch is optional; the case, however, has a slot on the side for placing one.
Once the electronic parts have been assembled, they can be put in the wearable's case according to the layout in the image below.
Below are some images of the assembly that I obtained.
As I was working on the electronic components, I was not sure whether the vibrations from the motor would be noticeable on the wearable. Fortunately, the motor module works very well! The wearable's vibration strength is similar to a smartphone's vibration. This can be seen in the video below, showing how the motor vibrates from test code running on the Syntiant TinyML board.
Below is a video of the wearable on a person's hand. The Syntiant TinyML board detects an ambulance siren sound in the background and signals it to the person.
Below are some additional images of the wearable.
This environmental sensing wearable is one of the many solutions that TinyML offers. The Syntiant TinyML Board is a tiny board with a microphone and accelerometer, USB host microcontroller and an always-on Neural Decision Processor, featuring ultra low-power consumption, a fully connected neural network architecture, and fully supported by Edge Impulse. I have always been fascinated by this tiny board and this made it the perfect choice for this project!
RatPack is another fascinating wearable that has been created using the Syntiant TinyML board. The giant African pouched rat has been given this gear to enable it to communicate with its human handlers when it comes across a landmine or other interesting object. Please check out the documentation to learn more about this fascinating project. All of the wearable's required components are open source, and the documentation provides step-by-step instructions so you can easily create your own.
Training and deploying a Keyword Spotting machine learning model on the Nordic Thingy:53.
Created By: Nekhil R.
Public Project Link: https://studio.edgeimpulse.com/public/170378/latest
In any industry with safety requirements, workers need to be able to report accidents that occur in the workplace. In this project, we are developing a device where employees can simply speak into a microphone to report an accident. The device will be loaded with a machine-learning model that can recognize an accident-reporting keyword. This “keyword spotting” implementation can speed up the reporting process, or make it easier for employees who may not be able to write due to work conditions or literacy. We are only focused on the capability of the device to learn and then recognize our selected keyword here, but future work could include the ability to log events, record audio reports of the incident that has occurred, or other applications.
The development board used in this project is the Nordic Thingy:53. It allows you to create prototypes and proof of concept devices without building customized hardware. Its twin Arm Cortex-M33 processors provide enough processing power and memory to execute embedded machine learning (ML) models directly on the development platform.
The Nordic Thingy:53 can be linked to an Edge Impulse Studio account using the nRF Edge Impulse app. It enables you to download trained ML models to the Thingy:53 for deployment and local inferencing, and to wirelessly communicate sensor data to a mobile device through Bluetooth LE. It also supports a wired serial connection to a computer, for use with Edge Impulse Studio. To learn more about configuring the Thingy:53, you can check out the Docs here.
Data collection is one of the most important steps in a machine learning project. In this project, we used the nRF Edge Impulse app for data collection, making it easy to capture input data. The main keyword used here for reporting an accident is, literally, the word Accident itself. Below, you can see our data collection settings. We recorded 10 seconds of data at a 16 kHz frequency.
This is our keyword collected, along with some noise.
We were able to omit the noise by splitting the sample. So we got our exact keyword, only the piece that we wanted. To make the machine learning model more robust, we collected this same data (the word Accident) from more people. Different ages, different genders, and different talking speeds help make the data better.
In addition to the keyword we'll also need audio that is not our keyword. Background noise, a television ('noise' class), and humans saying other words ('unknown' class) help provide data for the other classes. A machine learning model requires a certain level of "uniformity" in the data it is exposed to, as otherwise it will not be able to learn effectively. The more diverse your data is, the better your machine learning algorithm will perform. So for the Unknown and Noise classes we also made use of this Edge Impulse dataset. To add this data to our project we used the direct Upload method by browsing from the local computer. More information on this dataset can be found here.
We ended up with about 24 minutes of data, which is split between Training and Testing:
19m 35s data for Training.
4m 47s data for Testing.
With the Edge Impulse Data explorer, you can represent your dataset visually, detect outliers, and label unlabeled data. This gives us a quick overview of our complete dataset. To learn more about the Data explorer, you can review the documentation here.
This is our dataset visualization that was generated using the Data explorer.
In this project we only care about the “Accident” keyword; “Unknown” and “Noise” are not useful for us. As you can see, some of our “Accident” keywords are located in the “Noise” cluster, and a few of them also reside in the “Unknown” cluster of data. Upon examining the ones residing in the Noise cluster a bit closer, we observed that some of them are noises that came about as a result of improper keyword splitting. The remaining items are audio where the spoken word is unclear or hard to discern, which is how they ended up in the Noise cluster. Here is one example we found that is indeed an “Accident” keyword but was misclassified into the Noise cluster.
So we deleted the extreme outlying “Accident” keywords that are misclassified and rebalanced the dataset, making it look much cleaner.
Now it is time to create an Impulse, which is the machine learning pipeline built by Edge Impulse.
For our Processing block, we used Audio (MFE), which is great for the keyword spotting and worked very well with our data.
This is the MFCC Feature generation details page, presented after clicking Save Impulse.
On the right side, we can see the Mel Cepstral Coefficients of the data, and on the left side the hyperparameters for generating the Mel Frequency Cepstral Coefficients. We are going to leave the default parameters, because they work very well for keyword spotting.
These are the Features generated for our data, shown in a similar format as the Data explorer, for quick visual reference.
With all our data processed, it's time to start training a neural network.
To build our neural network, we simply start by clicking Classifier on the left. The Neural Network default settings might work ok, but you can fine tune them if you need to. More information on the settings can be found in the documentation located here.
Here are our neural network settings and architecture, which work very well for our data.
By enabling Data augmentation, Edge Impulse will make small changes and variations to the dataset, making the model more resilient and improving overall performance by preventing overfitting. Here we used a 1-D convolutional architecture, which works great for keyword spotting. Let's take a look at the resulting performance to validate our model and ensure it's doing what we expect.
Our dataset is not really that large, and yet we got 94.4% accuracy, so this is pretty good and we can proceed with this model.
The next step in the process is to test the model on some new, unseen data before deploying it in the real world.
When we collected data, we set aside that 4 minutes and 47 seconds of data for Testing, which was not used in the creation and training of the model. Now it can be used for Testing, by simply clicking on Model Testing on the left.
After letting it run, it looks like we got around 87% accuracy. Not as high as our 94% training accuracy, but still very good.
To deploy our model onto the device, back in the mobile App you can simply go to the Deployment tab and click on the Build option. The App will start building our project, and then deploy the firmware directly to our Thingy:53 upon completion.
After successfully deploying the model to the device, we can start our inferencing by switching to the Inferencing tab in the App.
This video shows real-time inferencing in the Thingy:53.
This project showed how to teach a Thingy:53 to recognize a keyword, commonly known as “keyword spotting”. The process began with recording the word we are interested in recognizing, ensuring we have enough data for a quality dataset, building an Edge Impulse machine learning model, then deploying the model to the device. The keyword used in this project was “Accident”, though any word could be used. Further work on this topic could then add new capabilities beyond the keyword spotting to add voice recording, logging to a dashboard, or similar expanded functionality.
A snore-no-more device embedded in your pillow aims to help those who suffer from obstructive sleep apnea.
Created By: Naveen Kumar
Public Project Link: https://studio.edgeimpulse.com/public/226454/latest
It is estimated that more than half of men and over 40% of women in the United States snore, and up to 27% of children. While snoring can be a harmless, occasional occurrence, it can also indicate a serious underlying sleep-related breathing disorder. Snoring is caused by the vibration of tissues near the airway that flutter and produce noise as we breathe. Snoring often indicates obstructive sleep apnea, a breathing disorder that causes repeated lapses in breath due to a blocked or collapsed airway during sleep. Despite being unaware of their snoring, many people suffer from sleep apnea, leading to under-diagnosis. As part of my project, I have developed a non-invasive, low-powered edge device that monitors snoring and interrupts the user moderately through a haptic feedback mechanism.
Our system utilizes the Arduino Nicla Voice, which is designed with the Syntiant NDP120 Neural Decision Processor. This processor allows for embedded machine-learning models to be run directly on the device. Specifically designed for deep learning, including CNNs, RNNs, and fully connected networks, the Syntiant NDP120 is perfect for always-on applications with minimal power consumption. Its slim profile also makes it easily portable, which suits our needs.
There are several onboard sensors available on the Nicla Voice, but for this particular project, we will solely make use of the onboard PDM microphone. We are utilizing an Adafruit DRV2605L haptic motor controller and an ERM vibration motor to gently alert users without being intrusive. The haptic motor driver is connected to the Nicla Voice using an Eslov connector and communicates over I2C protocol. The haptic motor driver gets power from the VIN pin since the Eslov connector does not provide power.
To set this device up in Edge Impulse, we will need to install two command-line interfaces by following the instructions provided at the links below.
Please clone the Edge Impulse firmware for this specific development board.
To obtain audio firmware for Nicla Voice, kindly download it from the provided link:
https://cdn.edgeimpulse.com/firmware/arduino-nicla-voice-firmware.zip
To install the Arduino Core for the Nicla board and the pyserial package required to update the NDP120 chip, execute the commands below.
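The exact commands depend on your toolchain, but they are roughly as follows (the arduino:mbed_nicla core name and the use of pip3 are assumptions):

arduino-cli core install arduino:mbed_nicla

pip3 install pyserial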
We will need to register a free account at Edge Impulse to create a new project.
We have used a dataset available at https://www.kaggle.com/datasets/tareqkhanemu/snoring. Within the dataset, there are two separate directories; one for snoring and the other for non-snoring sounds. The first directory includes a total of 500 one-second snoring samples. Among these samples, 363 consist of snoring sounds created by children, adult men, and adult women without any added background noise. The remaining 137 samples include snoring sounds that have been mixed with non-snoring sounds. The second directory contains 500 one-second non-snoring samples. These samples include background noises that are typically present near someone who is snoring. The non-snoring samples are divided into ten categories, each containing 50 samples. The categories are baby crying, clock ticking, door opening and closing, gadget vibration motor, toilet flushing, emergency vehicle sirens, rain and thunderstorm, street car sounds, people talking, and background television news. The audio files are in the 16-bit WAVE audio format, with a 2-channel (stereo) configuration and a sampling rate of 48,000 Hz. However, we require a single-channel (mono) configuration at a sampling rate of 16,000 Hz, so we need to convert them accordingly using FFmpeg.
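As a sketch of the conversion (the file names here are placeholders), a single file can be converted to mono, 16 kHz audio with FFmpeg like this:

ffmpeg -i input.wav -ac 1 -ar 16000 output.wav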
The datasets are uploaded to the Edge Impulse Studio using the Edge Impulse Uploader. Please follow the instructions here to install the Edge Impulse CLI tools and execute the commands below.
To ensure accurate prediction, the Syntiant NDP chips necessitate a negative class that should not be predicted. For the datasets without snoring, the z_openset class label is utilized to ensure that it appears last in alphabetical order. By using the commands provided, the datasets are divided into Training and Testing samples. Access to the uploaded datasets can be found on the Data Acquisition page of the Edge Impulse Studio.
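For reference, an upload with the Edge Impulse CLI typically looks something like the following; the directory names and labels here are placeholders, and --category split lets the Studio divide the samples between Training and Testing:

edge-impulse-uploader --category split --label snoring snoring/*.wav

edge-impulse-uploader --category split --label z_openset non-snoring/*.wav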
Go to the Impulse Design > Create Impulse page and click on the Add a processing block and choose Audio (Syntiant) which computes log Mel-filterbank energy features from an audio signal specific to the Syntiant audio front-end. Also, on the same page click on the Add a learning block and choose Classification which learns patterns from data, and can apply these to new data. We opted for a default window size of 968 ms and a window increase of 32 ms, which allows for two complete frames within a second of the sample. Now click on the Save Impulse button.
Now go to the Impulse Design > Syntiant page and choose the Feature extractors parameter as log-bin (NDP120/200). The number of generated features has to be 1600, which corresponds to the Syntiant Neural Network input layer. To generate 1600 features, it is necessary to validate the following equation:
window size = 1000 x (39 x Frame stride + Frame length)
In our particular case, the window size is 968 ms, which is equivalent to 1000 x (39 x 0.024 + 0.032).
Clicking on the Save parameters button redirects us to another page where we should click on the Generate features button. It usually takes a couple of minutes to complete feature generation. We can see the visualization of the generated features in the Feature Explorer.
Next, go to the Impulse Design > Classifier page and choose the Neural Network settings as shown below.
We have conducted experiments and established the model architecture using a 2D CNN, which is outlined below.
To begin the training process, click on the Start training button and patiently wait for it to finish. Once completed, we can review the output of the training and the confusion matrix provided below. It's worth noting that the model has achieved an impressive accuracy rate of 99.4%.
To assess the model's performance on the test datasets, navigate to the Model testing page and select the Classify all button. We can be confident in the model's effectiveness on new, unseen data, with an accuracy rate of 94.62%.
When deploying the model on Nicla Voice, we must choose the Syntiant NDP120 library option on the Deployment page.
To build the firmware, it is necessary to configure the posterior parameters of the Syntiant NDP120 Neural Decision Processor by clicking the Find posterior parameters button.
To detect snoring events, we'll choose that class and disregard z_openset. Also, we are not using any calibration dataset. Then, we'll click Find Parameters to adjust the neural network's precision and recall and minimize the False Rejection Rate and False Activation Rate.
Now click on the Build button and in a few seconds the library bundle will be downloaded to the local computer. We can see the following file listings after unzipping the compressed bundle.
Now copy the model-parameters content into the src/model-parameters/ directory of the firmware source code that we cloned in the Setup Development Environment section.
We need to customize the firmware code in the firmware-arduino-nicla-voice/src/ei_syntiant_ndp120.cpp file. Add the following declarations near the top section of the file.
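As an example of what these declarations could look like (assuming the Adafruit DRV2605 library is used for the haptic controller, and with a buffer sized for the 10-result window described later):

```cpp
#include <Wire.h>
#include "Adafruit_DRV2605.h"        // haptic motor driver library (assumed)

Adafruit_DRV2605 drv;                // DRV2605L driver instance
static uint8_t recent_results[10];   // circular buffer of the last 10 classifications
static uint8_t result_idx = 0;       // write position in the circular buffer
```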
In the ei_setup() function, add the haptic feedback motor driver initialization code.
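A sketch of that initialization, again assuming the Adafruit DRV2605 library:

```cpp
// Inside ei_setup(): bring up I2C and configure the DRV2605L for internal triggering
Wire.begin();
drv.begin();                         // initialize the haptic driver
drv.selectLibrary(1);                // select an ERM waveform library
drv.setMode(DRV2605_MODE_INTTRIG);   // effects are fired by calling go()
```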
You can add custom logic to the main Arduino sketch by customizing the match_event() function as follows.
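The snippet below is a sketch of such logic, not the verbatim firmware code; the match_event() signature and the "snoring" label are assumptions based on the description in this write-up.

```cpp
void match_event(char *label) {
    // Record whether this window was classified as snoring
    recent_results[result_idx] = (strcmp(label, "snoring") == 0) ? 1 : 0;
    result_idx = (result_idx + 1) % 10;

    // Count snoring windows among the last 10 results
    uint8_t snore_windows = 0;
    for (uint8_t i = 0; i < 10; i++) {
        snore_windows += recent_results[i];
    }

    // Three or more snoring windows -> fire a haptic effect
    if (snore_windows >= 3) {
        drv.setWaveform(0, 47);      // effect 47: a strong buzz
        drv.setWaveform(1, 0);       // end of the waveform sequence
        drv.go();
    }
}
```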
To compile the firmware execute the command below.
Once we've compiled the Arduino firmware we need to follow the steps below to deploy the firmware to the device. It is assumed that all firmware and deployment libraries are in the same directory level.
Take the .elf file output by Arduino and rename it to firmware.ino.elf.
Replace the firmware.ino.elf in the default firmware with this renamed file.
Replace the ei_model.synpkg file in the default firmware by the one from the downloaded Syntiant library.
Now connect the Nicla Voice to the computer using a USB cable and execute the script (OS-specific) to flash the board.
We can monitor the inferencing results over the serial connection at a 115200 baud rate after resetting the board or power cycling.
When classifying audio, it is recommended to analyze multiple windows of data and calculate the average of the results. This approach can provide better accuracy compared to relying on individual results alone. To detect snoring events, the application maintains a circular buffer of the last 10 inferencing results. If three snoring events are detected, the haptic feedback motor will vibrate for a few seconds. During the demo, the motor vibration appears to be moderate. However, the user can still feel the vibration while using the pillow.
This project provides a practical solution to a seemingly humorous yet serious issue. The device is user-friendly and respects privacy by conducting inferencing at the edge. Although the model performs well, there is potential for enhancement through the inclusion of more refined training data, resulting in improved accuracy and robustness to further reduce the likelihood of false positives and negatives. Additionally, this project showcases the ability of a simple neural network to address complex problems, executed through effective digital signal processing on low-powered, resource-restricted devices such as Syntiant NDP120 Neural Decision Processor on the Nicla Voice.
An acoustic sensing project that uses audio classification on a Syntiant TinyML board to listen for keywords, and take action via GPIO.
Created By: Solomon Githu
Public Project Link: https://studio.edgeimpulse.com/public/111611/latest
The International Labor Organization estimates that there are over 1 million work-related fatalities each year and millions of workers suffer from workplace accidents. However, even as technology advancements have improved worker safety in many industries, some accidents involving workers and machines remain undetected as they occur, possibly even leading to fatalities. This is because of the limitations in Machine Safety Systems. Safety sensors, controllers, switches and other machine accessories have been able to provide safety measures during accidents but some events remain undetected by these systems.
Some accidents which are difficult to detect in industrial settings include:
Falling objects
Objects striking or falling on employees
Slips or falls of employees
Chemical burns or exposure
Workers caught in moving machine parts
Sound classification is one of the most widely used applications of machine learning. When in danger or scared, we humans respond with audible actions such as screaming, crying, or words such as "stop" or "help". This alerts other people that we are in trouble and can also give them instructions, such as stopping a machine or opening/closing a system. We can use sound classification to give hearing to machines and manufacturing setups so that they can be aware of the status of their environment.
TinyML has enabled us to bring machine learning models to low-cost and low-power microcontrollers. We will use Edge Impulse to develop a machine learning model which is capable of detecting accidents from workers' screams and cries. This event can then be used to trigger safety measures such as a machine/actuator stop, or to sound alarms.
The Syntiant TinyML Board is a tiny development board with a microphone and accelerometer, USB host microcontroller and an always-on Neural Decision Processor™, featuring ultra low-power consumption, a fully connected neural network architecture, and fully supported by Edge Impulse. Here are quick start tutorials for Windows and Mac.
You can find the public project here: Acoustic Sensing of Worker Accidents. To add this project into your account projects, click "Clone this project" at the top of the window. Next, go to the "Deploying to Syntiant TinyML Board" section below to see how you can deploy the model to the Syntiant TinyML board.
Alternatively, to create a similar project, follow the next steps after creating a new Edge Impulse project.
We want to create a model that can recognize both keywords and human sounds like cries and screams. For these, we have 4 classes in our model: stop, help, cry, and scream. In addition to these classes, we also need another class that is not part of our 4 keywords. We label this class as "unknown" and it contains sounds of people speaking, machines, and vehicles, among others. Each class contains audio samples of 1 second in length.
In total, we have 31 minutes of data for training and 8 minutes of data for testing. For the "unknown" class, we can use the Edge Impulse keyword spotting dataset, which can be obtained here. From this dataset we use the "noise" audio files.
The Impulse design is very unique as we are targeting the Syntiant TinyML board. Under 'Create Impulse' we set the following configurations:
Our window size is 968 ms, and the window increase is 484 ms. Click 'Add a processing block' and select Audio (Syntiant). Next, we add a learning block by clicking 'Add a learning block' and selecting Classification (Keras). Click 'Save Impulse' to use this configuration.
Next we go to our processing block configuration, Syntiant, and first click 'Save parameters'. The preset parameters will work well so we can use them in our case.
On the window 'Generate features', we click the "Generate features" button. Upon completion we see a 3D representation of our dataset. These are the Syntiant blocks that will be passed into the neural network.
Lastly, we need to configure our neural network. Start by clicking "NN Classifier". Here we set the number of training cycles to 80, with a learning rate of 0.0005. Edge Impulse automatically designs a default neural network architecture that works very well without requiring the parameters to be changed. However, if you wish to update some parameters, Data augmentation can improve your model accuracy. Try adding noise, masking time and frequency bands, and assess your model performance with each setting.
With the training cycles and learning rate set, click "Start training", and you will have a neural network when the task is complete. We get an accuracy of 94%, which is pretty good!
When training our model, we used 80% of the data in our dataset. The remaining 20% is used to test the accuracy of the model in classifying unseen data. We need to verify that our model has not overfit by testing it on new data. If your model performs poorly, then it means that it has overfit (memorized your dataset). This can be resolved by adding more data and/or reconfiguring the processing and learning blocks if needed. Tips for increasing performance can be found in this guide.
On the left bar, click "Model testing" then "classify all". Our current model has a performance of 91% which is pretty good and acceptable.
From the results we can see new data called "testing" which was obtained from the environment and sent to Edge Impulse. The Expected Outcome column shows which class the collected data belongs to. In all cases, our model classifies the sounds correctly, as seen in the Result column; it matches the Expected Outcome column.
To deploy our model to the Syntiant Board, first click "Deployment". Here, we will first deploy our model as firmware on the board. When our audible events (cry, scream, help, stop) are detected, the onboard RGB LED will turn on. When unknown sounds are detected, the onboard RGB LED will be off. This runs locally on the board without requiring an internet connection, and with minimal power consumption.
Under "Build Firmware" select Syntiant TinyML.
Next, we need to configure posterior parameters. These are used to tune the precision and recall of our Neural Network activations, to minimize False Rejection Rate and False Activation Rate. More information on posterior parameters can be found here: Responding to your voice - Syntiant - RC Commands, in "Deploying to your device" section.
Under "Configure posterior parameters" click "Find posterior parameters". Check all classes apart from "unknown", and for calibration dataset we use "No calibration (fastest)". After setting the configurations, click "Find parameters".
This will start a new task, so we have to wait until it is finished.
When the job is completed, close the popup window and then click the "Build" option to build our firmware. The firmware will be downloaded automatically when the build job completes. Once the firmware is downloaded, we first need to unzip it. Connect a Syntiant TinyML board to your computer using a USB cable. Next, open the unzipped folder and run the flashing script based on your Operating System.
We can connect to the board's firmware over Serial. To do this, open a terminal and select the COM Port of the Syntiant TinyML board with 115200 8-N-1 settings (in Arduino IDE, that is 115200 baud, Carriage return).
Sounds such as "stop", "help", "aaagh!" or crying will turn the RGB LED to red.
For the "unknown" sounds, the RGB LED is off. While configuring the posterior parameters, the detected classes that we selected are the ones which trigger the RGB LED lighting.
We can use our Machine Learning model as a safety feature for actuators, machines or other operations involving people and machines.
To do this we can build custom firmware for our Syntiant TinyML board that turns a GPIO pin HIGH or LOW based on the detected event. The GPIO pin can then be connected to a controller that runs an actuator or a system. The controller can then turn off the actuator or process when a signal is sent by the Syntiant TinyML board.
A custom firmware was then created to set GPIO 1 of the Syntiant TinyML Board HIGH (3.3V) whenever the alarming sounds are detected. GPIO 1 is next to the GND pin, so we can easily use a 2-pin header to connect our TinyML board to another device.
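As an illustration of that idea, here is a minimal sketch (not the exact linked firmware): it assumes the inferencing firmware calls an on_classification() handler with the winning label, and the Arduino pin constant standing in for GPIO 1 should be verified against the board's pin mapping.

```cpp
// Minimal sketch of the GPIO-alert idea (not the exact linked firmware).
// Assumes the inferencing firmware calls on_classification() with the
// winning label after each NDP prediction; ALERT_PIN stands in for GPIO 1.

const int ALERT_PIN = 1;   // "GPIO 1", next to GND on the Syntiant TinyML board
const char *ALARM_LABELS[] = { "stop", "help", "cry", "scream" };

void setup() {
  pinMode(ALERT_PIN, OUTPUT);
  digitalWrite(ALERT_PIN, LOW);    // idle: no alarm
}

// Called with the highest-scoring class label for each inference.
void on_classification(const char *label) {
  bool alarming = false;
  for (const char *candidate : ALARM_LABELS) {
    if (strcmp(label, candidate) == 0) {
      alarming = true;
      break;
    }
  }
  // HIGH (3.3 V) tells the external controller to stop its actuator.
  digitalWrite(ALERT_PIN, alarming ? HIGH : LOW);
}

void loop() {
  // audio capture and inference run in the NDP firmware
}
```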
Awesome! What's next now? Check out the custom firmware here and add intelligent sensing to your actuators and home automation devices!
I leveraged my TinyML solution and used it to add more sensing to my LoRaWAN actuator. I connected the Syntiant TinyML board to an Atmega and SX1276 based development board called the WaziAct. This board is designed to play as a production LoRa actuator node with an onboard relay which I often use to actuate pumps, solenoids, and electrical devices. I programmed the board to read the status of the pin connected to the Syntiant TinyML board, and when a signal is received it stops executing the main tasks. An alert is also sent to the gateway via LoRa while the main tasks remain halted. The Arduino code can be accessed here.
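A simplified sketch of that actuator-side logic is shown below; the pin numbers are placeholders and the LoRa alert itself is left as a comment, since the full implementation is in the Arduino code linked above.

```cpp
// Simplified actuator-side logic on the LoRa node (placeholder pins;
// the LoRa alert itself is handled in the full code linked above).

const int SENSE_PIN = 7;   // wired to GPIO 1 of the Syntiant TinyML board
const int RELAY_PIN = 8;   // relay driving the pump/solenoid

void setup() {
  pinMode(SENSE_PIN, INPUT);
  pinMode(RELAY_PIN, OUTPUT);
}

void loop() {
  if (digitalRead(SENSE_PIN) == HIGH) {
    digitalWrite(RELAY_PIN, LOW);          // halt the actuator immediately
    // ... send a LoRa alert to the gateway here ...
    while (digitalRead(SENSE_PIN) == HIGH) {
      delay(100);                          // stay halted while the alarm persists
    }
  } else {
    // normal actuation tasks run here
  }
}
```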
Below is a sneak peek of an indoor test… Now my "press a button" LoRaWAN actuations can run without causing harm such as turning on a faulty device, pouring water via a solenoid/pump in unsafe conditions, and other accidental events!
We have seen how we can use sounds to train and deploy our ML solution easily, and also run it locally on a development board. TinyML-based intelligent sensing, as shown here, is just one of the many solutions that TinyML offers.
With Edge Impulse, developing ML models and deploying them has always been easy. The Syntiant TinyML board was chosen for this project because it provides ultra-low power consumption, fully connected neural network architecture, has an onboard microphone, is physically small, and is also fully supported by Edge Impulse.
Audio classification with an Arduino Nano 33 BLE Sense to detect the sound of glass breaking.
Created By: Nekhil R.
Public Project Link: https://studio.edgeimpulse.com/public/149095/latest
GitHub Repository: https://github.com/CodersCafeTech/Vandalism-Detection
The direct cost of vandalism runs into billions of dollars annually in the United States alone. Breaking glass and defacing property are some of the more serious forms of vandalism. Conventional security techniques such as direct lighting and intruder alarms can be ineffective in some locations and cases, so here we explore another form of prevention. In this project, we are able to detect the sound of glass breaking, and can alert a user instantly about the event.
In this project, we only focus on glass breaking, however, this project can be applied to any other form of vandalism that also produces a unique sound.
The device will work as follows. Suppose a vandal tries to break glass, which of course produces a distinctive sound. The TinyML model running on the device can recognize the event using a microphone. Then the device will send an email notification to a registered user regarding the detection.
For this project we are using an Arduino Nano 33 BLE Sense. It's a 3.3V AI-enabled board in a very small form factor. It comes with a series of embedded sensors including an MP34DT05 Digital Microphone.
The ESP-01 is used for adding WiFi capability to the device, because the Arduino Nano 33 BLE Sense does not have any native WiFi capability. The WiFi is specifically used for sending email alerts. Serial communication is established between the Arduino and ESP-01, for transmitting the email.
One of the most important parts of any machine learning model is its dataset. Edge Impulse offers us two options to create our dataset: either uploading files directly, or recording data from the actual device itself. For this project we chose to record data with the device itself: as a prototype, the amount of data will be limited, and collecting it on the same hardware used for deployment can also improve accuracy. To get started connecting the Nano 33 BLE Sense to Edge Impulse, you can have a look at this tutorial.
In this scenario, we have only two classes: Glass Break and Noise. The glass breaking sounds we used come from various online resources, and the major noise datasets are from the Microsoft Scalable Noisy Speech Dataset (MS-SNSD). Apart from the MS-SNSD data, we also included natural noise recorded in the room.
The sound recording was done for 20 seconds at a 16 kHz sampling rate. Something to keep in mind is that you must keep the sampling rate the same between your training dataset and your deployment device. If you are training with 44.1 kHz sound, you need to downsample it to 16 kHz when you are ready to deploy to the Arduino.
We collected around 10 minutes of data and split it between a Training and a Test set. In the Training data we split the samples into 2-second segments, otherwise the inferencing will fail because the BLE Sense has a limited amount of memory to handle the data.
This is our Impulse, which is the machine learning pipeline termed by Edge Impulse.
Here we used MFE as the processing block, because it is well suited for non-voice audio. We have used the default parameters of the MFE block.
These are our Neural Network settings, which we found most suitable for our data. If you are tinkering with your own dataset, you might need to change these parameters a bit, and some exploration and testing could be required.
We enabled the Data augmentation feature, which randomly transforms data during training. This way we are able to run more training cycles without overfitting the data, and it also helps improve accuracy.
This is our Neural Network architecture.
We have used the default 1D convolutional layer, then we trained the model. We ended up with 97% accuracy, which is awesome. By looking at the confusion matrix it is clear that there is no sign of underfitting or overfitting.
Before deploying the model, it's a good practice to run the inference on the Test dataset that was set aside earlier. In the Model Testing, we got around 92% accuracy.
Let's look into some of the misclassifications, to better understand what is happening. In this case, the noise very well resembled the Glass Break sound, which is why it is misclassified:
In this next case, the sample contains both the Glass_Break sound and some noise, and the model actually handled it reasonably well. However, the majority of the sample was noise, which is why it was counted as a misclassification.
In these two cases shown below, again Noise was the major reason for misclassification:
Overall though, the model is performing very well and can be deployed in the real world.
For deploying the Impulse to the BLE Sense, we exported the model as an Arduino library from the Studio.
Then we added that library to the Arduino IDE. Next, we modified the example sketch that is provided, to build our application. You can find the code and all assets, including the circuit, in this GitHub Repository.
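To give a sense of what such a modification can look like, here is a minimal sketch of the post-inference handling, assuming the structure of the standard Edge Impulse microphone example: the header name, the "Glass Break" label, the 0.8 confidence threshold, and the single 'G' command byte sent to the ESP-01 over Serial1 are illustrative choices, not taken verbatim from the repository.

```cpp
// Post-inference handling only (a sketch): assumes the Edge Impulse
// microphone example has already filled `result`, and that Serial1.begin()
// was called in setup() for the link to the ESP-01.

#include <Vandalism_Detection_inferencing.h>   // illustrative name for the exported library

void handle_result(const ei_impulse_result_t &result) {
  for (size_t ix = 0; ix < EI_CLASSIFIER_LABEL_COUNT; ix++) {
    if (strcmp(result.classification[ix].label, "Glass Break") == 0 &&
        result.classification[ix].value > 0.8f) {
      Serial1.write('G');   // ask the ESP-01 to fire the email webhook
    }
  }
}
```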
For triggering the email we used the IFTTT service. To set up the email trigger upon any positive detections, please have a look at this tutorial.
Here is the application I have created:
All the components were fit inside this case, to make a tidy package:
Here are the results of some live testing, after the model is deployed to the device. You can check it out in the below video. The sound of the glass breaking is played on a speaker, and you can see the results of the inferencing and email being sent.
A proof-of-concept that uses an Arduino to listen for anomalies in the sound of a running motor.
Created By: Shebin Jose Jacob
Public Project Link: https://studio.edgeimpulse.com/public/162492/latest
Every manufacturing environment is equipped with machines. For a better-performing manufacturing unit, the health of machines plays a major role, and hence maintenance of the machines is important. There are three maintenance strategies, namely Preventive maintenance, Corrective maintenance, and Predictive maintenance.
If you want to find the best balance between preventing failures and avoiding over-maintenance, Predictive Maintenance (PdM) is the way to go. Equip your factory with relatively affordable sensors to track temperature, vibrations, and motion data, use predictive techniques to schedule maintenance when a failure is about to occur, and you'll see a nice reduction in operating costs.
In the newest era of technology, teaching computers to make sense of the acoustic world is now a hot research topic. So in this project, we use sound to do some predictive maintenance using an Arduino Nano 33 BLE Sense.
We use the Nano 33 BLE Sense to listen to the machine continuously. The MCU runs an ML model which is trained on two sets of acoustic anomalies and a normal operation mode. When the ML model identifies an anomaly, the operator is immediately notified and the machine may be shut down for maintenance after proper inspection. Thus, we can reduce both the possible damage caused and the downtime.
Nano 33 BLE Sense
LED
Edge Impulse
Arduino IDE
The hardware setup consists of a Nano 33 BLE Sense, which is placed beside an old AC motor.
If you haven't connected the device to the Edge Impulse dashboard, follow this tutorial to get it connected. After a successful connection, it should be present in the Devices tab.
Alternatively, recent versions of Google Chrome and Microsoft Edge can collect data directly from your development board, without the need for the Edge Impulse CLI. Follow this tutorial to learn more about it.
Clean data is the most important requirement to train a well-performing model. In our case, we have collected 4 classes of sound - two classes of anomalies, one normal operation class, and a noise class. Each sample is 2 seconds long. The raw data of these classes is visualised below.
If the data is not already split into training and testing datasets, split it in an 80:20 ratio, which forms a good dataset for model training.
An Impulse is the machine learning pipeline that takes raw data, uses signal processing to extract features, and then uses a learning block to classify new data.
Here we are using the Time Series data as the input block. Now, we have two choices for the processing block - MFCC and MFE. As we are dealing with non-vocal audio and MFE performs well with non-vocal audio, we have chosen MFE as our processing block. We have used Classification as our learning block since we have to learn patterns and apply them to new data to categorize the audio into one of the given 4 classes.
In the MFE tab, you can tweak the parameters if you're comfortable with audio processing; otherwise, leave the settings as they are and generate features.
Now that we have our Impulse designed, let's proceed to train the model. The settings we employed for model training are depicted in the picture. You can play about with the model training settings so that the trained model exhibits a higher level of accuracy, but be cautious of overfitting.
A whopping 94.7% accuracy is achieved by the trained model.
Let's now use some unknown data to test the model's functionality. To assess the model's performance, move on to Model Testing and Classify All.
We have got 95.07% testing accuracy, which is pretty awesome. Now let's test the model with some real-world data. Navigate to Live Classification and collect some data from the connected device.
We have collected some real-world data of Normal Operation Mode, Anomaly 1, and Anomaly 2 respectively, and all of them are correctly classified. So our model is ready for deployment.
For deployment, navigate to the Deployment tab, select Arduino Library and build the library. It will output a zip library, which can be added to Arduino IDE.
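As a rough sketch of how the exported library's output can drive the alert LED: the "Anomaly" label prefix, the 0.7 threshold, and LED_PIN below are assumptions based on this write-up, not the exact project code, and the exported library header is assumed to be included.

```cpp
// Alert logic run after each classification (a sketch, not the full example):
// picks the highest-scoring class from the Edge Impulse result struct and
// lights the LED when either anomaly class wins with enough confidence.
// pinMode(LED_PIN, OUTPUT) is assumed to be done in setup().

const int LED_PIN = 2;   // placeholder pin for the alert LED

void alert_from_result(const ei_impulse_result_t &result) {
  size_t best = 0;
  for (size_t ix = 1; ix < EI_CLASSIFIER_LABEL_COUNT; ix++) {
    if (result.classification[ix].value > result.classification[best].value) {
      best = ix;
    }
  }
  bool anomaly = (strncmp(result.classification[best].label, "Anomaly", 7) == 0) &&
                 (result.classification[best].value > 0.7f);
  digitalWrite(LED_PIN, anomaly ? HIGH : LOW);
}
```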
The Nano 33 BLE Sense, along with an LED, is enclosed in a 3D-printed case, which is our final product. The device is capable of identifying acoustic anomalies in a machine and alerts the user via the LED.
A proof-of-concept machine learning project for first responders, to detect the sound of a gunshot.
Created By: Swapnil Verma
Public Project Link: https://studio.edgeimpulse.com/public/133765/latest
On May 24, 2022, nineteen students and two teachers were fatally shot, and seventeen others were wounded at Robb Elementary School in Uvalde, Texas, United States[1]. An 18-year-old gunman entered the elementary school and started shooting kids and teachers with a semi-automatic rifle. The sad part is that it is not a one-off event. Gun violence including mass shootings is a real problem, especially in the USA.
Gun violence is a massive problem and I alone can not solve it, but I can definitely contribute an engineering solution toward hopefully minimizing casualties.
Here I am proposing a proof of concept to identify gun sounds using a low-cost system and inform emergency services as soon as possible. Using this system, emergency services can respond to a gun incident as quickly as possible thus hopefully minimizing casualties.
My low-cost proof of concept uses multiple microcontroller boards with microphones to capture sound. They use a TinyML algorithm prepared using Edge Impulse to detect any gunshot sound. Upon a positive detection, the system sends a notification to registered services via an MQTT broker.
To learn more about the working of the system, please check out the Software section.
The hardware I am using is:
Arduino Portenta H7
Arduino Nano BLE Sense
9V Battery
In the current hardware iteration, the Arduino Nano BLE Sense is powered by a 9V battery and the Portenta H7 is powered via a laptop, because I am also using the serial port on the Portenta H7 to debug the system.
The software is divided into 2 main modules:
Machine Learning
Communication
The machine learning module uses a tinyML algorithm prepared using Edge Impulse. This module is responsible for identifying gunshot sounds. It takes sound as input and outputs its classification.
One of the most important parts of any machine learning model is its dataset. In this project, I have used a combination of two different datasets. For the gunshot class, I used the Gunshot audio dataset and for the other class, I used the UrbanSound8K dataset from Kaggle.
Edge Impulse Studio's Data acquisition tab provides useful features to either record your own data or upload already-collected datasets. I used the upload feature.
While uploading you can also provide a label to the data. This was very helpful because I am using sounds from multiple origins in the other class. After uploading the data, I cleaned the data using the Studio's Crop and Split sample feature.
You can either divide data into test and train sets while uploading or do it at a later time. The Data acquisition tab shows the different classes and train/test split ratio for our convenience.
After preparing the dataset, we need to design an Impulse. The Edge Impulse documentation explains the Impulse design in great detail so please check out their documentation page to learn about Impulse design.
As you can see, I have selected MFCC as the preprocessing block, and Classification as the learning block. I have used the default parameters for the MFCC preprocessing. For training, I have slightly modified the default neural network architecture. I have used three 1D convolutional layers with 8, 16 and 24 neurons, respectively. The architecture is illustrated in the image below.
Modifying the neural network architecture in the Edge Impulse Studio is very easy. Just click on Add an extra layer at the bottom of the architecture and select any layer from the popup.
Or you can also do it from the Expert mode if you are feeling masochistic 😉.
I trained the model for 5000 iterations with a 0.0005 learning rate. My final model has 94.5% accuracy.
The Edge Impulse Studio's Model testing tab enables a developer to test the model immediately. It uses the data from the test block and performs the inference using the last trained model. My model had 91.3% accuracy on the test data.
One nice feature Edge Impulse provides is versioning. You can version your project (like Git) to store all data, configuration, intermediate results and final models. You can revert back to earlier versions of your impulse by importing a revision into a new project. I use this feature every time before changing the neural network architecture. That way I don't have to retrain or keep a manual record of the previous architecture.
After completing the training, it's time for deployment. The Deployment tab of the Edge Impulse Studio provides three main ways of deploying the model onto hardware: (a) by creating a library, (b) by building firmware, and (c) by running it on a computer or a mobile phone directly. I knew that I would need more functionality from my Arduino hardware apart from inferencing, so I created a library instead of building firmware.
Just select the type of library you want, and click the Build button at the bottom of the page. This will build a library and download it onto your computer. After downloading, it will also show a handy guide to include this library in your IDE.
The coolest part is that I don't need to retrain the model or do anything extra to deploy the same model onto multiple devices. The examples of the downloaded library will have example code for all the supported devices of the same family.
Just select an example as getting-started code, modify it according to your needs, and flash it. The Arduino Nano BLE Sense and Portenta H7 use the same model generated by Edge Impulse. I trained the model only once, agnostic of the hardware, and deployed it on multiple devices, which is a time saver.
Inferencing is the process of running a neural network model to generate output. The image below shows the inference pipeline.
The microphones in the Nano BLE Sense and Portenta H7 pick up the surrounding sound (stages 1 & 2). The sound data is then preprocessed using the MFCC block (stage 3). The preprocessed data is then sent to a Convolutional Neural Network block (stage 4) which classifies it into either the gunshot class or the other class (stage 5).
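Stages 3 to 5 map onto the exported Arduino library roughly as in the sketch below; the header name, the "gunshot" label string, the 0.8 threshold, and the audio buffer handling are simplified illustrations rather than the project's exact code.

```cpp
// Condensed sketch of stages 3-5 using the exported Arduino library
// (header name and buffer handling are simplified for illustration; the
// real firmware fills audio_buffer from the onboard microphone continuously).

#include <Gunshot_Detection_inferencing.h>   // illustrative name for the exported library

static float audio_buffer[EI_CLASSIFIER_RAW_SAMPLE_COUNT];

// Feeds slices of the captured audio window to the classifier.
static int get_audio_data(size_t offset, size_t length, float *out_ptr) {
  memcpy(out_ptr, audio_buffer + offset, length * sizeof(float));
  return 0;
}

bool is_gunshot() {
  signal_t signal;
  signal.total_length = EI_CLASSIFIER_RAW_SAMPLE_COUNT;
  signal.get_data = &get_audio_data;

  ei_impulse_result_t result;
  if (run_classifier(&signal, &result, false) != EI_IMPULSE_OK) {
    return false;   // preprocessing or inference failed
  }
  for (size_t ix = 0; ix < EI_CLASSIFIER_LABEL_COUNT; ix++) {
    if (strcmp(result.classification[ix].label, "gunshot") == 0 &&
        result.classification[ix].value > 0.8f) {
      return true;
    }
  }
  return false;
}
```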
To learn more about the project please follow the below link.
Project Link - https://studio.edgeimpulse.com/public/133765/latest
The output of the machine learning module is then processed before sending it to the cloud via the Communication module.
This module is responsible for sharing information between boards and sending positive predictions to the registered emergency services.
The Nano BLE Sense sends its inference to the Portenta H7 via BLE. The Portenta H7 then sends the positive output (i.e. detection of gunshot sound) of its inference and Nano BLE's inference to a subscriber via MQTT. I have used the cloudMQTT broker for this project.
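As an illustration of the Portenta-side publish, here is a minimal sketch using the common PubSubClient Arduino library; the broker host, credentials, and topic are placeholders, and the author's actual firmware (linked below) may use a different client.

```cpp
// Sketch of the notification path on the Portenta H7 using PubSubClient
// (placeholder broker host, credentials and topic; the project's actual
// firmware is in the linked repository and may differ).

#include <WiFi.h>
#include <PubSubClient.h>

WiFiClient wifi;
PubSubClient mqtt(wifi);

void setup() {
  WiFi.begin("your-ssid", "your-password");
  while (WiFi.status() != WL_CONNECTED) { delay(500); }
  mqtt.setServer("your-instance.cloudmqtt.com", 1883);
  mqtt.connect("portenta-gunshot", "mqtt-user", "mqtt-pass");
}

// Called when either the local model or the Nano BLE Sense (via BLE)
// reports a gunshot detection.
void notify_gunshot() {
  mqtt.publish("alerts/gunshot", "gunshot detected");
}

void loop() {
  mqtt.loop();   // keep the MQTT connection alive
  // ... BLE reception from the Nano BLE Sense and local inferencing go here ...
}
```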
To download the software please use the below link:
Software Link - https://github.com/sw4p/Gunshot_Detection
My testing setup and the results are illustrated in the video below. The system is connected to my laptop, which is also performing the screen recording. On the upper left side we have an Arduino serial window showing the output from the Portenta H7, and on the lower left we have an audio player. On the right-hand side we have cloudMQTT's WebSocket UI, which shows the incoming notifications via MQTT. The sound for this video is played and recorded using my laptop's speaker and microphone.
In the video above I am playing different categories of sound, and one of those categories is a gunshot. The system outputs the classification result to the Arduino serial port whenever it detects a sound from the other class, but does not send it to the receiver. The moment it detects a gunshot sound, it immediately sends a notification to the receiver via CloudMQTT.
[1] https://en.wikipedia.org/wiki/Robb_Elementary_School_shooting
[2] https://www.gunviolencearchive.org/
A prototype smart device that uses a Syntiant TinyML board to alert the wearer with haptic feedback if emergency vehicles or car horns are detected.
Created By: Solomon Githu
Public Project Link: https://studio.edgeimpulse.com/public/171255/latest
An electronic device that is intended to function on a user's body is considered wearable technology. The largest categories of wearables are smartwatches and hearables, which have experienced the fastest growth in recent years. Steve Roddy, former Vice President of Product Marketing for Arm's Machine Learning Group, once said that "TinyML deployments are powering a huge growth in ML deployment, greatly accelerating the use of ML in all manner of devices and making those devices better, smarter, and more responsive to human interaction". TinyML enables running Machine Learning on resource-constrained devices like wearables.
Sound classification is one of the most widely used applications of Machine Learning. A new use case for wearables is an environmental audio monitor for individuals with hearing disabilities. This is a wearable device with a computer that can listen to environmental sounds and classify them. In this project, I focused on giving tactile feedback when vehicle sounds are detected. The Machine Learning model can detect ambulance and firetruck sirens as well as cars honking. When these vehicles are detected, the device gives a vibration pulse which can be felt by the person wearing it. This use case can be revolutionary for people with hearing impairments, including those who are deaf. To keep people safe from injury, the device can inform them when there is a car, ambulance, or firetruck nearby so that they can identify it and move out of the way.
I used Edge Impulse Platform to train my model and deploy it to the Syntiant TinyML board. This is a tiny development board with a microphone and accelerometer, USB host microcontroller and an always-on Neural Decision Processor™, featuring ultra low-power consumption, a fully connected neural network architecture, and supported by Edge Impulse.
You can find the public Edge Impulse project here: Environmental Audio Monitor. To add this project into your Account projects, click "Clone" at the top of the window. Next, go to "Deploying to Syntiant TinyML board" section to see how you can deploy the model to the Syntiant TinyML board.
I first searched for open datasets of ambulance siren, firetruck siren, car horns and traffic sounds. I used the Kaggle dataset of Emergency Vehicle Siren Sounds and the Isolated urban sound database for the key sounds. From these datasets, I created the classes "ambulance_firetruck" and "car_horn".
In addition to the key events that I wanted to be detected, I also needed another class that is not part of them. I labelled this class as "unknown" and it has sounds of traffic, people speaking, machines, and vehicles, among others. Each class has 1 second of audio sounds.
In total, I had 20 minutes of data for training and 5 minutes of data for testing. For part of the "unknown" class, I used the Edge Impulse keywords dataset. From this dataset, I used the "noise" audio files.
The Impulse design was very unique as I was targeting the Syntiant TinyML board. Under "Create Impulse" I set the following configurations:
The window size is 968 ms and the window increase is 484 ms. I then added the "Audio (Syntiant)" processing block and the "Classification" learning block. For a detailed explanation of the Impulse design for Syntiant TinyML audio classification, check out the Edge Impulse documentation.
The next step was to extract Features from the training data. This is done by the Syntiant processing block. On the Parameters page, I used the default Log Mel filterbank energy features and they worked very well. The Feature explorer is one of the most fun options in Edge Impulse. In the feature explorer, all data in your dataset are visualized in one graph. The axes are the output of the signal processing process and they can let you quickly validate whether your data separates nicely. I was satisfied with how my features separated for each class. This enabled me to proceed to the next step, training the model.
Under "Classifier" I set the number of training cycle as 100 with a learning rate of 0.0005. Edge Impulse automatically designs a default Neural Network architecture that works very well without requiring the parameters to be changed. However, if you wish to update some parameters, Data Augmentation can improve your model accuracy. Try adding noise, masking time and frequency bands and inspect your model performance with each setting.
I then clicked “Start training” and waited for a few minutes for the training to be complete. Upon completion of the training process, I got an accuracy of 97.6%, which is pretty good!
When training the model, I used 80% of the data in the dataset. The remaining 20% is used to test the accuracy of the model in classifying unseen data. We need to verify that our model has not overfit by testing it on new data. If your model performs poorly, then it means that it has overfit (memorized your dataset). This can be resolved by adding more data and/or reconfiguring the processing and learning blocks if needed. Tips for increasing performance can be found in this guide.
On the left bar, we click "Model testing" then "Classify all". The current model has a performance of 97.8% which is pretty good and acceptable.
From the test data, we can see the first sample has a length of 3 seconds. I recorded this in a living room which had a computer playing siren sounds and at the same time a television was playing a movie. In each timestamp of 1 second, we can see that the model was able to predict the ambulance_firetruck class. I took this as an acceptable performance of the model and proceeded to deploy it to the Syntiant TinyML board.
To deploy our model to the Syntiant Board, first click "Deployment" on the left side panel. Here, we will first deploy our model as firmware for the board. When our audible events (ambulance_firetruck and car_horn) are detected, the onboard RGB LED will turn on. When "unknown" sounds are detected, the onboard RGB LED will be off. This firmware runs locally on the board without requiring internet connectivity and with minimal power consumption.
Under "Build Firmware" select Syntiant TinyML.
Next, we need to configure posterior parameters. These are used to tune the precision and recall of our Neural Network activations, to minimize False Rejection Rate and False Activation Rate. More information on posterior parameters can be found here: Responding to your voice - Syntiant - RC Commands.
Under "Configure posterior parameters" click "Find posterior parameters". Check all classes apart from "unknown", and for calibration dataset we use "No calibration (fastest)". After setting the configurations, click "Find parameters".
This will start a new task, so we have to wait until it is finished.
When the job is completed, close the popup window and then click the "Build" option to build our firmware. The firmware will be downloaded automatically when the build job completes.
Once the firmware is downloaded, we first need to unzip it. Connect a Syntiant TinyML board to your computer using a USB cable. Next, open the unzipped folder and run the flashing script based on your Operating System.
We can connect to the board's firmware over Serial. To do this, open a terminal and select the COM Port of the Syntiant TinyML board with 115200 8-N-1 settings (in Arduino IDE, that is 115200 baud, Carriage return). Sounds of ambulance sirens, firetruck sirens, and car horns will turn the RGB LED red.
For the "unknown" sounds, the RGB LED is off. The classes we selected while configuring the posterior parameters are the ones that trigger the RGB LED.
After testing the model on the Syntiant TinyML board and finding that it works great, I proceeded to create a demo of the smart wearable of this project.
This involved connecting a vibration motor to GPIO 1 of the Syntiant TinyML board. When the classes "ambulance_firetruck" and "car_horn" are detected, the GPIO 1 on the board is set HIGH and this causes the vibration motor to vibrate for 1500 milliseconds. Vibration motors are mostly used to give haptic feedback in mobile phones and video game controllers. They are the components that make your phone vibrate.
Since we cannot connect a motor directly to the GPIO pins, I used the 5V pad on the Syntiant TinyML board to power the vibration motor through a transistor that is switched by GPIO 1.
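A minimal sketch of the haptic pulse logic described above is shown here; the Arduino pin number standing in for GPIO 1 is a placeholder (verify the mapping for the Syntiant TinyML board core), and pinMode(MOTOR_PIN, OUTPUT) is assumed to be called in setup().

```cpp
// Non-blocking 1500 ms haptic pulse on the pin driving the transistor.
// MOTOR_PIN is a placeholder for GPIO 1 on the Syntiant TinyML board.

const int MOTOR_PIN = 1;
unsigned long pulse_started_at = 0;
bool pulsing = false;

// Call when "ambulance_firetruck" or "car_horn" is detected.
void start_vibration() {
  digitalWrite(MOTOR_PIN, HIGH);
  pulse_started_at = millis();
  pulsing = true;
}

// Call from loop() so the motor is switched off after 1500 ms.
void update_vibration() {
  if (pulsing && millis() - pulse_started_at >= 1500) {
    digitalWrite(MOTOR_PIN, LOW);
    pulsing = false;
  }
}
```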
In the future, these components can be packaged safely into a wrist wearable. The Syntiant TinyML board has a 3.7V LiPo battery connector which will enable the wearable to be used anywhere. For this demo, I used the USB connector as the power source for all components.
You can find the Arduino code for this use case in the GitHub repository syntiant-tinyml-firmware-environment-hearing-aider. The repository has the instructions on how to install the required libraries and upload the Arduino code to the Syntiant TinyML board.
The image below shows the annotation of the Syntiant TinyML board. GPIO 1, GND and the 5V pad on the bottom side are used for this smart wearable.
This environmental audio monitor wearable is one of the many solutions that TinyML offers. Future work could include detecting other sounds such as motorbikes, construction equipment, or alarms, among others.
With Edge Impulse, developing ML models and deploying them has always been easy. The Syntiant TinyML board was chosen to deploy our model because it provides ultra-low power consumption, a fully connected neural network architecture, and an onboard microphone, comes in a tiny form factor, and is also fully supported by Edge Impulse.
A sample project demonstrating how to use the Nordic Thingy:53 and the Edge Impulse App to perform environmental noise classification.
Created By: Attila Tokes
Public Project Link:
Noise pollution can be a significant problem, especially in densely populated urban areas. It can have negative effects on both humans and wildlife. Noise pollution is also often caused by power-hungry activities, such as industrial processes, construction, flights, etc.
A Noise Pollution Monitoring device built on top of the Nordic Thingy:53 development kit, with smart classification capabilities using Edge Impulse, can be a good way to monitor this phenomenon in urban areas. Using a set of such Noise Pollution Monitoring devices, the noise / environmental pollution of a city can be monitored. Based on the measured data, actions can be taken to improve the situation. Activities causing noise pollution tend to also have high energy consumption, so replacing these activities with more efficient solutions can reduce their energy footprint.
In this project I will demonstrate how a low power device like the Nordic Thingy:53 can be used in conjunction with an edge machine learning platform like Edge Impulse to build a smart noise / environmental pollution monitor. The PDM microphone of the Nordic Thingy:53 will be used to capture environmental noise. A digital signal processing (DSP) and inference pipeline built using Edge Impulse will be used to classify audio samples of known activities like construction works, traffic and others.
The Nordic Thingy:53 comes with pre-installed firmware that allows us to easily create machine learning projects with Edge Impulse.
To get started with the app we will need to create an Edge Impulse account, and a project:
After this we should be able to detect the Thingy:53 in the Devices tab. The Thingy will show up as a device named EdgeImpulse.
Going to the Inference tab we can try out the pre-installed demo app, which uses the accelerometer data to detect 4 types of movement.
The first step of building a machine learning model is to collect some training data.
For this proof-of-concept, I decided to go with 4 classes of sounds:
Silence - a silent room
Nature - sound of birds, rain, etc.
Construction - sounds from a construction site
Traffic - urban traffic sounds
As the source of the audio samples I used a number of YouTube videos, listed in the Resources section.
The audio samples can be collected from the Data tab of the nRF Edge Impulse app:
The audio samples are automatically uploaded to the Edge Impulse Studio project, and should show up in the Data Acquisition tab:
By default all the samples will be put in the Train set. We also need a couple of samples for verification, so we will need to run a Train / Test split:
After this we should have approximately 80% of samples in the Train set, and 20% in the Test set:
Having the audio data, we can start building a machine learning model.
In an Edge Impulse project, the machine learning pipeline is called an Impulse. An Impulse includes the pre-processing, feature extraction and inference steps needed to classify, in our case, audio data.
For this project I went with the following design:
Time Series Data Input - with 1 second windows @ 16kHz
Audio (MFE) Extractor - this is the recommended feature extractor for non-voice audio
NN / Keras Classifier - a neural network classifier
Output with 4 Classes - Silence, Nature, Traffic, Construction
The Impulse blocks were trained mostly with the default settings. The feature extraction block looks as follows:
This is followed by the classification block:
The resulting model is surprisingly good:
Most of the test samples were correctly classified. We only have a couple of mismatches for the Traffic / Construction and Silence / Nature classes. This is however expected, as these sounds can be pretty similar.
Building and deploying an embedded application that includes machine learning used to involve quite a few steps. With the Thingy:53 and Edge Impulse, this becomes much easier.
We just need to go to the Deployment tab, and hit Deploy. The model will automatically start building:
A couple of minutes later the model is built and deployed on our Thingy:53:
The Deployment we did earlier should have uploaded firmware with the new model to the Thingy:53. Hitting Start in the Inference tab will start live classification on the device.
I tested the application out with new audio samples for each class:
In future versions this project could be extended to also include features like:
Noise level / decibel measurement
Cloud connectivity via Bluetooth Mesh / Thread
Solar panel charging
A network of such monitoring devices could be used to monitor the noise / environmental pollution in a city. Based on the collected data, high-impact / polluting activities can be identified and replaced with better alternatives.
Sound Sources:
Use the Avnet RaSynBoard to listen to the sound of a pump, in order to determine compressor speed and system health.
Created By: David Tischler
Public Project Link:
Most industrial settings such as pumping facilities, heating, ventilation, and air conditioning (HVAC) machinery, datacenter infrastructure rooms, manufacturing and heavy industry sites will have proprietary, expensive, critical machinery to maintain. On-site workers with experience near the machinery or equipment are generally quick to identify when a machine doesn't seem to "sound right", "feel right", or "look right". Previous experience helps them to understand the "normal" state of the equipment, and this intuition and early warning can allow for scheduled repairs or proper planning for downtime. However, facilities that don't have 24/7 staffing or on-site workers could be more prone to equipment failures, as there is no one to observe and take action on warning signs such as irregular sounds, movements, or visual indicators.
Predictive Maintenance using machine learning aims to solve this problem, by identifying and acting upon anomalies in machinery health.
We'll use the onboard microphone of the RaSynBoard to create a machine learning model that understands a "low-speed" sound made by a small pump, and a "high-speed" noise from the pump. For the purposes of this demo, we will consider the "high-speed" noise to be a bad sign, indicating there is a problem.
Reasons for a pump to speed up and attempt to push more liquid could vary, depending on the use-case:
Liquid cooling, perhaps temperatures are rising and automation has increased the pump flow to try to extract more heat from the system.
Flow has been restricted, and the pump is trying to achieve the same total volume of throughput but through a smaller diameter.
Liquid levels are rising, and the pump is trying to reduce the volume of water in a tank, lake, or other holding area.
Getting started, we will need to collect audio samples from the pump, running at both low-speed and at high-speed, as well as a variety of “other” background noises that will form an “Unknown” classification, helping us filter out random everyday noises that occur. To collect the data, we’ll first need to prepare the RaSynBoard.
With the board connected to the computer, inside the extracted .zip file, run the flash_linux.sh or flash_win.bat file depending upon whether you are on Linux or Windows, and the Edge Impulse firmware will be flashed to the board. When completed, you can disconnect the RaSynBoard from the computer.
Next, insert the SD Card into your laptop or desktop, and copy the config.ini, mcu_fw_120.synpkg, dsp_firmware.synpkg, and ei_model.synpkg files from the unzipped firmware directory to the SD Card. Upon completion, eject the SD Card from your computer.
Return the Jumper to pins 1-2, and insert the SD Card into the RaSynBoard. Power it back on, and reconnect it to the computer once again.
Now that the RaSynBoard is running Edge Impulse firmware, you can use the edge-impulse-daemon command (which got installed as part of the CLI installation) on your desktop or laptop to connect the RaSynBoard to Edge Impulse. The board will then show up in the "Devices" page, and over on the Data Acquisition page you can use the drop-down menus to choose your device, choose the Microphone, enter a sample length of 1000 milliseconds (1 second), and provide a label for the type of data you are collecting. Once ready you will simply click "Start sampling". (The IMU is also available, for projects that might make use of accelerometer data).
With the pump running in what we'll call our "normal" condition, low-speed, I've collected about 50 audio samples with the RaSynBoard placed on top of the pump motor, each 1 second in length. My label is entered as low-speed, but you can use another term if you prefer. You'll want to keep the location of the RaSynBoard consistent, so that the data used to train the model is representative of what will be experienced once the model is built and deployed back to the board.
Next, I've increased the speed of the pump from 10% to 85%, which still gives some excess pumping capacity, but could be a cause for concern in those use-cases outlined above. The same process is followed, with individual audio samples collected and labeled as high-speed; about 50 samples should be enough.
Once a dataset has been collected, a machine learning model can be built that identifies or recognizes these same sounds in the field. So in this project, we should be able to identify and classify the sound of our pump running at low-speed, and the sound of our pump running at high-speed.
On the left, click on “Impulse design”, and in the first block, “Time Series Data” should already be selected. Add a Processing block, and choose Audio (Syntiant). Then add a Learning block, and choose Classification. Finally, the Output features should already be selected, thus you can click Save Impulse.
Next, on the left navigation, choose Syntiant, to configure the Processing block. This is where Features will be extracted, and you will see a Spectrogram visualization of the processed features for each data sample (chosen from a drop-down menu at the top-right of the raw data). You should be able to leave the default MFE configuration alone, and simply click on Save parameters at the bottom. However, each option has a help tip, should you need to fine-tune these settings. On the next page, click on Generate features to create a Feature explorer that will plot a graph, allowing you to examine how your Features cluster.
Choose Classifier on the left navigation to proceed to the neural network settings for the Learning block, where you can choose the number of training cycles (epochs), learning rate, and other settings (again the defaults will likely work fine, but help text can guide any fine-tuning you may want to perform). Once you are ready, you can click on Start training to begin the model build process.
Upon completion, you will see another visual representation, this time evaluating the validation performance of your model and the Accuracy against a validation set.
To deploy this newly-built audio classification machine learning model onto the RaSynBoard, there are a few options. The quickest way to test if your model is functional and determine accuracy in the real world, is to simply choose a ready-made firmware binary that can be flashed and run immediately on the RaSynBoard. However, when building a full-featured application intended for production use, you’ll instead want to choose either the Syntiant NDP120 library or Custom Block export options, as you can then leverage the machine learning model in your application code as necessary, depending on your use-case or product.
For now though, I will just use the binary download as an easy way to test out the model.
On the left navigation, click on Deployment, and type “RaSynBoard” in the search box. From the options, choose RaSynBoard and then click on the Find posterior parameters button, which will enable you to choose the keywords or sounds from your labels that you want to detect. Choose “low-speed” and “high-speed”, then click on Find parameters.
Next, click on Build, and the firmware will be generated and downloaded to your computer. Once downloaded, unzip the file, and we’ll follow a similar method as earlier.
Power down the RaSynBoard if it is still running, remove the SD Card from the board and insert it into your laptop or desktop, and copy the config.ini, mcu_fw_120.synpkg, dsp_firmware.synpkg, and ei_model.synpkg files from the unzipped download to the SD Card. Upon completion, eject the SD Card from your computer and return it to the RaSynBoard.
We can now power the RaSynBoard back on; the board will boot up and automatically start running the model. To see the results, we need to attach a serial console to view the output of any inference results. Using a standard UART adapter, connect Ground, TX, and RX to pins 2, 4, and 6 on the I/O Board, as shown here:
Then in a terminal, you will see the output of the model running. I have placed the RaSynBoard back on the pump, set the speed to low, and sure enough, the model is able to predict the pump is running at low-speed. Increasing the compressor power to 85%, the RaSynBoard now recognizes that the pump is running at high-speed.
Using machine learning and an embedded development kit, we were able to successfully identify and classify pump behavior by listening to the sound of the compressor. This demonstration validated the approach as feasible, and when wrapped into a larger application and alerting system, an audio classification model could be used for remote infrastructure facilities, factory equipment, or building HVAC equipment that is not continually monitored by workers or other human presence. The Renesas RA6 MCU combined with the Syntiant NDP120 neural decision processor in the Avnet RaSynBoard creates a low-power, cost-effective solution for predictive maintenance or intervention as needed, prior to a failure or accident occurring.
End-to-end synthetic data pipeline for the creation of a portable LED product equipped with keyword spotting capabilities. The project serves as a comprehensive guide for development of any KWS product.
Created By: Samuel Alexander
Public Project Link:
GitHub Project Link:
In the era of smart devices and voice-controlled technology, developing effective keyword spotting (KWS) systems is crucial for enhancing user experience and interaction. This documentation provides a comprehensive guide for creating a portable LED product with KWS capabilities, using Edge Impulse's end-to-end synthetic data pipeline. Synthetic voice/speech data, generated artificially rather than collected from real-world recordings, offers a scalable and cost-effective solution for training machine learning models. By leveraging AI text-to-speech technologies, we can create diverse and high-quality datasets tailored specifically for our KWS applications. This guide not only serves as a blueprint for building a responsive LED product but also lays the groundwork for a wide range of voice-activated devices, such as cameras that start recording on command, alarms that snooze with a keyword, or garage doors that respond to voice prompts.
Traditional methods of training keyword spotting models often rely on extensive datasets of human speech, which can be time-consuming and expensive to collect. Moreover, ensuring diversity and representation in these datasets can be challenging, leading to models that may not perform well across different accents, languages, and speaking environments. Synthetic data addresses these challenges by providing a controlled and flexible means of generating speech data. Using AI text-to-speech technology, we can produce vast amounts of speech data with varied voices, tones, and inflections, all tailored to the specific keywords we want our models to detect.
This approach opens up numerous possibilities for product development. For instance, a smart LED light can be designed to turn on or off in response to specific voice commands, enhancing convenience and accessibility. A camera can be programmed to start recording or take a group photo when a designated keyword is spoken, making it easier to capture moments without physical interaction. Similarly, an alarm system can be configured to snooze with a simple voice command, streamlining the user experience. By utilizing synthetic data, developers can create robust and versatile KWS models that power these innovative applications, ultimately leading to more intuitive and responsive smart devices.
This project outlines the creation of a keyword spotting (KWS) product using Edge Impulse's synthetic data pipeline. The process involves generating synthetic voice data with OpenAI's Whisper text-to-speech model via Edge Impulse Studio and training the KWS model using Syntiant's audio processing blocks for the NDP120 on the Arduino Nicla Voice. The phrases 'illuminate' and 'extinguish' will be generated and used for training the model.
After training, the model is deployed onto the Arduino Nicla Voice hardware. A custom PCB and casing are designed to incorporate LED lights and power circuitry, ensuring portability and ease of use. This guide serves as a practical resource for developers looking to implement KWS functionality in voice-activated devices, demonstrating the efficiency of synthetic data in creating responsive and versatile products.
The Arduino Nicla Voice is an ideal choice for this project due to its use of the Syntiant NDP120, which offers great power efficiency for always-on listening. This efficiency allows the NDP120 to continuously monitor for keywords while consuming minimal power, making it perfect for battery-powered applications. Upon detecting a keyword, the NDP120 can notify the secondary microcontroller, Nordic Semiconductor nRF52832, which can then be programmed to control the lighting system. The compact size of the Nicla Voice also makes it easy to integrate into a small case with a battery. Furthermore, the Nicla Voice's standardized footprint simplifies the prototyping process, allowing for the easy creation of a custom PCB module with LED circuitry that can be easily connected.
Arduino Nicla Voice (or other Edge Impulse supported MCU with mic)
PCB and SMD components (parts breakdown explained later)
Edge Impulse CLI
Arduino IDE
OpenAI API account
Once you have your secret key, you can navigate to your Edge Impulse organization page and enter your API key. Please also note that this Text to Speech Data Generation feature is only available for Edge Impulse Enterprise accounts.
Now that we have the environment configured, and our OpenAI API saved in the Edge Impulse Studio, we are ready to start a new project and begin generating some synthetic voice data.
On your project's page select Data acquisition --> Data sources --> + Add new data source --> Transformation block --> Whisper Synthetic Voice Generator --> Fill out the details as follows:
Phrase: illuminate
We need to generate speech for the words "illuminate" and "extinguish", we can start with illuminate for this 'action' first and then set up another data source 'action' for extinguish after we finish configuring this one.
Label: illuminate
Match the label name to the voice sample we want to generate.
Number of samples: 10
We can start with generating 10 samples to test creating this action, if everything works feel free to generate an action that generates more than 10 samples. Once created you can also run the action multiple times.
Voice: random
We want to create diversity of voice and accent in our dataset, so choose random.
Model: tts-1
I tested tts-1 and tts-1-hd, I think the quality difference is negligible for this case, but feel free to select either one. Note that tts-1-hd will cost you more OpenAI credits.
Speed: 0.6, 0.7, 0.8, 0.9, 1.0, 1.1, 1.2
We want to vary the speed of the voice pronouncing the word we want to generate. 0.6 means 60% of its original speed, and 0.6 - 1.2 should give enough range.
Now you can run the action. If successful, the tts voice generation should begin and it may take a few minutes to complete. If the job failed you should be notified and you can recheck if the API key is entered correctly, then you can retry again.
Once satisfied with all the data generated, perform a Train / Test split into approximately 80/20 ratio.
The Impulse design values are chosen for optimal keyword spotting performance. The 968 ms window size captures enough audio for accurate detection, while the 500 ms window increase balances responsiveness and efficiency. The 16000 Hz frequency is standard for capturing human voice, ensuring quality without excessive data. Using the Audio (Syntiant) block leverages the NDP120's capabilities for efficient digital signal processing. The Classification block distinguishes between commands, with output classes "extinguish," "illuminate," and "z_openset" allowing for control of the lighting system and handling unknown inputs.
Window size: 968 ms
Window increase: 500 ms
Frequency: 16000 Hz
Audio (Syntiant)
Classification
Output: extinguish, illuminate, z_openset
Under Classifier, set the learning rate to 0.0005 and change the architecture preset to use Dense Network.
Our audio classifier gives an accuracy of 93.8%, which is satisfactory. We could continue tuning the hyperparameters and try using some data augmentation, but for the purpose of this demonstration we are satisfied with the current result and can move to the deployment phase.
Now the AI model is ready to be deployed to the Arduino Nicla Voice. Let's select the Arduino Nicla Voice deployment.
Please note that we want to flash the firmware that you have just built, instead of the default audio firmware for the Nicla Voice's NDP120.
Once the code is uploaded you can verify that everything works by saying the words 'Illuminate' and 'Extinguish' out loud. When the keyword 'Illuminate' is detected, the blue built-in LED will blink, and when the keyword 'Extinguish' is detected, the red built-in LED will blink.
Next we will design and manufacture the PCB socket which holds the LED and power circuitry which can turn on when 'Illuminate' is detected and turn off when 'Extinguish' is detected.
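Before moving to the hardware, here is a minimal sketch of how the detected keywords can be mapped to the LED power circuit: the on_match() entry point and LED_CTRL_PIN are illustrative assumptions rather than the exact firmware, which actually receives the detections from the NDP120.

```cpp
// Sketch of the keyword-to-LED mapping on the Nicla Voice host MCU.
// on_match() is assumed to be called with the winning label whenever the
// NDP120 reports a detection; LED_CTRL_PIN is a placeholder for the GPIO
// that drives the LED power circuit on the custom PCB.

const int LED_CTRL_PIN = 9;   // placeholder GPIO wired to the LED driver

void setup() {
  pinMode(LED_CTRL_PIN, OUTPUT);
  digitalWrite(LED_CTRL_PIN, LOW);   // lights off at boot
}

void on_match(const char *label) {
  if (strcmp(label, "illuminate") == 0) {
    digitalWrite(LED_CTRL_PIN, HIGH);   // lights on
  } else if (strcmp(label, "extinguish") == 0) {
    digitalWrite(LED_CTRL_PIN, LOW);    // lights off
  }
  // "z_openset" (unknown sounds) is intentionally ignored
}

void loop() {
  // keyword detection runs on the NDP120; on_match() fires on events
}
```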
The schematic and PCB are designed using KiCAD. A single-sided aluminum PCB was selected for this project due to its excellent thermal conductivity, which helps dissipate heat generated by the LEDs and other components, ensuring reliable performance and longevity. The design of this PCB is simple enough to make it possible to route using one side only.
The schematic, PCB, and Gerber (manufacturing) files are accessible on the project's GitHub page.
Synthetic data has demonstrated its value in the development of voice-activated products like Lumo Voice. Its customizable nature allows for the inclusion of diverse accents, languages, tones, and inflections, resulting in robust keyword spotting models that perform well across different speaking styles. Unlike traditional data collection, which requires gathering numerous samples from various individuals, synthetic data offers a more efficient and scalable approach, enabling the rapid generation of high-quality datasets tailored to specific needs. This flexibility, combined with Edge Impulse's incredible streamlined workflow, has made it easier than ever to build and deploy small, efficient models on edge devices. With Edge Impulse, we could quickly generate synthetic data, train, and optimize our models, making it a powerful tool for creating responsive and versatile voice-controlled devices like Lumo Voice.
Take an existing audio classification model built for the Thunderboard Sense 2, and prepare it for use on the SiLabs xG24 board.
Created By: Pratyush Mallick
Public Project:
This project focuses on how to port an existing audio recognition project built with a SiLabs Thunderboard Sense 2 to the newer SiLabs xG24 Dev Kit. For demonstration purposes, we will be porting Mani's project, an Edge Impulse based TinyML model that predicts various vehicle failures such as faulty drive shafts and brake-pad noises. Check out his work for more information.
The audio sensor on the Thunderboard Sense 2 and the xG24 Dev Kit are the same (TDK InvenSense ICS-43434), so ideally we're not required to collect any new data using the xG24 Dev Kit for the model to work properly. Had the audio sensor been a different model, it would most likely be necessary to capture a new dataset.
However, note that the xG24 has two microphones, placed at the edges of the board.
In this project, I am going to walk you through how you can clone Mani's Public Edge Impulse project for the Thunderboard Sense 2 board, build it for the xG24, test it out, and then deploy to the newer SiLabs xG24 device instead.
Before you proceed further, there are a few software packages you need to install.
Click on the "Clone" button at top-right corner of the page.
That will bring you to the below popup tab. Enter a name for your clone project, and click on the "Clone project" button.
This action will copy all the collected data, generated features, and model parameters into your own Edge Impulse Studio. You can verify this by looking at the project name you entered earlier.
Now if you navigate to "Create impulse" from the left menu, you will see how the model was created originally.
As you can see, the model was created with audio data sampled at 16 kHz. As mentioned, because the microphones on both boards are the same, we're not required to collect any additional data from the new board.
However, if you do want to collect some data from the xG24, then you will need to flash the base firmware and then use the edge-impulse-daemon to connect the device to the Studio.
You can follow the guide below to go through the process, if you are interested in adding more data samples to your cloned project:
With the default values of Window Size (10 s) and Window Increase (500 ms), the processing block will throw an error, as shown below:
This is because some of the features in Edge Impulse's processing blocks have been updated since this project was created, so you need to update some of the parameters in the Time Series block, such as Window Size and Window Increase, or increase the frame stride parameter in the MFE processing block. This is what my updated window parameters look like:
Next, navigate to the "Classification" tab from the left menu, and click on "Start training".
Alternatively, you can also collect more data as mentioned above, or add new recognized sounds with other audio classes, then begin your training.
When you are done training, navigate to the "Live Classification" page from the left menu. This feature of Edge Impulse comes in handy when migrating projects to different boards.
Rather than deploying the model and then testing it on the hardware, with this feature we can collect audio data from the hardware immediately and run the model in the Studio on the collected data. This saves time and effort up front.
For Edge Impulse supported boards we can directly download the base Edge Impulse firmware, and then directly record audio (or other) data from the target device.
Once done, you can select the device name, choose "Microphone" as the sensor, and set the sample length and sampling frequency (ideally equal to the collected samples).
When you are done retraining, navigate to the "Deployment" tab from the left menu, select "SiLabs xG24 Dev Kit" under "Build firmware", then click on the "Build" button at the bottom of the page.
This will build your model and download a .zip file containing a .hex file and instructions.
With the Thunderboard Sense 2, deploying firmware could be done by directly dragging and dropping files onto the "USB Driver TB004" volume when the device was connected in flash mode to a host PC. For the xG24, however, we have to use Simplicity Commander to upload the firmware to the board. First connect the xG24 board to the PC and make note of the board's COM port; ideally it will be identified by the PC as a J-Link UART port.

Now open the Simplicity Commander tool and connect the board. Once connected, select the "Flash" option on the left, then select the downloaded .hex file and flash it to the board.
To start inferencing, run the following command in your terminal:
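This is most likely the Edge Impulse CLI's impulse runner (shown here for reference; consult the Edge Impulse docs if your setup needs additional flags):

```
edge-impulse-run-impulse
```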
Note that this is a newer command supported by the Edge Impulse CLI, so you may need to update your edge-impulse-cli version to get it running and avoid a package mismatch as shown below:
Now your model should be running, recognizing the same audio data and performing inferencing on the newer xG24 Dev Kit hardware, with little to no modification to the data or the model architecture.
This highlights the platform-agnostic nature of Edge Impulse, and was possible in this case because the audio sensor on both the Thunderboard and the xG24 is the same. However, you would need to do your own due diligence when migrating projects built with other sensor data, such as humidity/temperature or the light sensor, as those do vary between the boards.
One final note is that in this project, the xG24 is roughly 2x as fast as the Thunderboard Sense 2 in running the DSP, and 8x faster in running the inference:
Hopefully this makes upgrading your SiLabs projects easier!
Perform local keyword spotting to control a relay and turn devices on or off, with voice commands and an Arduino Nicla Voice.
Created By: Jallson Suryo
Public Project Link:
GitHub Repo:
Can you imagine Amazon Alexa without the cloud?
For most services, adding voice means adding an internet connection, which means extra expense, privacy and security concerns, and the need to install an app (or use a web interface) for everything in your home. Another problem is the time delay between a voice command being given, the command being sent to a cloud server, and the response coming back to the device for execution, creating a poor user experience.
Our answer is a power plug which can be controlled using voice commands, with no connection to the internet. A machine learning model embedded in a microcontroller is trained to recognize several commands, which are then passed to relays that turn the power on or off at each socket according to the issued command, instantly. Practicality, privacy, and cost-effectiveness are the goals of this Non-IoT Voice Controlled Power Plug project.
This project takes advantage of Edge Impulse's Syntiant audio processing block that extracts time and frequency features from a signal, specific to the Syntiant NDP120 accelerator included in the Nicla Voice. The NDP120 is ideal for always-on, low-power speech recognition applications with the “find posterior parameters” feature that will only react to the specified keywords.
Devices with an embedded ML model will accept voice commands but won't need a Wi-Fi or Bluetooth connection. All processing is done locally on the device, so you can directly tell a lamp, air conditioner, or TV to turn on or off without Alexa, Siri, or any digital assistant speaker/hub.
This project will use relays and a power strip connected to various appliances such as a lamp, fan, TV, etc. An Arduino Nicla Voice with an embedded ML model, trained to recognize keywords such as one, two, three, four, on, and off, is the center of the decision process. From the Nicla Voice we use the I2C protocol, connected to an Arduino Pro Micro, to carry out the voice commands and forward them to the relays which control the power sockets.
Arduino Nicla Voice (with Syntiant NDP120)
Any microcontroller or Arduino (I use Pro Micro)
5V Relay (4pcs)
Breadboard
Cable jumper
Cable for 110/220V
Powerstrip (4 sockets)
Edge Impulse Studio (Enterprise account or Free Trial for more than 4 hours of training data)
Arduino IDE
Terminal
Before we start, we need to install the Arduino CLI and Edge Impulse tooling on our computer.
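As a reference, these are the usual install commands from the Edge Impulse CLI and Arduino CLI documentation (assuming Node.js is already installed; check the official guides for your OS and any prerequisites):

```
# Edge Impulse CLI tooling (requires a recent Node.js)
npm install -g edge-impulse-cli

# Arduino CLI via the official install script (a package manager also works)
curl -fsSL https://raw.githubusercontent.com/arduino/arduino-cli/master/install.sh | sh
```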
Click on a data sample that was collected, then click on the 3 dots to open the menu, and finally choose Split sample. Set the segment length to 1000 ms (1 second), or add segments manually, then click Split. Repeat this process until all samples are labeled in 1 second intervals. Make sure the amount of data for one, two, three, four, on, off and unknown is fairly balanced, and that the ratio between Training and Test data is around 80/20.
Choose Create Impulse, set Window size to 968ms, then add an Audio (Syntiant) Processing block, and choose Classifier for the Learning block, then Save Impulse. In the Syntiant parameters, choose log-bin (NDP120/200) then click Save. Set the training to around 50 cycles with 0.0005 Learning rate, and choose Dense Network with Dropout rate around 0.5, then click Start training. It will take a short while, but you can see the progress of the training on the right. If the results show a figure of around 80% accuracy upon completion, then we can most likely proceed to the next stage.
Now we can test the model in Live classification, or choose Model testing to test with the data that was set aside earlier (the 20% test split), and click Classify all. If the result is quite good -- again around 80% accuracy -- then we can move to the next step: Deployment.
Because there are two MCUs in this solution, two separate applications are needed:
Finally, we succeeded in making this Non-IoT Voice Controlled Power Plug idea a reality and implementing it in a home appliance setting. I believe that in the future this kind of non-IoT smart home system will be widely implemented, and could be built into every home appliance with specific keywords. Privacy, security, practicality, and energy efficiency can all be addressed for a more sustainable future.
Building an audio classification wearable that can differentiate between the sound of a dog bark, howl, and environmental noise, trained entirely on synthetic data from ElevenLabs.
Created By: Solomon Githu
Public Project Link:
GitHub Repository:
It's said that a dog is man's best friend, and it is no secret that dogs are incredibly loyal animals. They are very effective when it comes to securing premises, and are also able to sense when things are not right, whether with a person or with a situation. Some examples of dogs used for security and assistance are guidance for people with visual impairments, detection of explosives and drugs, search and rescue missions, and enhancing security at various places. Worker safety aims to foster a practice of ensuring a safe working environment by providing safe equipment and implementing safety guidelines that enable workers to be productive and efficient in their jobs. In this case, dogs are usually deployed to patrol and monitor areas around the workplace. One of the reasons is that dogs have an extraordinary sense of smell, vision and hearing, making them exceptional at detecting threats that may go unnoticed by humans or other security systems. However, workers may not always be able to interpret a dog's barks in time. The workers may not be knowledgeable about how dogs react, or they may be focused on their tasks and fail to hear a dog. Failure to detect a dog's bark may lead to accidents, injuries or even fatalities.
Machine listening refers to the ability of computers to understand audio signals similarly to how humans hear and understand various sounds. Recently, labeling of acoustic events has emerged as an active topic covering a wide range of applications. This is because by analyzing animal sounds, AI can identify species more accurately and efficiently than ever before and provide unique insights into the behaviors and habitats of animals without disturbing them. Barking and other dog vocalizations have acoustic properties related to their emotions, physiological reactions, attitudes, or other internal states. Historically, humans have relied on intuition and experience to interpret these signals from dogs. We have learned that a low growl often precedes aggression, while a high-pitched bark might indicate excitement or distress. Through this experience, we can train AI models to recognize dog sounds, and those who work with the animals, like security guards, maintenance staff, and even delivery people, can use that insight.
The AI model only needs to be trained to recognize the sounds one seeks to monitor, based on recordings of those sounds. However, creating an audio dataset of animal sounds is quite challenging; in this case, we do not want to disturb dogs, or other animals, to provoke reactions like barking. Fortunately, generative AI is currently at the forefront of AI technology. Over the past decade, we have witnessed significant advancements in synthetic audio generation. From sound effects to songs, with just a simple prompt we can now use computers to generate dog sounds and in turn use that data to train another AI model.
This project aims to develop a smart prototype wearable that can be used by workers to receive alerts from security dogs. In workplaces and even residential areas, dog sounds are common, but we often overlook them, assuming there is no real threat. We hear the sounds but don't actively listen to the warnings dogs may be giving. Additionally, workers at a site may be too far to hear the dogs, and in some cases, protective ear muffs further block out environmental sounds.
Sound classification is one of the most widely used applications of Machine Learning. This project involves developing a smart wearable device that is able to detect dog sounds, specifically barking and howling. When these sounds are detected, the wearable communicates the dog's state by displaying a message on a screen. This wearable can be useful to workers by alerting them to take precautionary measures. A security worker may be alerted to a potential threat that a dog identified but that they did not manage to see. A postal delivery person can also be alerted to an aggressive dog that may be intending to attack them, as it may perceive the delivery person as a threat.
To train a Machine Learning model for this task, the project uses generative AI for synthetic data creation. The reason why I chose this is because we cannot distress a dog so that we can obtain reactions like barking or howling. I also wanted to explore how generative AI can be used for synthetic data generation. Ideally, when training Machine Learning models, we want the data to be a good representation of how it would also look when the model is deployed (inference).
Canine security refers to the use of trained security dogs and expert dog handlers to detect and protect against threats. The effectiveness of dogs lies in their unique abilities. Animals, especially dogs, have a keen sense of smell and excellent hearing. As a result, dogs are the ideal animal to assist security guards in their duties and to provide security to workplaces and homesteads. At the same time, according to the American Veterinary Medical Association, more than 4.5 million people are bitten by dogs each year in the US. And while anyone can suffer a dog bite, delivery people are especially vulnerable. Statistics released by the US Postal Service show that 5,800 of its employees were attacked by dogs in the U.S. in 2020.
According to Sam Basso, a professional dog trainer, clients frequently admit during his sessions that they have more to learn about their dogs. While humans have been able to understand how dogs act, the average person still has more to learn to better understand dogs. Professional dog handlers can train owners, but this comes at a great cost, and not everyone is ready to take the classes. To address these issues, we can utilize AI to develop a device that can detect specific dog sounds such as barking, and alert workers so that they can follow up on the situation the dog is experiencing. In the case of delivery persons, an alert can inform them of a nearby aggressive dog.
Audio classification is a fascinating field with numerous applications, from speech recognition to sound event detection. Training AI models has become easier by using pre-trained networks. The transfer learning approach uses a pretrained model which is already trained using a large amount of data. This approach can significantly reduce the amount of labeled data required for training, it also reduces the training time and resources, and improves the efficiency of the learning process, especially in cases where there is limited data available.
Finally, for deployment, this project requires a low-cost, small, and powerful device that can run optimized Machine Learning models. The wearable also requires the ability to connect to an OLED display using general-purpose input/output (GPIO) pins. Power management is another important consideration for a wearable: the ability to easily connect a small battery, achieve low power consumption, and support battery charging would be great. In this case, the deployment uses the XIAO ESP32S3 development board owing to its small form factor, high performance, and lithium battery charge management capability.
Software components:
Edge Impulse Studio account
ElevenLabs account
Arduino IDE
Hardware components:
A personal computer
SSD1306 OLED display
3.7V lithium battery. In my case, I used a 500mAh battery.
Some jumper wires and male header pins
Soldering iron and soldering wire
Super glue. Always be careful when handling glues!
Once we have accounts on both ElevenLabs and Edge Impulse, we can get started with data creation. First, create a project (with a Professional or Enterprise account) on Edge Impulse Studio. On the dashboard, navigate to "Data acquisition" and then "Synthetic data". Here, we need to fill in the form with our ElevenLabs API key as well as parameters for the data generation, such as the prompt, label, number of samples to be generated, length of each sample, frequency of the generated audio files, and the prompt influence parameter.
To get an API key from ElevenLabs, first login to your account. Next, on the "home" page that opens after logging in, click "My Account" followed by "API Keys". This will open a new panel that enables managing the account API Keys. We need to click "Create API Key" and then give a name to the key, although the naming does not matter. Next, we click the "Create" button and this will generate an API key (a string of characters) that we need to copy to our Edge Impulse project in the "ElevenLabs.io API Key" field.
For this demonstration project, I initially worked with 3 classes of sounds: dog barking, dog howling, and environment sounds (city streets, construction sites and people talking). The prompts I used to generate sound for each class were "dog howling", "dog barking" and "environmental sounds (e.g., city streets, construction sites, people talking)", with the labels dog_howling, dog_barking and environment respectively. For each prompt, I used a prompt influence of 0.6 (this generated the best sounds), a "Number of samples" of 6, a "Minimum length (seconds)" of 1, a "Frequency (Hz)" of 16000 and "Upload to category" set to training. With this configuration, clicking the "Generate data" button in Edge Impulse Studio generates 6 audio samples of 1 second each for one class. To generate sound for another class, we simply change the prompt and leave the other fields unchanged. I used this configuration to generate around 39 minutes of audio consisting of dogs barking, dogs howling and environment (e.g., city streets, construction sites, people talking) sounds.
In Machine Learning, it is always a challenge to train models effectively. Bias can be introduced by various factors and it can be very difficult to even identify that this problem exists. In our case, since the device will be continuously recording environmental sound and classifying it, we also need to consider that there will not always be a dog barking, dog howling, people talking, or city sounds present. The environment can also be calm, with low noise or other sounds that the generative AI model failed to include in the environment class. Identifying this is key to fixing the bias of the environment sounds class (e.g., city streets, construction sites, people talking).
Finally, after using the ElevenLabs integration and uploading my noise sound recordings, I had around 36 minutes of sound data for both training and testing. In general, the more representative data we have, the better the model will perform. For this demonstration project, I found the dataset size to be adequate.
We then click the "Perform train / test split" button in the interface that opens. Another dialog opens asking if we are sure about rebalancing the dataset; we click "Yes, perform train / test split" and finally type "perform split" in the next window, as prompted, to confirm.
By using the Autotune feature on Edge Impulse, the platform updated the processing block to use frame length of 0.025, frame stride of 0.01, filter number of 41, set the Lowest band edge of Mel filters (in Hz) to 80 and set the Noise floor (dB) as -91.
The last step is to train our model. We click "Classifier" which is our learning block that is using a Convolution Neural Network (CNN) model. After various experiments, I settled with 100 training cycles, a learning rate of 0.0005 and a 2D Convolution architecture. Finally, we can click the "Save & train" button to train our first model. After the training process was complete, the model had an accuracy of 98% and a loss of 0.04.
When training our model, we used 80% of the data in our dataset. The remaining 20% is used to test the accuracy of the model in classifying unseen data. We need to verify that our model has not overfit, by testing it on new data. To test our model, we first click "Model testing" then "Classify all". Our current model has an accuracy of 99%, which is pretty good!
At last, we have a simple Machine Learning model that can detect dog sounds! However, how do we know if this configuration is the most effective? We can experiment with three other processing blocks: Spectrogram, raw data processing and MFCC. To be specific, the difference between the Impulses in this project is the processing block. To add another Impulse, we click the current Impulse (Impulse #1) followed by "Create new impulse".
In this project, we now have four Impulses. The Experiments feature not only allows us to setup different Machine Learning processes, but it also allows us to deploy any Impulse. The MFE, Spectrogram and MFCC Impulses seem to perform well according to the model training and testing. I decided to skip deploying the Raw data Impulse since using raw data as the model input does not seem to yield good performance in this use case.
We can then deploy the second Impulse, which uses the Spectrogram pre-processing algorithm. The steps for the deployment are similar: we select Arduino library, enable the EON Compiler, select Quantized (int8) and download the Arduino library. To speed up compilation and use cached files in the Arduino IDE, we can simply unzip the second Impulse's Arduino library and copy the model-parameters and tflite-model folders over to the first Impulse's Arduino library folder, overwriting the existing files with the updated model parameters. Unfortunately, this model is not able to run on the ESP32S3 board and we get the error "failed to allocate tensor arena". This error means that we have run out of RAM on the ESP32S3.
Lastly, I experimented with deploying the MFCC Impulse. This algorithm works best for speech recognition but the model training and testing show that it performs well for detecting dog sounds. Following similar steps, I deployed the fourth Impulse using the EON Compiler and Quantized (int8) model optimizations. Surprisingly, this Impulse (using the MFCC processing algorithm) delivers the best performance even compared to the MFE pre-processing block. The Digital Signal Processing (DSP) takes approximately 285ms, with classification taking about 15ms. For detecting dog sounds, this Impulse accurately identifies with great confidence, demonstrating the positive impact of a DSP block on model performance!
Based on the experiments, I chose to continue with the fourth Impulse due to its accuracy and reduced latency.
A solid gadget needs a solid case! We are close, so it's time to put our wearable together.
The other 3D printed components are two flexible wrist straps. These are similar to the ones found on watches. I achieved the flexibility by printing them with TPU material. Note that if you do not have a good 3D printer you may need to widen the strap's holes after printing. I used super glue to attach the wrist straps to the case. Always be careful when handling glues!
A cool part is the wearable's dock/stand. This component is not important to the wearable's functionality, but a device's dock/stand is just always cool! It keeps your device in place, adds style to your space, and saves you from the fear of your device being tangled in cables.
The wearable's electronic components include:
Seeed Studio XIAO ESP32S3 (Sense) development board with the camera detached
SSD1306 OLED display
3.7V lithium battery. In my case, I used a 500mAh battery.
Some jumper wires and male header pins
The XIAO ESP32S3 board has LiPo battery connector copper pads that we can use to solder wires for the battery connection. Note that the negative terminal is the copper pad closest to the USB port, and the positive terminal is the copper pad further away from the USB port.
Once the electronic parts have been assembled, they can be put in the wearable's case according to the layout in the image below. Side vents on the case allow the onboard digital microphone to capture surrounding sounds effectively and they also help cool the ESP32S3.
Below is an image of my wearable after assembling the components.
To test the wearable, I used a television to play YouTube videos of construction sites and dog sounds. At first, I expected the model not to perform well, since the YouTube playback did not sound the same as the audio generated by ElevenLabs; in Machine Learning, we aim to train the model on data that is representative of what it will see during deployment. However, the model, using the MFCC algorithm, performed well and was able to accurately detect dog sounds, though sometimes barks were classified as howls and vice versa.
Let’s now put on our safety hats, or get packages to deliver, and put the TinyML wearable to test.
This low-cost and low-power environmental sensing wearable is one of the many solutions that embedded AI has to offer. The presence of security dogs provides a sense of security and an unmatched source of environmental feedback to us humans. However, there is a great need to also understand how these intelligent animals operate so that we can understand and treat them better. The task at hand was quite complicated: capture sounds without disturbing dogs, train a dog sound detection model, and optimize the model to run on a microcontroller. However, by utilizing the emerging technology of synthetic data generation and the powerful tools offered by the Edge Impulse platform, we have managed to train and deploy a custom Machine Learning model that can help workers.
The mobile app is used to interact with Thingy:53. The app also integrates with the embedded machine learning platform.
Edge Impulse Project:
Nordic Thingy:53:
Edge Impulse Documentation:
Construction:
Nature:
Traffic:
*Image credit on Unsplash.
To demonstrate the concept, we'll use the , which is a low-cost, small device that contains a , and a to accelerate machine learning inferencing. The RaSynBoard is available to , and makes prototyping audio classification, accelerometer and motion detection projects simple.
The RaSynBoard comes with default firmware from Avnet out-of-the-box, so it is ready to use and quick to get started with for machine learning exploration. But we'll need to add Edge Impulse firmware to the board instead, in order to interface with the Edge Impulse Studio or API. To flash the board with the necessary firmware, you will need two Renesas flashing applications installed, as well as the Edge Impulse CLI. Links to the needed software, as well as the firmware that needs to be flashed, are located in the documentation here: . Download and install each of the three applications, then download the firmware .zip file and unzip it.
Once you have the Renesas bits installed, and the firmware downloaded and extracted on your laptop or desktop, you can flash the RaSynBoard by removing the SD Card, removing the Jumper from pins 1-2, and then connecting a USB-C cable from your computer to the USB-C port on the I/O Board. This is shown in detail in the excellent .
To create an OpenAI API secret key, start by visiting the . If you don't have an account, sign up; otherwise, log in. Once logged in, navigate to the API section by clicking on your profile icon or the navigation menu and selecting "API" or "API Keys." In the API section, click on "Create New Key" or a similar button to generate a new API key. You may be prompted to name your API key for easy identification. After naming it, generate the key and it will be displayed to you. Copy the key immediately and store it securely, as it might not be visible again once you navigate away from the page.
You can now use this API key in your applications to authenticate and access OpenAI services; for this project, we will use the API key to generate synthetic voice data via Edge Impulse's transformation blocks. Ensure you keep your API key secret and do not expose it in client-side code or public repositories. You can manage your keys (regenerate, delete, or rename) in the API section of your OpenAI account. For more detailed instructions or troubleshooting, refer to the or the help section on the OpenAI website.
After building our model, we'll get the new firmware. Follow this guide for more detail on how to flash the audio firmware:
Once we have flashed the firmware, we can upload the Arduino code using the Arduino IDE. You can find the code on my GitHub repository here:
Edge Impulse CLI - Follow to install the necessary tooling to interact with the Edge Impulse Studio and also run inference on the board.
Simplicity Studio 5 - Follow to install the IDE
Simplicity Commander - Follow to install the software. This will be required to flash firmware to the xG24 board.
If you don't have an Edge Impulse account, sign up for free and log into . Then visit the below to get started.
If you added some new data and are not sure of the model design, then the EON Tuner can come to the rescue. You just have to select the target device as SiLabs EFR32MG24 (Cortex-M33 78MHz) and configure your desired parameters, then the EON Tuner will come up with suggested architectures which you can use.
You can refer to the previously mentioned official Docs link to get the latest firmware and connect the xG24 to the Edge Impulse Studio:
Alternatively, you can use to collect data, if you don't want to install any tools.
You can follow this guide to get everything installed.
Open in a browser, and sign in, or create a new account if you do not have one. Click on New project, then in Data acquisition, click on the Upload Data icon to upload .wav files (e.g. from Kaggle, the Google Speech Commands Dataset, etc.). Other ways to collect data are from devices such as a connected smartphone with a QR code link, or a connected Nicla Voice with the Edge Impulse audio firmware flashed to it. For ease of labelling, when collecting or uploading data, fill in the name according to the desired label, for example one, two, three, on, off, or zzz for words or sounds that can be ignored.
Note: With over 4 hours of audio data, multiple classes and higher performance settings to build the model, this project uses an for more capable and faster results.
For a Syntiant NDP device like the Nicla Voice, we can configure the posterior parameters (in this case, tick all labels except zzz). To run your Impulse locally on the Arduino Nicla Voice, select the Nicla Voice in the Deployment tab, then click Build. The binary firmware will start building and automatically download to your computer once it is complete, and a video with instructions on how to flash the firmware will pop up. Flash the firmware as instructed. Once complete, you can run this model in a Terminal for live classification.
Upload this code to the Arduino Nicla Voice using the Arduino IDE. This code will override the existing application code on the Nicla Voice's MCU, but not the machine learning model on the NDP120. The code simply sends a byte via I2C every time a keyword is detected; the value of the byte depends on the keyword detected.
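The exact sketch is in the GitHub repo linked above; as a rough illustration of the sender side only (the I2C address 0x08, the 1-6 byte mapping, and the poll_keyword() helper are assumptions for illustration, not the project's actual API), it could look something like this:

```cpp
// Rough illustration of the I2C sender side (not the exact repository code).
#include <Wire.h>

const uint8_t RECEIVER_ADDR = 0x08;   // assumed address of the Pro Micro

// Hypothetical stand-in for the NDP120 keyword match event: returns 0 when no
// keyword was heard, otherwise a command code 1..6 (1-4 = socket, 5 = on, 6 = off).
// The real sketch hooks into the Nicla Voice match callback instead.
uint8_t poll_keyword() {
  return 0;
}

void sendCommand(uint8_t code) {
  Wire.beginTransmission(RECEIVER_ADDR);
  Wire.write(code);                   // one byte per detected keyword
  Wire.endTransmission();
}

void setup() {
  Wire.begin();                       // join the I2C bus as master
}

void loop() {
  uint8_t code = poll_keyword();
  if (code != 0) {
    sendCommand(code);
  }
  delay(10);
}
```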
Upload this code to the Pro Micro using the Arduino IDE. This application receives the incoming byte via I2C and switches the relays on or off based on the value of the data received from the Nicla Voice.
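Correspondingly, a minimal sketch of the receiver side could look like the following (again assuming address 0x08, four relays on pins 4-7 driven active-LOW, and the same 1-6 byte mapping; the real wiring and mapping are defined in the repository code):

```cpp
// Rough illustration of the I2C receiver side on the Pro Micro.
#include <Wire.h>

const uint8_t RELAY_PINS[4] = {4, 5, 6, 7};
volatile uint8_t lastCommand = 0;
int selectedSocket = -1;

void onReceive(int numBytes) {
  while (Wire.available()) {
    lastCommand = Wire.read();        // keep the most recent command byte
  }
}

void setup() {
  for (uint8_t i = 0; i < 4; i++) {
    pinMode(RELAY_PINS[i], OUTPUT);
    digitalWrite(RELAY_PINS[i], HIGH);   // all relays off at boot (active LOW)
  }
  Wire.begin(0x08);                      // join the I2C bus as a slave
  Wire.onReceive(onReceive);
}

void loop() {
  uint8_t cmd = lastCommand;
  lastCommand = 0;
  if (cmd >= 1 && cmd <= 4) {
    selectedSocket = cmd - 1;            // "one".."four" pick a socket
  } else if (cmd == 5 && selectedSocket >= 0) {
    digitalWrite(RELAY_PINS[selectedSocket], LOW);   // "on"
  } else if (cmd == 6 && selectedSocket >= 0) {
    digitalWrite(RELAY_PINS[selectedSocket], HIGH);  // "off"
  }
}
```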
With the recent advancements in embedded systems and the Internet of Things (IoT), there is growing potential to integrate Machine Learning models on resource-constrained devices. In our case, we want a lightweight device that we can easily wear on our wrists while still achieving smart acoustic sensing. Steve Roddy, former Vice President of Product Marketing for Arm's Machine Learning Group, once said that "TinyML deployments are powering a huge growth in ML deployment, greatly accelerating the use of ML in all manner of devices and making those devices better, smarter, and more responsive to human interaction". Tiny Machine Learning (TinyML) enables running Machine Learning on small, low-cost, low-power, resource-constrained devices like wearables. Many people have not heard of TinyML, but we use it every day in devices such as smart home assistants. According to an , there are already 3 billion devices that are able to run Machine Learning models.
We will use TinyML to deploy a sound classification model on the . This tiny 21mm x 17.8mm development board integrates a camera sensor, digital microphone, SD card, 8MB PSRAM and 8MB Flash. With its embedded Machine Learning computing power, this development board is a great tool for getting started with intelligent voice and vision AI solutions. We will use the onboard digital microphone to capture environmental sounds, and an optimized Machine Learning model will run on the ESP32-S3R8 Xtensa LX7 dual-core processor. The TinyML model will classify sound as either noise, dog barking, or dog howling. The classification results will then be displayed on an OLED screen. The XIAO ESP32S3 board is a good fit for this project due to its high-performance processor, wireless communication capabilities, and low power consumption.
As embedded hardware advances, software developments are also emerging to enable TinyML. We will use the for this project, and indeed it is a leading Edge AI platform! I chose Edge Impulse because it simplifies development and deployment. The platform supports integrating generative AI tools for synthetic data acquisition, the ability to maintain several machine learning pipelines simultaneously with the deployment performance of each shown, and the ability to optimize Machine Learning models, enabling them to run even on microcontrollers with limited flash and RAM. The experience of using the Edge Impulse platform for this project made the workflow easy, and it also enabled the deployment, since the model optimization achieved 27% less RAM and 42% less flash (ROM) usage. This documentation will cover everything from preparing the dataset, to training the model, to deploying it to the XIAO ESP32S3!
You can find the public Edge Impulse project here: . To add this project into your Edge Impulse account, click "Clone this project" at the top of the page. Next, go to the section "Deploying the Impulses to XIAO ESP32S3" for steps on deploying the model to the XIAO ESP32S3 development board.
Training a model requires setting up various configurations, such as data processing formats, model type, and training parameters. As developers, we experiment with different configurations and track their performance in terms of processing time, accuracy, classification speed, Flash and RAM usage. To facilitate this process, Edge Impulse offers the feature. This enables us to create multiple Machine Learning pipelines (Impulses) and easily view the performance metrics for all pipelines, helping us quickly understand how each configuration performs and identify the best one.
development board with the camera detached
. Available to download on Printables.com
To collect the data to be used in this project, we will use the Synthetic data generation tool on the platform. At the time of writing this documentation in October 2024, Edge Impulse has integrated three Generative AI platforms for synthetic data generation: Dall-E to generate images, Whisper for creating human speech elements, and ElevenLabs to generate audio sound effects. In our project, we will use since it is great for generating non-voice audio samples. There is an amazing that demonstrates how to use the integrated ElevenLabs audio generation feature with Edge Impulse. If we were instead capturing sounds from the environment, Edge Impulse also supports collecting data from such as uploading files, using APIs, smartphone/computers, and even connecting development boards directly to your project so that you can fetch data from sensors.
The first step was to create a free account on ElevenLabs. You can do this by signing up with an email address and a password. However, note that with the current the free account gives 10,000 credits which can be used to generate around 10 minutes of audio per month. Edge Impulse's synthetic audio generation feature is offered in the Professional and Enterprise packages, but users can access the Enterprise package with a that doesn't require a credit card.
In generative AI, prompts act as inputs to the AI. These inputs are used to prompt the generative AI model to generate the desired response which can be text, images, video, sound, code and more. The goal of the prompt is to give the AI model enough information so that it can generate a response that is relevant to the prompt. For example, if we want ChatGPT to generate an invitation message we can simply ask it to "Generate an invitation message". However, if we were to add more details such as the time, venue, what kind of event is it (wedding, birthday, conference, workshop etc.), targeted audience, speakers; these can improve the quality of the response we get from ChatGPT. ElevenLabs have created a and it also describes other parameters that they have enabled so that users can get more relevant responses.
However, after experimenting with various models, I noticed significant bias in the dog barking class, leading the models to classify any unheard sounds as dog barks (in other words, the models were overpredicting the dog bark class). In this case, I created another class, noise, consisting of 10-minute recordings from quiet environments with conversations, silence, and low machine sounds like a refrigerator and a fan. I uploaded the recordings to the Edge Impulse project and used the to extract 1 second audio samples from the recordings. After several experiments, I observed that the model actually performed best when I had only 3 classes: dog barking, dog howling and noise. Therefore, I disabled the environment class audio files in the dataset, and this class was ignored in the pre-processing, model training and deployment.
Finally, once we have the dataset prepared, we need to split it for Training and Testing. The popular rule is an 80/20 split, meaning 80% of the dataset is used for model training while 20% is used for model testing. In an Edge Impulse Studio project, we can click the red triangle with an exclamation mark (as shown in the image below), which opens an interface that suggests splitting our dataset.
After collecting data for our project, we can now train a Machine Learning model for the required sound classification task. To do this, on Edge Impulse we need to create an Impulse. An Impulse is a configuration that defines the input data type, the data pre-processing algorithm, and the Machine Learning model training. In our project, we are aiming to train an efficient sound classification model that can "fit" inside a microcontroller (the ESP32S3). In this case, there are a great number of parameters and algorithms that we need to choose carefully. One of the great features of the Edge Impulse platform is the powerful tooling that simplifies the development and deployment of Machine Learning. Edge Impulse recently released the Experiments feature, which allows projects to contain multiple Impulses, where each Impulse can contain either the same combination of blocks or a different combination. This allows us to view the performance of various types of learning and processing blocks, while using the same input training and testing datasets.
First, for the model input I used a window size of 1 second, a window increase of 500 ms (milliseconds), a frequency of 16,000 Hz, and enabled the "Zero-pad data" option so that samples shorter than 1 second are filled with zeros. Since we are targeting deployment on a resource-constrained device, one way of reducing the amount of data being processed is to reduce the length of the audio windows being taken. Next, we need to define the audio pre-processing method. This operation is important since it extracts the meaningful features from the raw audio files. These features are then used as inputs for the Machine Learning model. The preprocessing includes steps such as converting the audio files into a spectrogram, normalizing the audio, removing noise, and feature extraction. There are various pre-processing algorithms used for sound data, such as MFE, MFCC, spectrograms, working with raw data, and more. Choosing the best pre-processing algorithm for sound classification is essential because the quality and relevance of the input features directly impact the model's ability to learn and classify sounds accurately. The learning block will be the same throughout this sound classification project, but you can experiment with other pre-processing algorithms to identify the best performing one.
In our Edge Impulse project, we create the first Impulse. In this case, I first used Audio (MFE) as the processing block and Classification as the learning block. Similarly to the Spectrogram block, the Audio MFE processing extracts time and frequency features from a signal. However, this algorithm uses a non-linear scale in the frequency domain, called the Mel scale. It performs well on audio data, mostly for non-voice recognition use cases where the sounds to be classified can be distinguished by the human ear. After saving this configuration, we click the "Save Impulse" button.
Next, we need to configure the processing block, MFE. Click "MFE" and we are presented with various parameters that we can set, such as frame length, frame stride, filter number, FFT length, low frequency, high frequency and the Noise floor (dB) for normalization. Selecting appropriate parameters for the digital signal processing (DSP) can be a troubling and time-consuming task, even for experienced digital signal processing engineers. To simplify this process, Edge Impulse supports autotuning of the processing parameters. To ensure that I get the best pre-processing, I used this feature by clicking the "Autotune parameters" button. In this setup, we could also reduce the inference time by changing the FFT length value to, say, 256.
After configuring the processing parameters, we can generate features from the dataset. Still on the MFE page, we click the "Generate features" tab and finally the "Generate features" button. The feature generation process will take some time depending on the size of the data. When this process is finished, the Feature explorer will plot the features. Note that the features are the output of the processing block, not the raw data itself. In our case, we can see that there is a good separation of the classes, which indicates that simpler Machine Learning (ML) models can be used with greater accuracy.
This will create a new Impulse instance. The steps to configure this Impulse are the same, with the only difference being that we select Spectrogram as the processing block. The Spectrogram processing block extracts time and frequency features from a signal. It performs well on audio data for non-voice recognition use cases, or on any sensor data with continuous frequencies. Once this Impulse has been saved, we again use the autotuning feature, generate features and train a neural network with the same configuration as the first Impulse. After this process was completed, the features generated with the Spectrogram were not as well separated as with the MFE used in the first Impulse, specifically the features for the dog barking and howling sounds. The model training accuracy was 98% and the loss was 0.15. Finally, after testing the model on unseen data, the performance was also impressive, with 98% accuracy.
Next, I experimented with using the raw audio files as inputs to the model. The Raw data block generates windows from data samples without any specific signal processing. It is great for signals that have already been pre-processed, or if you just need to feed your data into the Neural Network block. The steps to configure this Impulse are the same as the first two, with the only difference being that we select Raw data as the processing block and use dense layers for the neural network architecture. After this process was completed, on the Feature explorer we can see that the audio data are not separated as well as with the first two processing blocks. The model training accuracy was 33% and the loss was 12.48. After testing the model on unseen data, the performance was also poor, with an accuracy of 39%.
Finally, I experimented with MFCC processing. The Audio (MFCC) processing block extracts coefficients from an audio signal. Similarly to the Audio MFE block, it uses a non-linear scale called the Mel scale. It is the reference block for speech recognition, but I also wanted to try it on a non-human voice use case. The steps to configure this Impulse are the same as the first three, with the only difference being that we select Audio (MFCC) as the processing block. After this process was completed, on the Feature explorer we can see that the audio data are separated, but not as well as with the MFE and Spectrogram pre-processing. The training accuracy was 97% and the loss was 0.06. Testing the model on unseen data, the performance was again impressive, with an accuracy of 96%.
Edge Impulse documents how to use the XIAO ESP32S3. We will deploy an Impulse as an Arduino library - a single package containing the signal processing blocks, configuration and learning blocks. You can include this package (Arduino library) in your own sketches to run the Impulse locally on microcontrollers.
To deploy the first Impulse to the XIAO ESP32S3 board, first we ensure that it is the current Impulse and then click "Deployment". In the "Search deployment options" field we select Arduino library. Since memory and CPU clock rate are limited for our deployment, we can optimize the model so that it can utilize the available resources on the ESP32S3 (or simply, so that it can fit and manage to run on the ESP32S3). Model optimization often involves a trade-off whereby we decide whether to trade model accuracy for improved performance, or reduce the model's memory (RAM) usage. Edge Impulse has made model optimization very easy with just a click. Currently we get two optimizations: the EON Compiler (which gives the same accuracy but uses 27% less RAM and 42% less ROM) and TensorFlow Lite. The EON Compiler is a powerful tool, included in Edge Impulse, that compiles machine learning models into highly efficient and hardware-optimized C++ source code. It supports a wide variety of neural networks trained in TensorFlow or PyTorch, and a large selection of classical ML models trained in scikit-learn, LightGBM or XGBoost. The EON Compiler also runs far more models than other inferencing engines, while saving up to 65% of RAM usage. TensorFlow Lite (TFLite) is an open-source machine learning framework that optimizes models for performance and efficiency, making them able to run on resource-constrained devices. To enable model optimization, I selected the EON Compiler and Quantized (int8).
Next, we need to add the downloaded .zip library to the Arduino IDE and use the esp32_microphone example code. The deployment steps are also documented in the XIAO ESP32S3 . Once we open the esp32_microphone sketch, we need to change the I2S library, update the microphone functions, and enable the ESP NN accelerator as described by MJRoBot (Marcelo Rovai) in . You can also obtain the complete updated code in this . Before uploading the code, we can follow the to install the ESP32 board package in the Arduino IDE and then select the XIAO ESP32S3 board for uploading. With the XIAO ESP32S3 board still connected to the computer, we can open the Serial Monitor and see the inference results. We can see that the Digital Signal Processing (DSP) takes around 475ms (milliseconds) and the model takes around 90ms to classify the sound, which is very impressive. However, when I played YouTube videos of dog sounds in front of the XIAO ESP32S3, like , the model did not correctly classify dog barks; most of the confidence was on noise. Although this appears to be an issue, it may actually stem from the difference in sound quality between training and inference: the test using synthetic data performed well, but deployment performance was not the same. In this case, the sounds captured during inference have noise, the volume of the dog sounds is different, and overall the recordings are not as clear as the dataset samples.
The wearable's components can be categorized into two parts: the electronic components and the 3D printed components. The 3D printed component files can be downloaded from . The wearable has a casing which is made up of two components: one holds the electrical components while the other is a cover. I 3D printed the housing and cover with PLA material.
The next task is to solder female jumper wires to the XIAO ESP32S3 I2C and power pins. These wires will then be connected to the SSD1306 OLED display. I chose to solder the wires directly to the board instead of using jumper wires on the board's pins since this will make the design more compact and reduce the height of the wearable. The pin list of the XIAO ESP32S3 board can be found .
After assembling the wearable, we can connect to the XIAO ESP32S3 board using the USB-C slot on the case to program it, and charge the LiPo battery! We can get the inference code from this GitHub repository . This Arduino sketch loads the model and runs continuous inference while printing the results via Serial. After a successful run of the inference code, I updated the code and added further processing of the inference results to display images on the OLED according to the predicted class with the highest confidence. This updated code can also be found in the GitHub repository: .
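The repository has the full sketch; the core idea of mapping the top prediction to an OLED message can be sketched roughly as follows (this assumes the Adafruit SSD1306 library and the ei_impulse_result_t structure filled in by the Edge Impulse run_classifier() call; the display size, I2C address and helper name are placeholders):

```cpp
// NOTE: include the exported Edge Impulse library header first
// (named <your_project>_inferencing.h), which defines ei_impulse_result_t
// and EI_CLASSIFIER_LABEL_COUNT.
#include <Adafruit_GFX.h>
#include <Adafruit_SSD1306.h>

// 128x64 I2C OLED; call display.begin(SSD1306_SWITCHCAPVCC, 0x3C) in setup().
Adafruit_SSD1306 display(128, 64, &Wire, -1);

// Show the label with the highest confidence after run_classifier() has filled 'result'.
void showTopClass(const ei_impulse_result_t &result) {
  size_t best = 0;
  for (size_t i = 1; i < EI_CLASSIFIER_LABEL_COUNT; i++) {
    if (result.classification[i].value > result.classification[best].value) {
      best = i;
    }
  }
  display.clearDisplay();
  display.setTextSize(2);
  display.setTextColor(SSD1306_WHITE);
  display.setCursor(0, 0);
  display.println(result.classification[best].label);   // e.g. "dog_barking"
  display.setTextSize(1);
  display.print("confidence: ");
  display.print(result.classification[best].value, 2);
  display.display();
}
```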
At last, our dog sound detection wearable is ready. We have successfully trained, tested, optimized, and deployed a Machine Learning model on the XIAO ESP32S3 Sense board. Once the wearable is powered on, the ESP32S3 board continuously samples 1 second of sound and predicts whether it has heard dog sounds or noise. Note that there is a latency of around 300 milliseconds (285ms for Digital Signal Processing and 15ms for classification) between sampling and the inference results. Some sounds may not be captured in time, since other parts of the sketch also need to execute. In this case, to achieve a smaller latency, we can target other hardware, such as the , which features an always-on sensor and speech recognition processor, the .
The new Experiments feature of Edge Impulse is a powerful tool and it comes in very handy in the Machine Learning development cycle. There are numerous configurations that we can use to make the model more accurate and reduce hardware utilization on edge devices. In my experiments, I tried other configuration combinations and chose to present the best and worst performing ones in this documentation. Are you tired of trying out various Impulse configurations and deployment experiments? Well, Edge Impulse offers yet another powerful tool, the EON Tuner. This tool helps you find and select the best embedded machine learning model for your application within the constraints of your target device. The EON Tuner analyzes your input data, potential signal processing blocks, and neural network architectures, and gives you an overview of possible model architectures that will fit your chosen device's latency and memory requirements. First, make sure you have data in your Edge Impulse project. Next, select the "Experiments" tab and finally the "EON Tuner" tab. On the page, configure your target device and your application budget, and then click the "New run" button.
You can find the public Edge Impulse project here: . This includes the deployed Edge Impulse library together with inference code and OLED usage functions. A future work on this project would be to include other alert features such as sending SMS messages or including a vibration motor such that the wearable can vibrate when dog sounds are detected. This vibration can then be felt by the user, in case of headphone or earplug usage in certain situations or environments.
Bill of materials:

| LCSC Part # | Manufacturer Part # | Manufacturer | Package | Description | Qty | Unit Price | Total |
|---|---|---|---|---|---|---|---|
| C176224 | QR1206F5R10P05Z | Ever Ohms Tech | 1206 | 250mW Thick Film Resistor, 200V, ±1%, ±400ppm/°C, 5.1Ω, 1206 Chip Resistor - Surface Mount, ROHS | 50 | 0.0156 | 0.78 |
| C516126 | HL-AM-2835H421W-S1-08-HR5(R9) (2800K-3100K)(SDCM<6,R9>50) | HONGLITRONIC (Hongli Zhihui) | SMD2835 | 60mA, 3000K, foggy yellow lens, -40°C~+85°C, positive stick, white, 120°, 306mW, 3.4V, SMD2835 LED Indication - Discrete, ROHS | 50 | 0.0144 | 0.72 |
| C2589 | IRLML2502TRPBF | Infineon Technologies | SOT-23 | 20V, 4.2A, 1.25W, 45mΩ@4.5V,4.2A, 1.2V@250uA, 1 N-Channel SOT-23 MOSFET, ROHS | 5 | 0.1838 | 0.92 |
| C5440143 | CS3225X7R476K160NRL | Samwha Capacitor | 1210 | 16V, 47uF, X7R, ±10%, 1210 Multilayer Ceramic Capacitor (MLCC - SMD/SMT), ROHS | 5 | 0.0765 | 0.38 |
| C153338 | FCR1206J100RP05Z | Ever Ohms Tech | 1206 | 250mW Safety Resistor, 200V, ±5%, 100Ω, 1206 Chip Resistor - Surface Mount, ROHS | 10 | 0.0541 | 0.54 |
Collect audio data for your machine learning model on a Raspberry Pi Pico.
Created By: Alex Wulff
Public Project Link: https://studio.edgeimpulse.com/public/117150/latest
Keyword spotting is an important use case for embedded machine learning. You can build a voice-activated system using nothing but a simple microcontroller! Since this use case is so important, Edge Impulse has great documentation on best practices for keyword spotting.
Edge Impulse also now supports the Raspberry Pi Pico. This is fantastic because, at $4, Pico is a very capable and low-cost platform. And now, with machine learning, it can power tons of projects. The only problem is that Edge Impulse's firmware for Pico only supports direct data collection at a low sample rate. Audio waveforms vary relatively quickly—therefore, we need some way of collecting data at a rapid sample rate and getting it into Edge Impulse. That's the purpose of this project!
This project is not just useful for audio; any Pico project that needs a higher sample rate can use this same code.
Most machine learning applications benefit from using training data collected from the system on which you will perform inferencing. In the case of embedded keyword spotting, things are no different. Subtle variations between device microphones and noise signatures can render a model trained on one system useless on another. Therefore, we want to collect audio data for training using the very circuit that we'll use to perform inferencing.
The Raspberry Pi Pico can actually collect data at an extremely high sample rate. In my testing, you can achieve sample rates of up to 500 kHz. Pretty neat! For audio, of course, a much lower sample rate is sufficient. CD-quality audio is sampled at 44.1 kHz (a bit more than twice the maximum frequency human ears can hear, as required by the Nyquist sampling theorem), but for keyword spotting we can get away with something as low as 4 kHz. Lower sample rates are generally better because they decrease the computational burden on our inferencing platform; if we go too low, we start to sacrifice audio fidelity and spotting keywords becomes impossible.
So — Pico is collecting 4,000 samples every second. How do we actually save these off into a useful format? If we had an SD card hooked up to Pico we could simply write the data to that. I want this project to be entirely self-contained, however, so that idea is out. Pico does have 2 MB of onboard flash, but this would fill up relatively quickly and it's possible to have some messy side effects like overwriting your program code.
This leaves us with one option — the serial port. We can dump data over the serial port where it can be read and saved off by the host computer. In my testing, we can only push out a few kB per second of data this way, but it is sufficient for audio data. There's also the added complication that most convenient serial interfaces are text-based. We can't simply send raw bytes over serial, as these may be interpreted as control characters such as line returns that won't actually be saved off. Instead, we can encode the data into a text format such as base-64 and then decode it later.
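To make the encoding step concrete, here is a minimal, self-contained sketch of the idea (not the project's actual pico_daq.cpp; the buffer contents and the print_base64() helper are purely illustrative). It takes a buffer of raw samples, encodes the bytes as base-64 text, and prints the text over USB serial using the Pico SDK's stdio.

```cpp
// Minimal sketch: encode raw sample bytes as base-64 text and print them
// over USB serial. Illustrative only; the real pico_daq.cpp differs.
#include <cstdint>
#include <cstdio>
#include "pico/stdlib.h"

static const char B64[] =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

// Encode `len` bytes from `data` as base-64 and print the text to stdout.
static void print_base64(const uint8_t *data, size_t len) {
    for (size_t i = 0; i < len; i += 3) {
        uint32_t chunk = data[i] << 16;
        if (i + 1 < len) chunk |= data[i + 1] << 8;
        if (i + 2 < len) chunk |= data[i + 2];
        putchar(B64[(chunk >> 18) & 0x3F]);
        putchar(B64[(chunk >> 12) & 0x3F]);
        putchar(i + 1 < len ? B64[(chunk >> 6) & 0x3F] : '=');
        putchar(i + 2 < len ? B64[chunk & 0x3F] : '=');
    }
}

int main() {
    stdio_init_all();            // route printf/putchar to USB serial
    uint16_t samples[256] = {0}; // pretend the ADC filled this buffer
    while (true) {
        // In the real project the ADC fills this buffer at 4 kHz; here we only
        // show how the raw bytes get turned into printable text.
        print_base64(reinterpret_cast<uint8_t *>(samples), sizeof(samples));
        printf("\n");
        sleep_ms(64); // 256 samples at 4 kHz is roughly 64 ms
    }
}
```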
Another program can convert the raw bytes into a usable audio file once the data is saved off on the computer.
All you need for this project is a microphone for data collection (and a Pico, of course)! Check out the circuit above for how to hook the microphone up to Pico. I recommend this microphone from Adafruit. It has automatic gain control, which means that as you get further from the microphone the module will automatically turn up the gain so the audio doesn't get too quiet. This is essential for a project where you might be a varying distance away from the microphone.
You can find all the code for this project here. It compiles using the standard Pico CMake and make procedure. If you're not sure how this process works, look through Raspberry Pi's tutorials for getting set up with the Pico C++ SDK and building/flashing your code.
The main code is in pico_daq.cpp. The program starts the Pico's ADC sampling routine, normalizes the data and converts it to floats, converts the float data to base-64, and then prints this out over the serial console to the host computer.
This is where we encounter our first problem — at 4 kHz, with four bytes per floating point value, the serial port cannot write data as fast as we collect it. The way I handle this problem in this code is by dropping chunks of samples. While we're busy sending the rest of the data that could not be sent during the sampling window, the ADC is not collecting data. This leads to some jumps within the final audio file, but for the purposes of collecting data for keyword spotting it is sufficient.
Compile this code and flash it to your Pico.
On Linux and macOS systems, saving serial data is relatively easy with the screen utility. You can install screen using your package manager in Linux, or using homebrew on macOS. Most Windows serial console clients also give you some means of saving off data from the serial console, but I won't provide instructions for these.
Before using screen, you'll first need to identify the device name of your Pico. On macOS, this will be something like /dev/tty.usbmodem.... On Linux, this will be something like /dev/ttyACM.... You can use lsusb and dmesg to help you figure out what device handle your Pico is.
With the code running on your Pico, the following command will open the serial port and save off the data (make sure you replace /dev/... with your Pico's device handle): screen -L /dev/tty.usbmodem1301
Go ahead and start speaking one of your keywords for a period of time. Base-64 characters should constantly flash across your screen. Follow best practices listed in the Edge Impulse tutorial for creating datasets for keyword spotting as you collect your data.
To exit screen when you're done collecting data, type Control-A and then press k. You will now have the raw base-64 data saved in a file called screenlog.0.
You can rename this file to something that indicates what it actually contains (“{keyword}.raw” is a good choice), and keep collecting others.
Edge Impulse doesn't know how to interpret data in base-64 format. What we need is a way to convert this base-64 data into a .wav file that you can import directly into Edge Impulse using its data uploader. I made a Python 3 program to do just that, which you can find here. This code requires the scipy, numpy, and pydub packages, which you can install via pip.
Replace infile with the path to the base-64 data output by screen, and outfile with the desired output path of your .wav file. You should be ready to run the Python program!
Once the program is done, you'll have a finished audio file. You can play this with any media player. Give it a try — if all goes well, you should be able to hear your voice!
You can now follow the rest of Edge Impulse's tutorial for training your model. Everything else, including all the tips, applies! Make sure you have enough training data, have a balanced dataset, use enough classes, etc.
For instructions on how to deploy a keyword spotting model on Pico, check out my project that uses Pico as the brains of a voice-activated lighting controller. In that project I use this code to collect data, and then show how to use the deployed model to control your projects!
Build a voice-activated LED light strip controller on the cheap with a Raspberry Pi Pico.
Created By: Alex Wulff
Public Project Link: https://studio.edgeimpulse.com/public/117150/latest
LED light strips, sometimes called “Neopixels” in the maker community, are the perfect choice for interior lighting. They’re easy to install, inexpensive, and are completely programmable. Simply connect the light strips to a microcontroller, write some code to make whatever lighting pattern you want, and you’re good to go!
Unfortunately for me, however, I’m lazy. I have no desire to get up and change the pattern if I get bored or turn it off if I want to go to bed. I could buy some Wi-Fi connected light strip on Amazon, but these are expensive and fiddling with some app on my phone is not peak laziness. Some light strips come with remotes, but I know that I’d lose the remote within days of having it.
What if we could use the $4 Raspberry Pi Pico to control lighting strips with voice commands? With Edge Impulse, we can! With just the Pico, a cheap microphone, and a power supply, you can make a voice-controlled lighting system.
The Pico is actually an excellent microcontroller for machine learning projects. First, it’s extremely inexpensive, so I don’t feel bad about leaving them inside a project instead of cannibalizing it when I’m done. Second, the Pico is an incredibly capable machine. It runs at a base frequency of 125 MHz so it can execute most machine learning pipelines in only a few hundred milliseconds, and it features a whole host of other peripherals like PIO and a second core. Third, writing code for Pico is an extremely pleasant experience. Pico supports normal C++, CMake, and Make, so development is very familiar. Actually programming the board itself is also as simple as possible.
Perhaps most important of all is that Pico is extremely well documented. It's difficult to overstate how nice it is to have a well-documented board; this saves countless hours of troubleshooting, from tasks as simple as setting up your development environment to experimenting with advanced features like PIO.
I firmly believe that the Raspberry Pi Pico offers the most bang for your buck out of all the microcontrollers currently on the market. Additionally, with its new $6 Wi-Fi-equipped cousin, I can envision Pico becoming a formidable contender in a crowded field of microcontrollers.
You’ll only need a few components to make this project work:
Raspberry Pi Pico
LED Light Strip
5V Power Supply
Breadboard
Most LED strips labelled as “addressable” or “Neopixel” will work. The main thing you’re looking for is a strip based on the WS2812 driver. You can find these on Amazon, Adafruit, Sparkfun, etc. in varying sizes and configurations.
Also ensure that you select a power supply that can deliver a suitable amount of current to your project. 60 LEDs can draw up to 2A at full brightness. This power supply should do the trick for most projects.
Assembling this project should take less than 5 minutes on a breadboard. One important thing to note is that you should not tie the microphone to the 5V output from the power supply—instead, make sure it's connected to the Pico's 3V regulator. With the lights attached and running, the 5V power supply is very noisy.
The Edge Impulse Studio only supports collecting data from Pico at a relatively low sample rate, so we need to use some custom code to collect training data. See this tutorial for information on how to collect data from the Pico's ADC, upload it to Edge Impulse, and train a machine learning model on it.
You’ll need two keywords for this project. I used “start” and “stop”. When the lights are off, “start” will turn them on. When the lights are on and you say “start” again, the lights will cycle to a new lighting mode. Saying “stop” when the lights are on will turn them off. It took me a few tries to get my model working well, so don’t get discouraged if your system doesn’t work well at first!
The code for this project really pushes the Raspberry Pi Pico to its limits. I use a variety of Pico-specific features, including direct memory access (DMA), PIO state machines, and multicore processing. First, we'll start with the things you need to modify to make your project work. Start by cloning the awulff-pico-playground repository, and navigate to the pico-light-voice folder. This is the main folder for this project.
Check the indices of the keywords used in your model, and make sure they match the if (ix == ...) checks in main.cpp. Your "start" and "stop" keywords should match what's in the code.
Adjust thresh in main.cpp to control the sensitivity of the keyword detection.
Change NUM_LIGHTS in lights.cpp to the number of LEDs in your strip.
Now let’s take a quick tour through the code so you can learn how to modify it to fit your needs.
The base of this code is built on top of Edge Impulse’s standalone inferencing starting point. You can follow the instructions here for setting up your development environment and downloading the Pico SDK. Additionally, once you’re finished with your voice model, copy over the folders from your deployed C++ library into the main project folder as described in this project’s README.
Let's start in the main code loop. You can find this in source/main.cpp.
Input Parameters
At the top of the file, you can find parameters for the model inputs and sample rates. A CLOCK_DIV of 12,000 gives a sample rate of 4 kHz. This is low for audio, but still intelligible to a machine learning model. This value needs to match the sample rate that you used to collect data in the data collection tutorial! We trained the model to respond to sample lengths of 4,000 samples (NSAMP), which works out to be exactly 1 second of data.
For more information on ADC sampling and DMA with Pico, check out my tutorial on this subject.
In this code I also do something called continuous inferencing, which can improve the performance of the keyword detection drastically by passing in overlapping buffers of samples into the model. All the math takes under 250ms to execute, so I maintain a sliding buffer of data and run the model on it every quarter of a second. Edge Impulse doesn’t support continuous inferencing natively on Pico, so I implemented a form of it in software.
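To illustrate that sliding-buffer idea, here is a simplified sketch rather than the project's exact code. It assumes 1,000-sample ADC chunks feeding a 4,000-sample model window; NSAMP, INSIZE, and on_new_chunk() are names chosen for this example, while run_classifier() and signal_t come from the deployed Edge Impulse C++ library.

```cpp
// Sliding-buffer sketch (simplified; the real main.cpp differs): keep the most
// recent 4,000 samples and re-run the classifier every time a new 1,000-sample
// chunk arrives, i.e. four times per second at 4 kHz.
#include <cstdint>
#include <cstring>
#include "edge-impulse-sdk/classifier/ei_run_classifier.h"

#define NSAMP  1000   // samples per ADC collection window (0.25 s at 4 kHz)
#define INSIZE 4000   // model input length (1 s at 4 kHz)

static float features[INSIZE];

// Callback the Edge Impulse SDK uses to read slices of the feature buffer.
static int get_feature_data(size_t offset, size_t length, float *out) {
    memcpy(out, features + offset, length * sizeof(float));
    return 0;
}

// Call this once per completed ADC window with the newest raw samples.
void on_new_chunk(const uint16_t *chunk) {
    // Shift the window left by one chunk, then append the new samples,
    // centered and scaled roughly into the signed 16-bit range described above.
    memmove(features, features + NSAMP, (INSIZE - NSAMP) * sizeof(float));
    for (int i = 0; i < NSAMP; i++) {
        features[INSIZE - NSAMP + i] = ((float)chunk[i] - 2048.0f) * 16.0f;
    }

    signal_t signal;
    signal.total_length = INSIZE;
    signal.get_data = &get_feature_data;

    ei_impulse_result_t result;
    if (run_classifier(&signal, &result, false) == EI_IMPULSE_OK) {
        // result.classification[ix].value now holds each keyword's confidence.
    }
}
```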
Setup Code
All the code in main() before the while (true) portion gets executed once. This is all setup code used to configure the model and sampling.
Main Code Loop
We take advantage of Pico's many hardware features to form a very efficient data processing pipeline, which you can find in the while (true) portion of the code. The basic operation of this is as follows:
The code starts the ADC sampling routine on Pico. This tells the ADC to start running, and fill up the memory location we passed to the function with samples. All of this happens in the background and does not hang the main loop, so while we’re sampling we can also be running the machine learning code.
As soon as the ADC setup is finished, we move samples from the raw sample buffer into the feature buffer for the model, in the form the model expects (floats normalized to the range -2^15 to 2^15). The very first time this function runs, the buffer will be filled with garbage (since the ADC hasn't populated it yet), but every subsequent time it runs it will be filled with actual audio data.
Next we begin executing the machine learning routine on the populated feature buffer.
Based on the output from the model, we send a state update to the other core that handles the lighting. One keyword is used to turn the lights on and cycle through different lighting modes, and the other keyword will turn the lights off.
We now block and wait for the ADC sampling to finish: dma_channel_wait_for_finish_blocking(dma_chan)
As soon as sampling is done we move samples into an intermediate buffer. We want to minimize the amount of time here spent not sampling, so we don’t do any processing here and just leave that for while the ADC is busy.
We then loop and do it all again!
The execution time of your model is very important for this application. As the code is configured, we run the model four times a second. The ADC is set to collect 1000 samples per shot (NSAMP); every time this collection is done, we shift samples out of the ADC buffer and into an intermediate buffer with past audio data.
The green LED on the Pico is configured to be on while the Pico is busy executing the ML code and off while the Pico is waiting for the ADC. Thus, you can use the duty cycle of the flashing to get a handle on how close your model is to the 250 ms inferencing limit. You are losing data if the light stays on continuously—if this is the case, try reducing the execution time of your model, or increase NSAMP so each sampling window is longer and the model has more time to finish.
We execute code to control the lighting on Pico's second core while the first handles the sampling and machine learning. You can find the code for this in source/lights.cpp.
The second core executes a simple state machine based on input from the first core. Let's look at the core1_entry() function to see how this works.
Setup Code
The first part of this code sets up the light strip. The Adafruit_NeoPixel library API should be the same as that which you find in other tutorials online, with some caveats. See the library's GitHub page for more information.
Lighting Loop
The lights operate in a relatively simple manner. If we get the "off" keyword, the lights will stay off. When we get the "on" keyword, the lights will turn on in whichever lighting mode was active when they were last turned off. When we get the "on" keyword while the lights are already on, the code will cycle to a new lighting state. Add new lighting modes as you see fit, and make sure to update NUM_STATES to reflect how many states you'd like to use.
If your lighting code is relatively computationally intensive, make sure you periodically check for state updates (like I do in the rainbow mode) to ensure that your lighting system is responsive.
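Here is a compact, self-contained sketch of that state machine (the helper names are hypothetical; the real lights.cpp drives an Adafruit_NeoPixel-style strip rather than printing). update_state() is a stand-in for the FIFO check described in the next section, returning 0 for the "stop" keyword, 1 for "start", and -1 when there is no new command.

```cpp
// Sketch of the lighting state machine described above (illustrative only).
#include <cstdio>
#include "pico/stdlib.h"

#define NUM_STATES 3

// Stand-in for the real update_state(): -1 means "no new command".
static int update_state() { return -1; }

int main() {
    int  mode = 0;            // which lighting pattern to show when on
    bool lights_on = false;

    while (true) {
        int cmd = update_state();
        if (cmd == 0) {                        // "stop": turn everything off
            lights_on = false;
        } else if (cmd == 1) {                 // "start": turn on, or cycle modes
            if (!lights_on) lights_on = true;  // resume the last active mode
            else            mode = (mode + 1) % NUM_STATES;
        }

        if (lights_on) {
            printf("running lighting mode %d\n", mode); // real code animates LEDs here
        }
        sleep_ms(10);
    }
}
```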
Multicore Communications
The update_state() function handles the communication between the two cores. Pico implements this communication using two FIFO queues — we can use these as a bi-directional pipe to send information back and forth between the cores. From the lighting core we tell the sampling core that we're ready for data using multicore_fifo_push_blocking(0). If the sampling core sees that the lighting core is ready for a state update, and it has a state update to give, it will send this update to the lighting core. Once the lighting core receives an update it will change the state accordingly.
This code initially looks a little complicated, but it’s a relatively simple way to synchronize two independent processing cores. Read the Raspberry Pi documentation on FIFO and multicore processing if you’re stuck.
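For reference, here is a minimal illustration of that handshake using the Pico SDK's multicore API (the project's real code is organized differently, but these are the calls involved):

```cpp
// Minimal two-core FIFO handshake sketch using the Pico SDK multicore API.
#include "pico/stdlib.h"
#include "pico/multicore.h"

// Core 1: lighting. Tell core 0 we're ready, then block until a state arrives.
void core1_entry() {
    while (true) {
        multicore_fifo_push_blocking(0);                  // "ready for a state update"
        uint32_t new_state = multicore_fifo_pop_blocking();
        (void)new_state;                                  // apply the lighting state here
    }
}

int main() {
    stdio_init_all();
    multicore_launch_core1(core1_entry);

    uint32_t state = 0;
    while (true) {
        // ... sampling and inference run here, updating `state` ...
        if (multicore_fifo_rvalid()) {                    // core 1 said it's ready
            multicore_fifo_pop_blocking();                // consume the "ready" token
            multicore_fifo_push_blocking(state);          // send the latest state
        }
        sleep_ms(250);
    }
}
```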
You should be able to build this code using standard CMake tools. Follow the instructions here to compile the project using CMake and make, and flash it using the .uf2 file. If your code is failing to build, you might not have copied all the folders you needed from your deployed C++ Edge Impulse model, or the path to the Pico SDK might not be set correctly.
Machine learning can be very tough. Any deviation from an environment similar to your training data can cause your model to stop working entirely. Here is a small selection of the many mistakes I made along the way while making this project:
Using a different sample rate in my training data vs. the actual data collected by this code
Accidentally passing in buffers of uint16_ts to the model when it was expecting floats
Consuming too much RAM on the Pico, which causes it to silently break things instead of erroring out (keep NSAMP/INSIZE small!)
Not rate-limiting the FIFO buffers and causing them to fill up with stale state data
Switching the index of the keywords used in my model
Chances are, you’ll encounter some issues too. Spend some time to understand the code and you’ll have a much easier time debugging.
That’s all! I hope you enjoyed this tutorial. If you found it helpful, check out my writing, my website, and some of my other projects.
Use TinyML to listen for the sound of trucks amongst forest noise, using a Nordic Thingy:53.
Created By: Zalmotek
Public Project Link:
https://studio.edgeimpulse.com/studio/138770
Illegal logging is a major environmental issue worldwide. It has been estimated that it accounts for up to 30% of the global timber trade, and is responsible for the loss of billions of dollars worth of valuable timber each year. When timber is exploited illegally, governments lose much-needed money, particularly in developing countries. In addition to this, illegal logging severely impacts biodiversity and it can lead to soil erosion, decreased water quality, and habitat loss for wildlife. Furthermore, illegal logging is frequently associated with organized crime groups and can serve as a source of funding for rebel or terrorist groups.
Due to the vastness of forested regions, it is difficult to identify unauthorised logging activities, which frequently occur in isolated and difficult-to-reach locations, and traditional approaches, such as ground patrols, are often ineffective.
One way to combat this problem is through the use of AI algorithms that can be deployed on battery-powered devices, such as sensors near the forest on roads frequented by the trucks transporting the wood. Machine learning algorithms are well suited for this task as they can be trained to recognize the characteristic sounds made by logging trucks. When deployed on the roads near forests, these sensors can provide a real-time alert when a logging activity is detected, allowing law enforcement to quickly respond.
One challenge posed by this approach is that sensors must be able to distinguish between different types of logging truck noises and the background noise in the forest. Another challenge is that the devices must be ruggedized to withstand the harsh environment of the forest. We will address both of these challenges by using the Nordic Thingy:53, a multi-sensor prototyping platform for wireless IoT and embedded machine learning, which will be used to train a ML algorithm, and is encased in a tough polymer casing that can withstand drops and impact.
Our approach to this problem is to create an IoT system based on the Nordic Thingy:53 platform that will run a machine learning model trained using the Edge Impulse platform that can detect the sound of timber trucks.
The Nordic Thingy:53 is a versatile, low-power device that is well suited for this application. Its two Arm Cortex-M33 processors' computing capability and memory capacity allow it to execute embedded machine learning (ML) models directly on the device. It features a microphone for audio input in addition to several other integrated sensors, such as an accelerometer, gyroscope, and magnetometer, as well as sensors for temperature, humidity, air quality, and light level. The Thingy can be powered by a rechargeable Li-Po battery with a 1350 mAh capacity that can be charged via USB-C, making it ideal for use in remote locations.
USB-C cable
Edge Impulse account
Edge Impulse CLI
Nordic nRF Edge Impulse App
Our choice of edge computing hardware for this use case is the Nordic Thingy:53, which is based on Nordic Semiconductor's flagship dual-core wireless SoC, the nRF5340. The SoC's Arm Cortex-M33 CPU application core ensures that the Thingy:53 can handle the heavy computational workloads of embedded machine learning without interfering with the wireless communication. The application core is clocked at 128 MHz for maximum speed, with 1 MB of flash storage and 512 KB RAM to fit your programs. Wireless communication is handled independently by another Arm Cortex-M33 core clocked at 64 MHz for more power-efficient operation, without using any computing resources from the application core. The Bluetooth Low Energy (LE) radio provides firmware updates and communication through Bluetooth LE, as well as additional protocols such as Bluetooth mesh, Thread, Zigbee, and proprietary 2.4 GHz protocols.
Let's start by creating an Edge Impulse project. Select Developer as your project type, click Create a new project, and give it a meaningful name.
New Thingy:53 devices will function with the Nordic nRF Edge Impulse iPhone and Android apps, as well as with the Edge Impulse Studio right out of the box.
Before connecting it to the Edge Impulse project, the firmware of the Thingy:53 must be updated. Download the nRF Programmer mobile application and launch it. You will be prompted with a number of available samples.
Then, go to Devices -> Connect a new device in your Edge Impulse project, choose Use Your Computer, and allow access to your microphone.
Select the Edge Impulse application, select the version of the sample from the drop-down menu and tap Download.
When that is done, tap Install. A list with the nearby devices will appear and you must select your development board from the list. Once that is done, the upload process will begin.
With the firmware updated, connect the Thingy:53 board to a computer that has the edge-impulse-cli suite installed, turn it on, launch a terminal and run:
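edge-impulse-daemon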
You will be required to provide your username and password before choosing the project to which you want to attach the device.
Once you select the project and the connection is successful, the board will show up in the Devices tab of your project.
We will use a publicly available truck noise dataset and a forest sound dataset, as well as the Edge Impulse platform to train and deploy a model that can distinguish between the two types of sounds. In order to upload the sound dataset to Edge Impulse, we’ll have to split it into smaller samples (in our case the samples are 3 seconds long), and you can do so using the following command line instruction:
Make sure to replace DaytimeForest_NatureAmbience.wav with the name of your file.
Now go to Data acquisition > Upload data on Edge Impulse and upload your samples, making sure to label them accordingly. Our two labels are EngineSounds and Background. The difference between the two classes should be clearly observed in the sound waveform, as seen in the following pictures:
Now that the data is available, it's time to create the Impulse. The functional block of the Edge Impulse ecosystem is called an "Impulse", and it fundamentally describes a collection of blocks through which data flows, starting from the ingestion phase and going all the way to outputting the features.
The setup is rather straightforward for this use case. We will be using a 2000ms window size, with a window increase of 200ms at an acquisition frequency of 100Hz. For the processing block we will be using an Audio (MFE) block and for the Learning block, we will be employing a basic Classification (Keras) block.
The Audio MFE (Mel-filterbank energy) processing block extracts signal time and frequency information. A mel filter bank can be used to break down an audio signal into discrete frequency bands on the mel frequency scale, simulating the nonlinear human perception of sound. It works effectively with audio data, primarily for non-voice recognition applications when the sounds to be categorised may be recognized by the human ear. You can read more about how this block works here.
On the right side of the page, a spectrogram displays the MFE block's output for a sample of audio. The MFE block converts an audio window into a data table, with each row representing a frequency range and each column representing a time period. The value contained within each cell reflects the amplitude of its related frequency range during that time period. The spectrogram depicts each cell as a colored block, with the intensity varying according to the amplitude.
A spectrogram's patterns reveal information about the sort of sound it represents. In our case, the spectrogram below depicts a pattern characteristic of forest background noise:
This spectrogram depicts a pattern characteristic of logging trucks engine sounds and the differences between this spectrogram and the above one can be easily observed:
You can use the default values for configuring the MFE block, as they work well for a wide range of applications. Click on Save parameters and you'll be taken to the feature generation page. After you click on Generate features you'll be able to visualise them in the Feature explorer. Generally, if the features are well separated into clusters, the ML model will be able to easily distinguish between the classes.
The next step in developing our machine learning algorithm is configuring the NN classifier block. There are various parameters that can be changed: the Number of training cycles, the Learning rate, the Validation set size, and whether to enable the Auto-balance dataset function. They allow you to control the number of epochs to train the NN for, how fast it learns, and the percentage of samples from the training dataset used for validation. Underneath, the architecture of the NN is described. For the moment, leave everything as is and press Start training.
The training will be assigned to a cluster and, when the process ends, the training performance tab will be displayed. Here you can evaluate the Accuracy and the Loss of the model, along with a tabulated view of the correct and incorrect responses the model gave when fed the previously acquired dataset.
Moreover, you can see the Data explorer that offers an intuitive representation of the classification and underneath it, the predicted on-device performance of the NN.
To quickly test the performance of your NN, navigate to the Model testing tab and click on Classify all. This evaluates how well the model performs on data it has never seen before, and is a great way to check that the model has not overfit the training data.
On the Deployment page, you will notice a menu that allows you to opt in to the EON Compiler. For now, click Build and wait for the process to end. Once it's done, download the .hex file and follow the steps in the video that shows up to upload it to the Thingy:53 board.
With the impulse uploaded, connect the board to your computer, launch a terminal and issue the following command to see the results of the inferencing:
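edge-impulse-run-impulse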
Another way of deploying the model on the edge is using the Nordic nRF Edge Impulse App for iPhone or Android:
Download and install the app for your Android/iOS device.
Launch it and log in with your edgeimpulse.com credentials.
Select your Illegal Logging Detection project from the list
Now deploy your device in an area that you want to monitor and receive the notifications of passing trucks on your phone. In our next section we will explore mesh network capabilities and connectivity options.
The Nordic Thingy:53 is equipped with a dual-core Bluetooth 5.3 SoC supporting Bluetooth Low Energy, Bluetooth mesh, NFC, Thread and Zigbee, which makes it a great choice for creating edge applications that use Bluetooth communication as an output. In this case, the Edge Impulse platform allows its users to deploy their Impulse as a library containing all the signal processing blocks, learning blocks, configurations, and SDK required to integrate the ML model in your own unique application.
In order to deploy the model as a sensor in the forest, a mesh network can be used to establish connections between various sensors, called nodes. Bluetooth mesh networks are well suited for applications that require a large coverage area. The data collected by the sensors can be transmitted wirelessly to a central location, from which an alert can be sent. Having a Bluetooth mesh network in place is more efficient than having to physically retrieve the sensor data. Furthermore, this network topology provides redundancy and resistance to failure as all nodes are interconnected and any node can act as a relay if necessary. Consequently, using a Bluetooth mesh network is an efficient way to wirelessly collect sensor data over a large coverage area.
Though it is often overlooked, illegal logging is a significant global problem. It results in the loss of valuable timber each year, and contributes to deforestation and climate change. Fortunately, machine learning algorithms offer a promising solution to this problem. By providing real-time monitoring, these algorithms have the potential to significantly reduce the amount of valuable timber lost each year to illegal logging, and the Nordic Thingy:53 is a powerful tool to achieve this. With this system in place, we can help to preserve our forests and ensure that they are managed in a sustainable way.
If you need assistance in deploying your own solutions or more information about the tutorial above please reach out to us!
Using captured audio and the SiLabs xG24 to determine if a room is occupied or empty.
Created By: Zalmotek
Public Project Link: https://studio.edgeimpulse.com/public/101280/latest
Occupancy is an important issue in Building Management Systems: based on sensor data you can automatically control lights, temperature, or ventilation systems, saving energy and optimizing usage by providing room availability in real time without the hassle of having each room checked by a person.
An interesting fact is that lighting use constitutes about 20% of the total energy consumption in commercial buildings. Heating or cooling, depending on the season, can also be automated based on usage and human presence.
There are quite a few sensor-based solutions to detect human presence in a room, and while the simplest one, a video camera, comes to mind first, cameras are probably the least used in real environments due to their privacy issues (avoiding recording video is a must) and added complexity. Usually, the sensors used in this application are infrared, ultrasonic, microwave, or other technologies that decide if people are present in a room.
Another challenge in managing a commercial building is scheduling rooms based on availability. People are already accustomed to Calendly and other similar tools to set up availability for one's preferred time to meet but adding a real floorplan in the mix could save the trouble of mailing back and forth to confirm a location.
SiLabs have launched the new EFR32MG24 Wireless SoCs and they are full of interesting sensors and features making them a very good one-stop-shop for an all-around development board for mesh IoT wireless connectivity using Matter, OpenThread, and Zigbee protocols for smart home, lighting, and building automation products or any other use case you see fit to this combination of sensors and connectivity.
The sensors present on board are an accelerometer, a microphone, environmental sensors comprising temperature, humidity, and air pressure, a Hall sensor, an inertial and an interactional sensor. So we have quite an array of possibilities to choose from.
Our board carries the EFR32MG24B310F1536IM48 part number, meaning it is part of the Mighty Gecko 24 family of ICs by SiLabs; it has a high-speed/high-accuracy IADC, a Matrix Vector Processor (MVP), 10 dBm PA transmit power, and 1536 kB of flash memory, can function between -40 and +125 degrees Celsius, and has 48 pins.
With key features like high performance 2.4 GHz RF, low current consumption, an AI/ML hardware accelerator, and Secure Vault, IoT device makers can create smart, robust, and energy-efficient products that are secure from remote and local cyber-attacks. An ARM Cortex®-M33 running up to 78 MHz and up to 1.5 MB of Flash and 256 kB of RAM provides resources for demanding applications while leaving room for future growth. Target applications include gateways and hubs, sensors, switches, door locks, LED bulbs, luminaires, location services, predictive maintenance, glass break detection, wake-word detection, and more.
For this application, we have decided to use the microphones with which the xG24 DevKit comes equipped. To be more precise, we will be capturing sound from the room in 1-second windows, run it through a signal processing block and decide, by using a TinyML model, whether the room is occupied or not. We want to capture the Sound of Silence :) if we may.
A very important mention concerning privacy is that we will use the microphones only as a source for sound level, not recording any voices or conversations after the model is deployed.
EFR32MG24 Dev kit (USB cable included)
A CR2032 battery
A 3D printed enclosure (optional)
Simplicity Commander - a utility that provides command line and GUI access to the debug features of EFM32 devices. It enables us to flash the firmware on the device.
The Edge Impulse CLI - A suite of tools that will enable you to control the xG24 Kit without being connected to the internet and ultimately, collect raw data and trigger in-system inferences
The base firmware image provided by Edge Impulse - enables you to connect your SiLabs kit to your project and do data acquisition straight from the online platform.
Since all sensors are present on the development board there is not that much to do on the hardware side: you will use the USB cable to program the board, and afterward, to test it, you can use a CR2032 battery to supply power. Battery life will vary based on the use case, how often you read the sensors, and how often you send data to the cloud.
Since it will be mounted in a room where you want to detect the presence of persons we decided to create a 3D enclosure so it protects the development board and keeps it nice and tidy. While the whole action takes place indoors, there are still some accidents that happen on a conference table, like liquid spillage, that might damage the board. In this case, the 3D printed case offers an extra level of protection by elevating the board above the table-top level.
First of all install both Simplicity Commander and the Edge Impulse CLI depending on your OS, by following the official documentation.
Use a micro-USB cable to connect the development board to your PC and launch Simplicity Commander. You will be met with a screen containing various information regarding your development board like Chip Type, Flash Size, and more.
Make sure you have the Edge Impulse firmware downloaded and head over to the Flash panel of Simplicity Commander.
Download the base firmware image provided by Edge Impulse for this board and select the connected Kit in the dropdown menu on the top-left corner of the window, then hit Browse, select the Firmware image and click Flash to load the firmware on the DevKit.
With the custom firmware in place, we have everything we need to start creating our TinyML model.
First up, let’s create an Edge Impulse project. Log in to your free account, click on Create new project, give it a recognizable name and click on Create New Project.
Navigate to the Dashboard tab, and then to the Keys page. Here, you will find the API key of your project that we will employ to connect the xG24 Devkit to our project. If the API key appears shortened, try to zoom out a bit so you are able to completely copy it.
Connect the xG24 Kit to the computer, launch a terminal and run:
edge-impulse-daemon --api-key <my project api key>
In the future, if you wish to change the project that your development board connects to, run the same command with a different API key.
Now, if you navigate to the Devices tab, you will see your device listed, with a green dot signaling that it is online.
Once the device is properly attributed to the Edge Impulse project, it’s time to navigate to the Data Acquisition tab.
On the right side of the screen, you will notice the Record new data panel. Leave the settings at their default values, fill in the Label field with a recognisable name, and start sampling. Keeping in mind that neural networks need plenty of data, record at least 3 minutes of data for each defined class.
In the testing phase of the model we are building, we will need some samples in the Test data as well, so do keep in mind to record some. An ideal Train/ Test split would be 85% - 15%.
With the data in place, let’s start building our Impulse. You could look at an Impulse like the functional Block of the Edge Impulse ecosystem, and it represents an ensemble of blocks through which data flows.
An impulse is made out of 4 levels: The input level, the signal processing level, the Learning level, and the output level.
At the Input level of an impulse, you can define the window size, or to put it simply, the size of data you wish to perform signal processing and classification on. Make sure the Frequency matches the recording frequency used in the Data Acquisition phase and that Zero-pad data is checked.
The Signal processing level is made out of one or more processing blocks that enable you to extract meaningful features from your data. Due to the fact that the model we are training is supposed to run on the edge, we must identify the most relevant features and use them in the training process. There are many processing blocks available that allow you to extract frequency and power characteristics of a signal, extract spectrograms from audio signals using Mel-filterbank energy features or flatten an axis into a single value and more, depending on your specific use case. If needs be, Edge Impulse also allows its users to create their own custom processing blocks.
The Learning level is where the magic happens. This is the point where the model training takes place. Edge Impulse provides various predefined learning blocks like Classification (Keras), Anomaly Detection (K-Means), Object Detection (FOMO), and many others.
In the Output level, you can see the 2 features your Impulse will return after running the data through the previous levels.
To wrap it up, for our use case we have decided to go with a 1 second window, Audio (MFE) as our processing block, and a Classification (Keras) Neural Network. With everything in place, click on Save Impulse and move over to the MFE tab that just appeared under the Impulse Design menu.
This block uses a non-linear frequency scale called the Mel scale. What makes the Mel scale unique is that it is based on human perception of frequency, which makes it a useful tool for representing signals in the frequency domain, as it corresponds more closely to how humans perceive sound. Being roughly logarithmic, the Mel scale compresses the range of frequencies it covers, and this can be helpful when working with signals that contain a large range of frequencies, making patterns more easily visible.
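For reference, one commonly used mapping from a frequency f in hertz to the mel scale m (the exact constants vary slightly between implementations) is:

m = 2595 \cdot \log_{10}\left(1 + \frac{f}{700}\right)

so equal steps in mels correspond to progressively larger steps in hertz as the frequency increases.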
At this point, tweak the parameters with a simple principle in sight: similar results for similar data. In our case, we have reduced the filter number from 40 to 20 for best results. Once you are happy with the DSP results, click on Save parameters and you will be directed to the Generate Features tab.
After you click on the Generate Features button, the Feature explorer will be presented to you. Here you can explore your data in a visual way and quickly validate if your data separates nicely. If you are not happy with the results, navigate back to the Parameters page and modify them some more.
What you are aiming to see in the Feature explorer are clearly defined clusters, with the lowest number possible of misclassified data points.
The NN Classifier tab, under the Impulse Design menu, allows us to configure various parameters that influence the training process of the neural network. For the moment, it suffices to leave the Training settings at their default values. You will notice in the Audio training options menu that a Data Augmentation option may be checked. Fundamentally, what Data Augmentation does is artificially increase the amount of training data to improve the classifier's accuracy, avoid overfitting, and reduce the number of training cycles required. Check it, leave the settings as they come, and click on Start Training.
Once the training is done, you will be presented with the training output. What we are striving to achieve is an Accuracy of over 95%. The Confusion matrix right underneath displays in a tabular form the correct and incorrect responses given by our model that was fed the data set previously acquired. In our case, you can see that if a room is crowded there is a 4.76% chance that it will be classified as an empty room.
You can also see this visually in the Feature explorer, with the misclassified CrowdedRoom points (represented with red dots) being placed near the EmptyRoom cluster.
The best way to test out our model is to navigate to the Live Classification tab and start gathering some new samples. Make sure the sampling Frequency is the same as the one used in the Data Acquisition phase and click on Start Sampling.
This is a great way to validate your model with data that was captured with the same device you intend to deploy it on.
In this last step, we will be taking the trained and optimized model and deploying it back onto the device used for data acquisition. What we achieve by this is decreased latency and power consumption, while also being able to perform the inference without an internet connection.
The SiLabs xG24 Dev Kit is fully supported by Edge Impulse. What this means is that, if you navigate to the Deployment tab, you will notice that in the “Build Firmware” section you can select the board and click Build.
What this does is build the binary that we will upload on the development board in the same way we uploaded the base firmware at the beginning of the tutorial.
Connect the board to your computer, launch Simplicity Commander, select the board, navigate to the flash menu, carefully select the binary file and press Flash.
Restart the board, launch a Terminal and run:
edge-impulse-run-impulse
If everything went smoothly, you should see something like this, confirming the fact that you have deployed the model correctly and that the inference is running smoothly.
Edge Impulse offers its users the possibility to export the model as a C++ library that contains all the signal processing blocks, learning blocks, configurations, and SDK needed to integrate the model in your own custom application. Moreover, in the case of the xG24 devkit, it also provides the Simplicity Studio Component file.
By understanding occupancy patterns, building managers can make informed decisions that will improve the comfort, safety, and efficiency of their buildings.
The xG24 DevKit is quite a powerhouse with the number of sensors present on it and many other use cases are possible. The recipe presented above can be used to quickly adapt to other environmental metrics you want to keep an eye on by training models on Edge Impulse.
If you need assistance in deploying your own solutions or more information about the tutorial above please reach out to us!
A smart device that detects running faucets using a machine learning model, and sends alert messages over a cellular network.
Created By: Naveen Kumar
Public Project Link: https://studio.edgeimpulse.com/public/119084/latest
GitHub Repository: https://github.com/metanav/running_faucet_detection_blues_wireless
Poor memory is only one of the many unpleasant experiences that accompany old age and these problems can have far-reaching implications on the comfort and security of seniors. Dementia is one of the most common neurological problems associated with the elderly. Imagine a case of seniors leaving the faucet on. The kind of water damage that might ensue is simply unimaginable. Not to mention lots of safety concerns such as electrocution and drowning. Also, sometimes kids or even adults forget to stop the faucet after use. It also adds up to your monthly water usage bills. According to the US EPA, leaving a faucet on for just five minutes wastes ten gallons of water. In this project, I have built a proof-of-concept of an AIoT (Artificial intelligence of things) device that can detect running faucets using a microphone and send an alert notification message.
This project requires a low-powered, reliable, and widely available yet cost-effective cellular network radio to send alert messages to the phone and cloud. I will be using a Blues Wireless Notecard (for Cellular connectivity) and a Blues Wireless Notecarrier-B, a carrier board for the Notecard.
Although the Notecard is capable as a standalone device for tracking purposes, we need to run Tensorflow Lite model inferencing using Edge Impulse, so we will be using a Seeed XIAO nRF52840 Sense as a host MCU. The slim profile of the Notecard with carrier board and inbuilt microphone on the tiny Seeed XIAO nRF52840 Sense makes it a good fit for our purpose. We need an antenna for better indoor cellular connectivity and a protoboard to assemble the hardware.
The Notecarrier-B and Seeed XIAO nRF52840 Sense are connected over I2C.
The schematics are given below.
We will use Edge Impulse Studio to train and build a TensorFlow Lite model. We need to create an account and create a new project at https://studio.edgeimpulse.com. We are using a prebuilt dataset for detecting whether a faucet is running based on audio. It contains 15 minutes of data sampled from a microphone at 16KHz over the following two classes:
Faucet - faucet is running, with a variety of background activities.
Noise - just background activities.
We can import this dataset to the Edge Impulse Studio project using the Edge Impulse CLI Uploader. Please follow the instructions here to install Edge Impulse CLI: https://docs.edgeimpulse.com/docs/edge-impulse-cli/cli-installation. The datasets can be downloaded from here: https://cdn.edgeimpulse.com/datasets/faucet.zip.
You will be prompted for your username, password, and the project where you want to add the dataset.
After uploading is finished we can see the data on the Data Acquisition page.
In the Impulse Design > Create Impulse page, we can add a processing block and learning block. We have chosen MFE for the processing block which extracts a spectrogram from audio signals using Mel-filterbank energy features, great for non-voice audio, and for the learning block, we have chosen Neural Network (Keras) which learns patterns from data and can apply these to new data for recognizing audio.
Now we need to generate features in the Impulse Design > MFE page. We can go with the default parameters.
After clicking on the Save Parameters button the page will redirect to the Generate Features page where we can start generating features which would take a few minutes. After feature generation, we can see the output in the Feature Explorer.
Now we can go to the Impulse Design > NN Classifier page where we can define the Neural Network architecture. We are using a 1-D convolutional network which is suitable for audio classification.
After finalizing the architecture, we can start training which will take a couple of minutes to finish. We can see the accuracy and confusion matrix below.
For such a small dataset 99.2% accuracy is pretty good so we will use this model.
We can test the model on the test datasets by going to the Model testing page and clicking on the Classify all button. The model has 91.24% accuracy on the test datasets, so we are confident that the model should work in a real environment.
The Edge Impulse Studio and Blues Wireless Notecard both support Arduino libraries, so we will choose the Create Library > Arduino library option on the Deployment page. For the Select optimizations option, we will choose Enable EON Compiler, which reduces the memory usage of the model. Also, we will opt for the Quantized (Int8) model. Now click the Build button, and in a few seconds, the library bundle will be downloaded to your local computer.
Before starting to run the application we should set up the Notecard. Please see the easy-to-follow quick-start guide here to set up a Notecard with a Notecarrier-B to test that everything works as expected. The application code does the Notecard setup at boot-up to make sure it is always in the known state. We also need to set up Notehub, which is a cloud service that receives data from the Notecard and allows us to manage the device, and route that data to our cloud apps and services. We can create a free account at https://notehub.io/sign-up, and after successful login, we can create a new project.
We should copy the ProjectUID which is used by Notehub to associate the Notecard to the project created.
For SMS alerts we need to set up an account at Twilio and create a route by clicking the Create Route link at the top right on the Route page. Please follow the instructions given in the nicely written guide provided by Blues Wireless, for leveraging the General HTTP/HTTPS Request/Response Route type to invoke the Twilio API.
In the Filters section, we have to specify which Notecard outbound file data we want to route to Twilio. It would make sure that we always send out the intended data. In the application code, we would add notes to the twilio.qo file.
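The exact application code lives in the GitHub repository linked above; as a rough sketch (the body fields here are illustrative, and it assumes notecard.begin() was already called in setup()), queuing an alert note to twilio.qo with the note-arduino library looks something like this:

```cpp
// Rough sketch (not the repository's exact code): queue an alert in the
// twilio.qo Notefile so Notehub routes it to Twilio as an SMS.
#include <Notecard.h>

Notecard notecard;   // assume notecard.begin() was called in setup()

void sendFaucetAlert() {
    J *req = notecard.newRequest("note.add");
    if (req != NULL) {
        JAddStringToObject(req, "file", "twilio.qo"); // the file our route filters on
        JAddBoolToObject(req, "sync", true);          // push to Notehub immediately
        J *body = JCreateObject();
        if (body != NULL) {
            JAddStringToObject(body, "alert", "Running faucet detected"); // illustrative field
            JAddItemToObject(req, "body", body);
        }
        notecard.sendRequest(req);
    }
}
```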
To send SMS messages, the Twilio API expects form data with three key/value pairs (Body, From, and To). This can be achieved using a JSONata (a query and transformation language for JSON data) expression to format the data into the required form. We should choose JSONata Expression in the Data > Transform field and we can enter the JSONata expression in the text area as shown below.
The JSONata expression given below formats the JSON payload into a message format that the Twilio API can consume.
Please follow the instructions here to download and install Arduino IDE. After installation, open the Arduino IDE and install the board package for the Seeed XIAO nRF52840 Sense by going to Tools > Board > Boards Manager. Search the board package as shown below and install it.
After the board package installation is completed, choose the Seeed XIAO BLE Sense from the Tools > Board > Seeed nRF52 OS mbed-enabled Boards menu and select the serial port of the connected board from the Tools > Port menu. We need to install the Blues Wireless Notecard library using the Library Manager (Tools > Manage Libraries...) as shown below.
Below is the Arduino sketch for inferencing. For continuous audio event detection, the application uses two threads, one for inferencing and another for audio data sampling, so that no events are missed.
To run the inferencing sketch, clone the application GitHub repository using the command below.
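git clone https://github.com/metanav/running_faucet_detection_blues_wireless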
Import the library bundle running_faucet_blues_wireless_inferencing.zip using the menu Sketch > Include Library > Add .ZIP Library in the Arduino IDE. Open the inferencing sketch notecard_nano_ble_sense_running_faucet_detection.ino and compile/upload the firmware to the connected Seeed XIAO nRF52840 Sense board. We can monitor the inferencing output and Notecard debug logs using Tools > Serial Monitor with a baud rate of 115200 bps.
For protection, the device is placed inside a plastic box that can be mounted on a wall.
The flexible cellular antenna is stuck to the side of the box.
Although this proof-of-concept device is used in the house with a wall outlet, it can be powered using batteries. Being equipped with cellular connectivity, it can be installed in those areas where there is no WiFi network. This is an easy-to-use and convenient device that respects users' privacy by running the inferencing at the edge and sending alert notifications on time.
Use a Particle Photon 2 to turn a device on or off by listening for a keyword or audio event, and opening or closing a Relay accordingly.
Created By: Roni Bandini
Public Project Link: https://studio.edgeimpulse.com/public/288386/latest
GitHub Repository: https://github.com/ronibandini/Photon2VoiceCommand
In industrial settings, workers often find themselves unable to operate machinery manually due to concurrent tasks demanding the use of their hands. Machine Learning (ML) has emerged as a transformative solution, enabling compact devices to comprehend and respond to vocal instructions, thereby initiating or halting machines as needed.
In the context of this project, we will leverage the Edge Impulse platform to train a customized ML model. Subsequently, we will deploy this model onto a Particle Photon 2 microcontroller board. The Photon 2 board will be connected to a PDA microphone and a Relay module, creating an integrated system for practical demonstration.
The Photon 2 is an interesting, high quality board made by Particle. It has 5 GHz WiFi and Bluetooth (BLE) 5, an ARM Cortex-M33 CPU running at 200 MHz, 2 MB of storage for user applications, 3 MB of RAM available to user applications, and a 2 MB flash file system.
The board size is 1.5 x 0.78 inches (5 x 2cm) and it comes with pre-soldered, labeled male headers. It also has 2 buttons (Reset and Mode), one RGB led, a LIPO charger with a JST-PH port, and a 10-pin micro JTAG connector for SWD (Serial Wire Debug).
Besides the Photon 2 board, the Edge AI Kit is required for this project, which includes a W18340-A PDM MEMS microphone by Adafruit. The Edge AI Kit also includes jumper wires, a protoboard, PIR sensor, distance sensor, LED, switches, resistors, vibration sensor, accelerometer, and loudness sensor for many other AI projects. The Relay module is not included, but it is a cheap, common device that can be ordered online from many electronics stores.
The PDM mic comes without header pins soldered, so we need to solder 5 headers and discard the sixth one (if using a 6-pin header as I've done here). After that we will connect the PDM mic to the board using jumper cables.
The PDM microphone connections should be:
PDM GND to Photon 2 GND
PDM 3v to Photon 2 3v3
PDM CLK to Photon 2 A0
PDM DAT to Photon 2 A1
The PDM mic SEL pin is not used for this scenario.
The Relay module should be connected as follows:
Relay GND to Photon 2 GND
Relay VCC to Photon 2 VCC
Relay Signal to Photon 2 D3
Since the Photon 2 has one GND and one 3v3 pin on the board, the Microphone GND and Relay GND should be connected together, and also Microphone 3v and Relay 3v. You can do that using the protoboard and Male-Female jumper cables from the Edge AI kit, or you can also cut and solder the cables.
We will create an account at Edge Impulse, then log in and create a new project.
We will use the computer to record data samples for 2 labels: machine and background. Go to Collect data, Connect a Device, and Connect to your computer.
Enable your microphone permissions if necessary, and repeat the word "machine" several times, leaving a few seconds in between each. For background sound, just record any continuous background sound.
Wait a few seconds to allow the recording to be uploaded to Edge Impulse before going to the next step.
Now we are going to Split the samples. Click on the three vertical dots next to the recording and select "Split sample". Leave the length set to the default, 1000ms. Repeat this process for both labels.
Now we will design an Impulse. In the Edge Impulse Studio, go to Create impulse, set the Window size to 1000ms, the Window increase to 500ms, and add the 'Audio MFCC' Processing Block, which is perfect for voices. Then add 'Classification (Keras)' as the Learning Block. Now click Save impulse.
Next we will go to MFCC parameters. This page allows us to configure the MFCC block, and lets us preview how the data will be transformed. The MFCC block transforms a window of audio into a table of data where each row represents a range of frequencies, and each column represents a span of time. The value contained within each cell reflects the amplitude of its associated range of frequencies during that span of time.
We will leave the default values, which are pre-configured according to the data. Then we click on Generate Features.
Now we go to the Neural Network configuration. Click on Classifier; the settings can be left at their default values, then scroll down and click Start training. In this case we got 100% accuracy, which is uncommon, but the two classes of recordings are quite different, so the neural network separates them easily.
You can upload new samples and test them with the Model testing feature.
The final step is to deploy a library to the Particle Photon 2. We will click Deployment, begin to type the word Particle in the search box, then select the Particle Library and click Build.
The main difference when working with the Photon 2 is that the Arduino IDE is not used to upload the code and libraries. Instead, Microsoft Visual Studio Code is used, and there are several setup steps to follow carefully.
Install Microsoft Visual Studio Code, then add the Particle Workbench extension.
Unzip the Particle library exported from Edge Impulse Studio.
In VS Code, press Ctrl+Shift+P to bring up the Particle menu and select the "Particle: Import Project" feature to choose the library. The first time, VSCode will download dependencies.
In VS Code, a button will appear at the lower right to open the properties file. Navigate to the unzipped folder, select project.properties, and accept the trust confirmation ("Yes, I trust the authors").
The src/main.cpp code included in the zip file detects the trained keyword and prints the predictions over the serial console, so we need to add some code to control the Relay. We will open src/main.cpp and add a definition for the relay pin at the top of the file, plus the pin configuration inside setup().
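The exported file can vary between Edge Impulse SDK versions, so the snippet below is only a minimal sketch of these additions; RELAY_PIN is our own name for the pin, and D3 matches the wiring described above.

```cpp
// Near the top of src/main.cpp: give the relay signal pin a name.
// RELAY_PIN is our own identifier; D3 is the pin wired to the Relay signal.
#define RELAY_PIN D3
```

and, inside the existing setup() function of the exported sketch:

```cpp
pinMode(RELAY_PIN, OUTPUT);      // configure the relay signal line as an output
digitalWrite(RELAY_PIN, LOW);    // start with the relay switched off
```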
Then, inside the loop we will add the code that controls the Relay. In this case, the label contains the keyword "muted" instead of "machine", as I was exploring audio output; if you are going to use another label, just change the word "muted" to the keyword contained in your own label.
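The exported loop() also differs between SDK versions, so treat the following as a sketch rather than the exact code: it assumes the classifier output is available in an ei_impulse_result_t variable named result, as in Edge Impulse's exported examples, and the 0.8 confidence threshold and two-second pulse are arbitrary choices for this demo.

```cpp
// Inside loop(), after the classifier has run and filled in `result`:
// energize the relay when the keyword is detected with enough confidence.
for (size_t ix = 0; ix < EI_CLASSIFIER_LABEL_COUNT; ix++) {
    if (strcmp(result.classification[ix].label, "muted") == 0 &&
        result.classification[ix].value > 0.8f) {
        digitalWrite(RELAY_PIN, HIGH);   // switch the relay on
        delay(2000);                     // hold it for two seconds
        digitalWrite(RELAY_PIN, LOW);    // switch the relay off again
        break;                           // no need to check the remaining labels
    }
}
```

Note that delay() blocks the loop while the relay is held; for a real deployment a non-blocking timer would be a better choice.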
Note: you can also download the main.cpp file from https://github.com/ronibandini/Photon2VoiceCommand to make sure you have everything entered correctly.
Now press Ctrl+Shift+P to bring up the Particle menu again, and this time select "Particle: Configure Project for Device". Select deviceOS@5.5.0 and the P2 board, and press ESC to skip the device name.
Note: if you get a "Microphone library not found" error, press Ctrl+Shift+P and install the 'Microphone_PDM@0.0.2' library.
Finally, bring up the Particle menu again with Ctrl+Shift+P and select "Particle: Flash application and device, local" (For Windows, it will take around 5 minutes to flash).
If you get an "Argument list too long" error during flashing, Particle Support recommends building with Docker instead.
Another option is to use Mac or Linux. (For the demo below, the main.cpp modifications were made following Particle's example at https://docs.particle.io/getting-started/machine-learning/youre-muted.) On Linux, the steps are:
Install a lightweight Ubuntu, such as Lubuntu
Install VSCode
Open a Terminal window and execute sudo apt-get install libarchive-zip-perl (this step avoids an error where the crc32 tool is not found)
Press Ctrl+P, type "ext install particle.particle-vscode-pack", and press Enter to install the Particle extension pack
Login with Particle credentials
Create a Project
Import the unzipped folder and accept the trust confirmation
Now press Ctrl+Shift+P, choose "Particle: Configure Project for Device", and choose deviceOS@5.3.2 and the P2 board
Bring up the Particle menu again with Ctrl+Shift+P and choose "Particle: Flash Application and Device, local"
Machine learning makes it possible not only to recognize voice commands but also to identify distinctive machine-generated sounds, so equipment can be shut down automatically when specific malfunction indicators are detected. This is valuable for improving operational safety and efficiency in industrial environments.
Moreover, the compact form factor and low cost of boards like the Particle Photon 2, combined with their ability to control external devices, make them an attractive addition for many industries, offering a practical path to ML-powered automation across a wide range of manufacturing settings.
https://www.instagram.com/ronibandini
https://twitter.com/RoniBandini