Build a machine learning model and deploy it to a Nordic Thingy:53 to detect the sound of breaking glass.
Created By: Zalmotek
Public Project Link:
https://studio.edgeimpulse.com/studio/139844
Glass/window breaking detection systems are used in office buildings for safety purposes. They can be used to detect when a window is broken and trigger an alarm. These systems can also be used to collect data about the event, such as the time, location, and type of break, thus generating data that can be used to further bolster the safety of office buildings in the future.
There are many different types of glass/window breaking detection systems available on the market, but they fall into two broad categories:
Systems that use vibration and audio sensors to detect the sound of breaking glass.
Computer vision based systems used to detect signs of damage in the windows.
The biggest challenge with any detection system is to minimize false positives - that is, to avoid triggering an alarm when there is no actual danger. This is especially important in the case of glass/window breaking detection systems, as a false positive can cause significant disruption and even panic.
There are many factors that can cause a false positive with these types of systems, such as:
Background noise: office buildings are typically full of ambient noise (e.g. people talking, computers humming, etc.) which can make it difficult for sensors to accurately identify the sound of breaking glass.
Weather: windy conditions can also create background noise that can interfere with sensor accuracy.
Sound Volume: if the sound of breaking glass is not loud enough, it may not be picked up by sensors.
Our approach to these challenges is to create an IoT system based on the Nordic Thingy:53™ development board that runs a machine learning model trained using the Edge Impulse platform, detects the sound of breaking glass, and sends a notification via Bluetooth when this event occurs. We have narrowed our hardware selection to the Nordic Thingy:53™ as it integrates multiple sensors (including an accelerometer, gyroscope, microphone, and temperature sensor) onto a single board, which will simplify our data collection process. In addition, the Nordic Thingy:53™ has built-in Bluetooth Low Energy (BLE) connectivity, which will allow us to easily send notifications to nearby smartphones or other devices when our glass/window breaking detection system is triggered. The Nordic Thingy:53 is powered by the nRF5340 SoC, Nordic Semiconductor’s flagship dual-core wireless SoC that combines an Arm® Cortex®-M33 CPU with a state-of-the-art floating point unit (FPU) and Machine Learning (ML) accelerator. This will enable us to run our machine learning model locally on the Thingy:53, without needing to send data to the cloud for processing.
To build our machine learning model, we will be using the Edge Impulse platform. Edge Impulse is a Machine Learning platform that enables you to build custom models that can run on embedded devices, such as the Nordic Thingy:53™. With Edge Impulse, you can collect data from sensors, process this data using various types of Machine Learning algorithms (such as classification or regression), and then deploy your trained model onto your target device.
Edge Impulse has many benefits, the most useful being that you don't need extensive data to train a high-functioning AI model. You can also easily adjust the models based on various needs like processing power or energy consumption.
Android/iOS device
nRF Programmer Android/iOS app
Edge Impulse account
Git
Because the Nordic Thingy:53 comes with a high-quality MEMS microphone on board, no wiring is required. Simply connect the development board to a power supply and move on to the next step.
Let's start by creating an Edge Impulse project. Select Developer as your project type, click Create a new project, and give it a memorable name.
New Thingy:53 devices will function with the Nordic nRF Edge Impulse iPhone and Android apps as well as the Edge Impulse Studio right out of the box.
Before connecting it to the Edge Impulse project, the firmware of the Thingy:53 must be updated. Download the nRF Programmer mobile application for Android or iOS and launch it. You will be prompted with a number of available samples.
Select the Edge Impulse application, select the version of the sample from the drop-down menu and tap Download.
When that is done, tap Install. A list with the nearby devices will appear and you must select your development board from the list. Once that is done, the upload process will begin.
With the firmware updated, connect the Thingy:53 board to a computer that has the edge-impulse-cli suite installed, turn it on, launch a terminal and run:
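Assuming the standard Edge Impulse CLI tools are installed, the command that starts the serial daemon is:

```
edge-impulse-daemon
```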
You will be required to provide your username and password before choosing the project to which you want to attach the device.
Once you select the project and the connection is successful, the board will show up in the Devices tab of your project.
For this particular use case, recording data containing glass breaking sounds is challenging. For such situations, Edge Impulse offers its users the possibility of uploading publicly available recordings of various phenomena that can be post-processed in the Data Acquisition tab.
We have gathered over 15 minutes of glass shattering sounds from various license-free SFX sound effects websites and uploaded them to our training data pool, using GlassBreaking as their label. This can be done by navigating to the Upload data tab.
We also need audio for this application that doesn't contain the sound events that we want to identify. We must add sounds that belong to the "background sounds" category to the data pool, such as honks, people talking loudly, doors closing and other various background sounds that the system might be exposed to during normal use. The name of this class should be "BACKGROUND." When populating your dataset, keep in mind that the most crucial component of machine learning is data, and the richer and more varied your data set is, the better your model will perform.
Now that the data is available, it’s time to create the Impulse. The functional Block of the Edge Impulse ecosystem is called “Impulse” and it fundamentally describes a collection of blocks through which data flows, starting from the ingestion phase and up to outputting the features.
The setup is rather straightforward for this use case. We will be using a 1000ms window size, with a window increase of 200ms, at an acquisition frequency of 100Hz. For the processing block we will be using Audio (MFE), and for the learning block we will be employing a basic Classification (Keras).
When navigating to this menu, you will notice that in the top part of the screen you can explore the time domain representation of the data you have gathered.
Underneath, various parameters of the processing block may be modified. For the moment, we will be moving forward with the default values.
And finally, on the right side of the window you can observe the results of the digital signal processing and a spectrogram of the raw signal.
A good rule of thumb when tweaking the DSP block parameters is that similar signals should yield similar results.
Once you are happy with the results, click on Save parameters. After the "Generate features" page loads, click on Generate Features.
In the Generate features tab, you can observe the Feature explorer. It enables visual, intuitive data exploration. Before beginning to train the model, you can rapidly verify whether your data separates neatly. If you're looking to identify the outliers in your dataset, this feature is fantastic because it color-codes comparable data and enables you to track it back to the sample it originated from by just clicking on the data item.
The next step in developing our machine learning algorithm is configuring the NN Classifier block. There are various parameters that can be changed: the Number of training cycles, the Learning rate, the Validation set size, and the option to enable the Auto-balance dataset function. These control the number of epochs to train the NN for, how fast it learns, and the percentage of samples from the training dataset used for validation. Underneath, the architecture of the NN is described. For the moment, leave everything as is and press Start training.
The training will be assigned to a cluster and, when the process ends, the training performance tab will be displayed. Here you can evaluate the Accuracy and Loss of your model, along with the correct and incorrect responses it produced when fed the previously acquired dataset, presented in tabular form.
Moreover, you can see the Data explorer that offers an intuitive representation of the classification and underneath it, the predicted on-device performance of the NN.
In the Deployment tab you will notice another menu that lets you opt in to the EON Compiler. We will get back to this later; for now, click Build and wait for the process to end. Once it’s done, download the .hex file and follow the steps in the video that shows up to upload it to the Thingy:53 board.
With the impulse uploaded, connect the board to your computer, launch a terminal and issue the following command to see the results of the inferencing:
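Assuming the standard Edge Impulse CLI, the runner that prints live inferencing results is:

```
edge-impulse-run-impulse
```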
Another way of deploying the model on the edge is using the Nordic nRF Edge Impulse app for iPhone or Android:
Download and install the app for your Android/iOS device.
Launch it and login with your edgeimpulse.com credentials.
Select your project from the list.
Navigate to the Devices tab and connect to the Thingy:53.
Navigate to the Data tab and press Connect. You will see the status on the button change from Connect to Disconnect.
Navigate to the Deployment tab and press Deploy.
In the inferencing tab, you will see the results of the Edge Impulse model you have flashed on the device:
In this article, we have described how to create a glass/window breaking detection system using the Nordic Thingy:53™ development board and Edge Impulse Machine Learning platform. This system can be used in office buildings or other commercial settings to help improve safety and security. We believe that this approach has several advantages over existing solutions, including its low cost, ease of use, and accuracy. With further development, this system could be expanded to include other types of sensors (e.g. cameras) to improve detection accuracy or be used in other applications such as door/window opening detection or intruder detection.
Using a Nordic Thingy:53 with Keyword Spotting to turn an ordinary device into a smart appliance.
Created By: Zalmotek
Public Project Link:
https://studio.edgeimpulse.com/studio/145818
GitHub Repository:
https://github.com/Zalmotek/edge-impulse-appliance-control-voice-nordic-thingy53
In today's world, voice commands are becoming a popular user input method for various devices, including smart home appliances. While classical UI methods like physical buttons or a remote control will not be soon displaced, the convenience of using voice commands to control an appliance when multitasking or when having your hands busy with something else, like cooking, cannot be denied.
While very convenient in day to day use, using human speech as user input comes with a number of challenges that must be addressed.
First and foremost, using human speech as user input for smart appliances requires the recognition and understanding of natural language. This means that some sort of keyword detection or voice recognition technology must be involved. As each user may have their own unique way of phrasing a request, like "turn on the fan" or "start a 5 minute timer", the voice recognition algorithms must be fine-tuned to obtain the best accuracy.
Another big challenge when implementing such technologies is the security of the gathered data. Privacy concerns regarding speech recognition require cutting-edge encryption techniques to protect voice data and ensure the privacy of any sensitive personal or corporate information transmitted. An easy way to circumvent this is by employing IoT devices that run a machine learning algorithm on the edge and which do not store any data while running the detection algorithm.
We will be demonstrating the design and build process of a system dedicated to integrating basic voice control functionality in any device by using Nordic Thingy:53 dedicated hardware and an audio categorization model developed and optimized using the Edge Impulse platform.
The Nordic Thingy:53™ is an IoT prototyping platform that enables users to create prototypes and proofs of concept without the need for custom hardware. The Thingy:53 is built around the nRF5340 SoC, Nordic Semiconductor’s flagship dual-core wireless SoC. Its dual Arm Cortex-M33 processors provide ample processing power and memory size to run embedded machine learning (ML) models directly on the device with no constraints.
To build the machine learning model responsible for speech recognition we will be using the Edge Impulse platform. Among the many advantages of this platform, it’s worth mentioning that it does not require a lot of data to train a performant AI model and that it provides a great number of processing-power and energy-consumption optimization tools, allowing users to build models that can run even on resource-constrained devices.
For this use case, we will be using the Nordic Thingy:53 as an advertising Bluetooth peripheral that listens for user input and then sends a message via BLE to an ESP32 connected to a relay that can switch an appliance on or off. This system architecture lets users control multiple appliances distributed around the house with only one "gateway" that runs the machine learning model on the edge.
ESP32 DevKit
Android/iOS device
220V to 5V power regulator
Plug-in plastic enclosure
Edge Impulse account
Edge Impulse CLI
Arduino IDE
Arduino CLI
Git
A working Zephyr environment
To control appliances you will either have to work with AC mains or integrate with some functionality that the appliance might have, such as IR control or switches.
We chose to connect the Adafruit Non-Latching Mini Relay FeatherWing to the ESP32 development board as presented in the following schematic. The Signal pin of the relay is connected to the GPIO32 pin of the ESP32 and the ground is common between the ESP32 and the relay.
You can use this circuit to control AC powered household devices, such as kettles, lights, or stove smoke extractors.
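As a quick sanity check of the wiring, a minimal ESP32 test sketch (a sketch for bench testing only, not part of the final firmware) can toggle the relay signal pin on GPIO32:

```cpp
// Minimal relay test for the wiring above.
// RELAY_PIN matches the schematic (relay signal on GPIO32 of the ESP32).
#define RELAY_PIN 32

void setup() {
  pinMode(RELAY_PIN, OUTPUT);
  digitalWrite(RELAY_PIN, LOW);   // start with the appliance switched off
}

void loop() {
  digitalWrite(RELAY_PIN, HIGH);  // energize the relay coil
  delay(2000);
  digitalWrite(RELAY_PIN, LOW);   // release the relay
  delay(2000);
}
```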
We have chosen an enclosure that will safely protect users from the AC mains and the rest of the electronics. We soldered the circuit onto a test board using the following schematic, which we first tested on a breadboard. We kept the test board neatly separated, with low-voltage DC routed in the top part and high-voltage AC routed in the lower part.
Warning: Working with AC mains is dangerous if you have never done it before. Please ask a senior electronics engineer for help or document yourself thoroughly before building this circuit.
The enclosure provides an AC In and an AC Out socket, allowing us to place our electronics in between: the test board stays continuously supplied, while enabling or disabling the output turns the appliance's supply on or off.
Use the appropriate wire gauge when working with AC so that it can handle the current draw of the appliance. We used standard 16A wire with the proper colors (blue for neutral, brown for line, and yellow/green for ground, according to EU standards).
The gateway can be placed anywhere in the house and does not need to be connected to a power outlet as it is battery powered, making it much more convenient for users.
Building and flashing custom applications on the Nordic Thingy:53 board requires a working Zephyr environment. To create it, follow the steps in the Getting Started guide from the official Zephyr documentation. Afterwards, follow the steps presented in the Developing with Thingy:53 guide from the official Nordic Semiconductor documentation. While this might not be mentioned in either of the documents, you must also install the J-Link Software and Documentation Pack and the nRF Command Line Tools (ver 10.15.4) to be able to flash the board.
After following the steps in the guides presented above, you should have a working Zephyr environment. Remember to always work in the virtual environment created during the Getting Started guide when developing applications for this platform.
Let's start by creating an Edge Impulse project. Select Developer as your project type, click Create a new project, and give it a meaningful name.
Thingy:53 devices will work with the Nordic nRF Edge Impulse iPhone and Android apps or with the Edge Impulse Studio right away.
First of all, the firmware of the Thingy:53 device must be updated. Download the nRF Programmer mobile application and launch it. You will be prompted with a number of available samples.
Select the Edge Impulse application, select the version of the sample from the drop-down menu and tap Download.
Once that is done, tap Install and a list with nearby devices will appear. You have to select your development board from the list and the upload process will begin.
With the firmware updated, connect the Thingy:53 board to a computer that has the edge-impulse-cli suite installed, turn it on, launch a terminal and run:
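With the standard Edge Impulse CLI installed, the command to start the serial daemon is:

```
edge-impulse-daemon
```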
You will be required to provide your username and password before choosing the project to which you want to attach the device.
Once you select the project and the connection is successful, the board will show up in the Devices tab of your project.
Considering the context, the best way to gather relevant data for your Thingy:53 is to record yourself. To achieve the best performance of the speech recognition model, you can add samples of your own voice to increase the specificity of the detection algorithm.
Go to Data Acquisition in your Edge Impulse project and you can start gathering the data set.
For this particular use case, we will be using the "Light", "Kettle" and "Extractor" keywords to turn on different items in the kitchen. Start recording 5-10 second segments of yourself saying "Light", "Kettle" and "Extractor". You will notice that they appear in the Collected data tab. Click on the menu symbolized by three dots and press Split Sample. Edge Impulse automatically splits the sample into 1-second windows, but you can adjust those manually. When you are happy with the windows, press Split.
We will also require audio that doesn't contain the keywords we wish to detect. We must gather a data set with sounds like ambient noise, people speaking in the distance, or different sounds from the kitchen, all of which fall within the "background" category. This class is labeled "Background". Keep in mind that data is the most important part of machine learning, and your model will perform better the more diverse and abundant your data set is.
From now on, because "Light", "Kettle" and "Extractor" are the classes we wish to detect, we will be referring to them as "positive classes" and to "Background" as the negative class. For the background noise, we will use the Unknown and Noise samples from the keywords dataset.
Download the dataset and navigate to the Upload data menu. We will be uploading all the samples to the Training category and we will label the "Noise" and "Unknown" samples with the label "Background".
Let's begin developing our Impulse now that the data are available. An Impulse describes a group of blocks through which data flows and can be viewed as the functional Block of the Edge Impulse ecosystem.
The input level, the signal processing level, the learning level, and the output level are the four levels that make up an impulse.
For the input block, we will leave the setting as default. As for the processing and learning block, we have opted for an Audio (MFCC) block and a basic Classification (Keras).
Do the setup just like in the image above, click on Save Impulse, and move on to configure the blocks one by one.
The Audio MFCC (Mel Frequency Cepstral Coefficients) block extracts coefficients from an audio signal using Mel-scale, a non-linear scale. This block is used for human voice recognition, but can also perform well for some non-voice audio use cases. You can read more about how this block works here.
You can use the default values for configuring the MFCC block and click on Save parameters. You’ll be taken to the feature generation page. Click on Generate features and you will be able to visualize the results in the Feature explorer.
The next step in developing our machine learning algorithm is configuring the NN Classifier block. Before training the model, you have the opportunity to modify the Number of training cycles, the Learning rate, the Validation set size, and whether to enable the Auto-balance dataset function. The learning rate defines how quickly the NN learns, the number of training cycles determines the number of epochs to train the NN for, and the validation set size sets the percentage of samples from the training data pool used for validation. You can leave everything as it is for now and press Start training.
The training will be assigned to a cluster and, when the process ends, the training performance tab will be displayed. It shows the Accuracy and Loss of the model, as well as the correct and incorrect responses provided by the model. You can also see an intuitive representation of the classification and, underneath it, the predicted on-device performance of the NN.
One way of deploying the model on the edge is using the Nordic nRF Edge Impulse app for iPhone or Android:
Download and install the app for your Android/iOS device.
Launch it and log in with your edgeimpulse.com credentials.
Select your Smart Appliance Control Using Voice Commands project from the list
Navigate to the Devices tab and connect to the Thingy:53:
Navigate to the Data tab and press Connect. You will see the status on the button changing from Connect to Disconnect.
Navigate to the Deployment tab and press Deploy.
In the Inferencing tab, you will see the results of the Edge Impulse model you have flashed on the device:
To showcase the process of creating a custom application, we have decided to create a basic Bluetooth application. In this application, the Thingy:53 functions as a Bluetooth peripheral device that advertises itself. The ESP32 functions as a Bluetooth client that scans for available devices to connect to. When it detects the Thingy:53, it pairs with it and awaits a command.
After the devices are paired, pressing the central button of the Thingy:53 sends a message to the ESP32, which triggers a relay.
Here you can find the source code for the Thingy:53 and the ESP32.
To build and flash the application on the Nordic hardware, copy the folder named Thingy53_Peripheral from the repository to the ncs/nrf folder and then run:
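A typical invocation, run from the ncs/nrf folder and assuming the default Thingy:53 application-core board target, would be:

```
west build -b thingy53_nrf5340_cpuapp Thingy53_Peripheral
```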
Make sure you have the board powered on and connected via the J-Link mini Edu to your computer, and run:
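With the J-Link attached, flashing uses the standard west runner:

```
west flash
```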
To flash the ESP32, follow the steps provided here to set up the build environment, then simply copy the code from the ESP32_Client.ino file into a new sketch and press Upload.
For this project we have decided to deploy the machine learning algorithm on the Bluetooth peripheral device. This makes it possible to use multiple central devices dedicated to switching appliances on and off, while the processing load of running the machine learning algorithm is handled by the Thingy:53.
Another great addition to this project would be the implementation of the Matter protocol.
Matter is a royalty free standard and was created to encourage interoperability between different devices and platforms.
If the appliances already have an IoT layer, the Thingy:53 is fully compatible with Matter, and instead of using relays it could be interfaced directly with the smart appliances.
In this article, we have presented a very basic implementation of Edge Impulse on the Thingy:53 to control an appliance using voice commands. The use of Edge Impulse and the integration with Nordic Semiconductor's IoT platform opens up endless possibilities for creating intelligent and user-friendly appliances.
The ability to quickly gather data, create, train and deploy machine learning algorithms greatly simplifies the process for developers, making it easier for them to incorporate these technologies into their projects.
Ultimately, this system provides a convenient and cost-effective way to control multiple appliances in the home.
We hope that this article will inspire you to try out Edge Impulse and Nordic Semi's Thingy:53 in your own smart appliance projects.
Use TinyML to listen for the sound of illegal logging, and send SMS notifications if a chainsaw is heard.
Created By: Zalmotek
Public Project Link:
https://studio.edgeimpulse.com/studio/126774
Illegal logging is a major environmental issue worldwide. It not only destroys forests but also decreases the amount of available timber for legal purposes. In addition, illegal logging often takes place in protected areas, which can damage ecosystems and jeopardize the safety of people and wildlife.
One way to combat this problem is through the use of machine learning algorithms that can detect chainsaw noise and be deployed on battery-powered devices, such as sensors in the forest. This allows for real-time monitoring of illegal logging activity, which can then be quickly addressed.
Detecting illegal logging is therefore essential for both environmental and economic reasons. However, it is difficult to detect illegal logging activity due to the vastness of forested areas, as it often takes place in remote and hard-to-reach areas. Traditional methods such as ground patrols are often ineffective, and satellite imagery can be expensive and time-consuming to analyze.
When combined with satellite data, ML can be used to quickly and cost-effectively identify areas where illegal logging is taking place. Sensors deployed in the forest can detect illegal logging activity in real time and, by providing data on where it is happening, help to better target enforcement efforts. This information can then be used to deploy ground patrols or take other actions to stop the illegal activity. In this way, ML can play a vital role in protecting forests and ensuring that timber is harvested legally.
Our approach to this problem is to create an IoT system based on the TinyML Syntiant development board that will run a machine learning model trained using the Edge Impulse platform that can detect the sound of chainsaws and send a notification via SMS when this event is detected.
We have picked the Syntiant TinyML board as our specialized silicon that can easily run the machine learning algorithm thanks to its Neural Decision Processors (NDPs). Moreover, having an onboard SPH0641LM4H microphone makes it great for quickly prototyping audio-based projects. This board is ideal for building and deploying embedded Machine Learning models, as it is fully integrated with Edge Impulse and has very low power consumption, advantageous for this use case as we do not want to charge the battery often.
Because this device must be deployed in remote and difficult-to-reach locations, a fitting power source must be used and so, we have opted to power it from a 2500 mAh rechargeable Li-Ion battery.
We will use a publicly available chainsaw noise dataset and the Edge Impulse platform to train and deploy a model that can distinguish between this sound and other similar sounds.
Micro USB cable
SIM800L GSM module
ESP32
ESP32 programmer
3.3V power supply (LD1117S33)
Power bank: rechargeable Li-Ion battery (INR18650-29E6 SAMSUNG SDI) + charger
Consumables (wires, prototyping board, LED)
Enclosure (carved stone)
Edge Impulse account
Edge Impulse CLI
Arduino IDE
Arduino CLI
Git
As stated before, our choice of Edge computing hardware for this use case is the Syntiant TinyML board, designed for working with Syntiant’s Neural Decision Processors (NDPs). This development board is a powerful tool for developing and deploying Machine Learning algorithms, as it is equipped with an ARM Cortex-M4 processor, which allows for real-time inference of the trained Machine Learning model. Another advantage of this Syntiant board is that it is equipped with 5 GPIOs that enable it to interact with other external circuitry and trigger an external output.
On top of the ML detection layer of our system, we will be adding a communication layer. In this case, we are using an external circuit based on the ESP32 MCU that can read a trigger from the Syntiant TinyML board, which is further interfaced with a SIM800L module responsible for sending an SMS message to the user when chainsaw noise is detected.
Be advised that the SIM800L uses 2G networks. In the EU there are still many providers offering 2G, and its coverage is much better than that of the faster networks, so since we are deploying the device in a forest it's actually a good option for us. In the US, 2G networks are currently being phased out, so be sure to check which GPRS module works best in your case.
In order to camouflage the device, we can fit all of the electronics into a fake stone, making it less likely to be detected. This will make it difficult for illegal loggers to find and destroy the device, as they will not be able to see it.
Since we want to fit all the modules inside the rock, we’re using the smaller SMD ESP32-WROOM-32UE SoC instead of the ESP32 development board, so we also have to use a breakout board to program the board.
To power the system we’re using a power bank based on the rechargeable INR18650-29E6 Li-Ion battery, from which we can obtain both 5V for the Syntiant TinyML board and 3.8V for the SIM800L GSM module. We also have to use a 5V to 3.3V voltage regulator to power the ESP32. You can find all the details in the full wiring schematic presented below.
NOTE: The Syntiant TinyML board has no power voltage input (the 5V pin does not power up the board; looking at the schematic, it sits before a voltage regulator that draws from the USB power supply). We had to resort to soldering a small wire to a 5V test pad on the bottom of the board, which luckily powers the board.
Because of the space constraints, we opted for a quick point-to-point wiring solution by directly soldering the components for our proof of concept. Of course, after further testing, a printed circuit board could be designed to accommodate all modules and provide a direct plug for the battery making it easier to produce in larger numbers in an automated fashion.
To connect the Syntiant TinyML to Edge Impulse, download the Audio firmware archive. Put the board in boot mode by double-clicking on the reset button when connecting the board to your computer, while the orange LED is blinking. In boot mode, you should see the red LED fading on and off. Then flash the firmware by running the script for your OS from the archive:
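The firmware archive follows the usual Edge Impulse layout, so the flashing scripts should be named along these lines (check the archive contents):

```
./flash_linux.sh       # Linux
./flash_mac.command    # macOS
flash_windows.bat      # Windows
```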
Let's start by creating an Edge Impulse project. Register an account with Edge Impulse, click Create new project, give it a memorable name, and select Developer as your project type. Afterward, select Audio as the type of data you will be dealing with.
To collect audio data from the microphone, connect the board to your computer. Once plugged in, the Syntiant TinyML Board shows up as a USB microphone and Edge Impulse can use this interface to record audio directly. Make sure the board is selected instead of your default microphone in your audio input settings.
Then, go to Devices -> Connect a new device in your Edge Impulse project, choose Use Your Computer, and allow access to your microphone.
Sometimes, gathering relevant data for your TinyML model might be a complicated endeavor, just like in this case. This is a situation in which the Upload data function under the Data acquisition tab saves the day. What it allows you to do is use publicly available recordings of the phenomena you wish to identify and use them as training data for your model.
In this case, what you must do is find publicly available recordings of chainsaws (in WAV format) and download them. Navigate to the Upload data menu, click on Choose Files and select your files, select Training as the upload category and manually enter the "Chainsaw" Label and click Begin upload.
It’s important to upload data that corresponds to background or unknown sounds, because the NDP101 chip expects exactly one negative class and it should be the last in the list; we will name it Z_Background.
For this, we will be using the Noise and Unknown dataset that can be found in the Keyword Dataset, which we will label Z_Background.
The next thing on the list is data formatting. The data uploaded from the Keyword Dataset is split into 1-second recordings, while the chainsaw sounds are usually longer, often over 30 seconds. To make things uniform, select one Chainsaw data entry, click on the menu marked by the 3 vertical dots, and press Split Sample. Leave the segment length at 1000ms and add additional segments if needed. Click on Split, and then repeat this for every chainsaw sample uploaded.
Once this is done, it’s time to do the Train/Test Split. Just click on the triangle with an exclamation mark on it, in the Train/Test Split section and Edge Impulse will do the job for you. An ideal ratio would be 80%-20%.
Once the Data acquisition is over, it’s time to build the Impulse.
You will notice the odd window size. Make sure to leave it at the default 968ms because this is a hardware constraint of the NDP101 chip.
For the processing block, we will be using Audio (Syntiant). This will compute log Mel-filterbank energy features from the audio signal that is fed in it and as for the learning block, we will be employing a Classification(Keras) block.
Once you are happy with the set-up, click on Save Impulse.
What this submenu does is let you explore the raw data, and see the signal processing block’s results.
Navigate to the "Syntiant" submenu under Impulse Design, leave the parameters on their default value and click on Save parameters, and then on Generate Features.
One of the most effective tools provided by Edge Impulse is the Feature Explorer. It enables visual, intuitive data exploration that enables you to rapidly verify whether your data separates neatly, even before starting to train your model. If you're looking to identify the outliers in your dataset, this feature is fantastic because it color-codes comparable data and enables you to track it back to the sample it originated from by just clicking on the data item.
Under the NN Classifier tab in the Impulse Design menu, we can set a number of parameters that affect how the neural network trains. The training settings can be kept at their defaults for now. When you click the Start training button, notice how a processing cluster is chosen to host the training process.
What is worth noting here is the particular structure of the neural network. To be more precise, we need to use a 3-layer NN architecture, every layer consisting of 256 neurons to be able to run our model on the NDP101 chip.
Once training is finished, the training output will be displayed. We aim for an accuracy of more than 95%. The correct and incorrect responses our model provided after being fed the previously gathered dataset are shown in tabular form in the Confusion matrix just below it. In this case, the dataset was clean enough that the model exhibits a 100% accuracy rating.
Navigating to the Model Testing page is a wonderful method to start testing our model before deployment. The samples kept in the Testing data pool will be displayed to you. To run all of this data through your impulse, select Classify all.
Before making the effort to deploy the model back on the edge, the user has the opportunity to test and improve the model using the model testing tab. When building an edge computing application, the ability to go back and add training data, modify the DSP and Learning block, and fine-tune the model reduces development time significantly. We strongly recommend taking the time to go back, tweak and optimize your model before deploying it on the edge.
To check how the model behaves when deployed on the edge, navigate to the Deploy tab, click on Syntiant TinyML under the Build firmware section, select the posterior parameters as the audio events we wish to detect, and press build.
Edge Impulse will then compile a pre-built binary that can be deployed in the same manner as the firmware used in the data acquisition phase.
With the model uploaded on the board, you can use any serial monitor, like picocom or the serial monitor embedded in Arduino IDE to debug and evaluate the performance of the Impulse.
Once you are happy with your model, it's time to deploy it on the device.
For our use case, we will be using our custom machine learning model alongside the firmware provided by Edge Impulse to create a custom application that will trigger an output when Chainsaw sounds are detected.
First and foremost, navigate to the Deployment tab, select Syntiant NDP101 library, and then click on "Find posterior parameters". This is a particularity of the NDP101 architecture that allows the user to set the thresholds for words or other audio events at which a model activates.
Select "Chainsaw" as the audio event that you want to detect and press Find parameters. Once that is done, you are ready to press Build.
The task will be assigned to a computing cluster and when the job finishes, you will be able to download it locally.
Afterward, clone the Syntiant firmware repository and copy and paste the model files into the "src" folder.
Open the "firmware-syntiant-tinyml.ino" in Arduino IDE, without creating a folder for it, and modify the on_classification_changed function like so:
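A minimal sketch of that modification, assuming the callback keeps its stock signature and that one of the board's GPIO pins (a placeholder below) is wired to the ESP32 trigger input, could look like this:

```cpp
// Placeholder: whichever Syntiant TinyML GPIO you wired to the ESP32 trigger input.
// Remember to configure it with pinMode(TRIGGER_PIN, OUTPUT) in setup().
#define TRIGGER_PIN 1

void on_classification_changed(const char *event, float confidence, float anomaly_score)
{
    // LED_RED is assumed to be defined by the stock firmware; adjust if your
    // board support package names the RGB LED pins differently.
    if (strcmp(event, "Chainsaw") == 0) {
        // Chainsaw sound detected: light the red LED and raise the trigger line
        digitalWrite(LED_RED, HIGH);
        digitalWrite(TRIGGER_PIN, HIGH);
    }
    else {
        // Z_Background (or any other class): clear both outputs
        digitalWrite(LED_RED, LOW);
        digitalWrite(TRIGGER_PIN, LOW);
    }
}
```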
Save and close.
Next up, launch a terminal, navigate to the firmware-syntiant-tinyml folder and run:
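Assuming the repository still ships its stock build helper script (check the repository README for the exact name and flags):

```
./arduino-build.sh --build
```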
Once the firmware successfully builds, connect the Syntiant board to your computer, place it in Boot mode and run:
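Again assuming the stock helper script:

```
./arduino-build.sh --flash
```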
If everything goes smoothly, the firmware will be flashed on your Syntiant TinyML board and when your board detects Chainsaw sounds, it will light up the LED in the color red.
Now that you have a Syntiant TinyML board that is able to detect chainsaw noises, you can program the ESP32 board to send an SMS notification through the SIM800L GSM module whenever the sound is detected. The GSM module is connected to the UART2 of the ESP32 microcontroller. Open Arduino IDE and paste the code below, then adjust the phone number and upload it to your ESP32 board:
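A minimal sketch of this logic is shown below; the trigger pin, the SIM800L UART pins, and the phone number are placeholders that must be adjusted to your wiring and SIM card:

```cpp
#include <Arduino.h>
#include <HardwareSerial.h>

// Placeholders - adjust to your wiring and SIM card.
#define TRIGGER_PIN 34            // line driven high by the Syntiant TinyML board
#define SIM800_RX   16            // ESP32 UART2 RX  <- SIM800L TX
#define SIM800_TX   17            // ESP32 UART2 TX  -> SIM800L RX
const char PHONE_NUMBER[] = "+40XXXXXXXXX";

HardwareSerial sim800(2);         // UART2 of the ESP32

void sendSMS(const char *text) {
  sim800.println("AT+CMGF=1");    // put the module in SMS text mode
  delay(500);
  sim800.print("AT+CMGS=\"");
  sim800.print(PHONE_NUMBER);
  sim800.println("\"");
  delay(500);
  sim800.print(text);
  sim800.write(26);               // Ctrl+Z terminates and sends the message
  delay(5000);                    // give the module time to send
}

void setup() {
  pinMode(TRIGGER_PIN, INPUT);
  sim800.begin(9600, SERIAL_8N1, SIM800_RX, SIM800_TX);
  delay(3000);                    // let the SIM800L register on the network
}

void loop() {
  if (digitalRead(TRIGGER_PIN) == HIGH) {
    sendSMS("Chainsaw noise detected - possible illegal logging!");
    delay(60000);                 // rate-limit notifications to one per minute
  }
}
```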
In conclusion, illegal logging is a major environmental issue that can be effectively combated through the use of machine learning algorithms. The system described in this article provides a way to quickly and easily monitor large areas of forest for illegal logging activity, and take action to stop it. The Syntiant TinyML board is an efficient and robust platform for running machine learning models and can be used to quickly and easily detect illegal logging activity. With this system in place, we can help to preserve our forests and ensure that they are managed in a sustainable way.
If you need assistance in deploying your own solutions or more information about the tutorial above please reach out to us!
An audio classification project that can identify snoring, deployed to a smartphone.
Created By: Wamiq Raza
Public Project Link: https://studio.edgeimpulse.com/public/109559/latest
Snoring, a type of sleep-disordered breathing, disrupts sleep quality and quantity for both the snorer and, frequently, the person who sleeps next to them. Snoring-related sleep deprivation can cause serious physical, emotional, and financial difficulties. Snoring not only disrupts the snorer's sleep, but it can also cause anger between spouses! About 40% of adult men and 24% of adult women snore on a regular basis. Snoring begins when the muscles around the throat relax during sleep. This narrows the airway, causing vibrations that result in snoring. Snoring is more common when a person sleeps on their back, and sleeping on your side rather than your back is a simple and natural remedy. In this project, a deep learning model for snoring detection is designed to be implemented on a smartphone using the Edge Impulse API, and the model may be deployed on other embedded systems to detect snoring automatically. A smartphone is linked to the listener module over home Wi-Fi or mobile data to log snoring incidents with timestamps, and the data may be shared with a physician for treatment and monitoring of disorders such as sleep apnea.
This tutorial has the following requirements:
Basic understanding of software development
Edge Impulse account
Android or iOS mobile phone
We will use Edge Impulse, an online development platform for machine learning on edge devices. Create a free account by signing up here. Log into your account and give your new project a name by clicking on the title. To run the current project as-is, you can directly clone it to make your own copy, and start executing.
This project used a dataset of 1000 sound samples divided into two categories: snoring sounds and non-snoring sounds. There are 500 examples in each class; both the snoring and non-snoring sounds were gathered from several web sources. The files were then separated into equal-sized, one-second files after silences were removed, so each sample lasts one second. Among the snoring samples, 363 consist of snoring sounds of children, adult men and adult women without any background sound [1]; the remaining samples have non-snoring sounds in the background. The 500 non-snoring samples consist of background sounds from ten categories, such as a baby crying, a clock ticking, a door opening and closing, total silence with the minor sound of the gadget's vibration motor, a toilet flushing, sirens of emergency vehicles, rain and thunderstorms, streetcar sounds, people talking, and background television news. Figure 1 and Figure 2 illustrate the frequency of snoring and non-snoring sound, respectively. The dataset can be downloaded from [2].
Once the dataset is ready you can upload it into Edge Impulse. Figure 3 represents the Edge Impulse platform, and how to upload the data. If you don’t want to download or upload data, and just run the project, refer to the section Getting started for cloning the current project.
Next, we will select signal processing and machine learning blocks, on the Create impulse page. The impulse will start out blank, with Raw data and Output feature blocks. Leave the default settings of a 1000 ms Window size and 1000 ms Window increase as shown in Figure 4. This means our audio data will be processed 1 second at a time, starting each 1 second. Using a small window saves memory on the embedded device.
Click on ‘Add a processing block’ and select the Audio (MFE) block. Next click on ‘Add a learning block’ and select the Neural Network (Keras) block. Click on ‘Save Impulse’ as illustrated in Figure 5. The audio block will extract a spectrogram for each window of audio, and the neural network block will be trained to classify the spectrogram as either 'snoring' or 'no snoring' based on our training dataset. Your resulting impulse will look like this:
Next we will generate features from the training dataset on the MFE page, as shown in Figure 6. This page shows what the extracted spectrogram looks like for each 1 second window from any of the dataset samples. We can leave the parameters on their defaults.
Next, press the ‘Generate features’ button, which processes the entire training dataset with this processing block and creates the complete set of features that will be used to train our neural network in the next step. This will take a couple of minutes to complete, and the resulting features can be seen in Figure 7.
We can now proceed to setup and train our neural network on the NN Classifier page. The default neural network works well for continuous sound. Snoring detection is more complicated, so we will configure a richer network using 2D convolution across the spectrogram of each window. 2D convolution processes the audio spectrogram in a similar way to image classification. Refer to the "NN classifier" section in my project for the architecture structure.
To train the model, the number of epochs was set to 100, the learning rate assigned after several trials is 0.005, and the overall dataset was split into an 80% training and 20% validation set. The number of epochs is the number of times the entire dataset is passed through the neural network during training; there is no ideal number, as it depends on the data. In Figure 8 we can see the feature explorer for correct and incorrect classification of both classes.
The model confusion matrix and the on-device (mobile) performance can be seen in Figure 9. The overall accuracy of the quantized int8 model is 94.3%, with 93.5% of 'no snoring' and 95.1% of 'snoring' samples classified correctly; the rest are misclassified.
After training, we run the model on the test data. It gives us an accuracy of 97.42%, as presented in Figure 10, along with the feature exploration.
The Live classification page allows us to test the algorithm either with the existing testing data that came with the dataset, or by streaming audio data from your mobile phone or from any microcontroller capable of audio processing. We can start with a simple test by choosing any of the test samples and pressing ‘Load sample’. This will classify the test sample and show the results, shown in Figures 11 and 12 for both classes with their probability scores. We can also test the algorithm with live data. Start with your mobile phone by refreshing the browser page on your phone. Then select your device in the ‘Classify new data’ section and press ‘Start sampling’ as shown in Figure 15.
In order to deploy the model on a smartphone, go to the browser window on your phone and refresh, then press the ‘Switch to classification mode’ button as shown in Figure 13. This will automatically build the project into a Web Assembly package and execute it on your phone continuously (no cloud required after that, you can even go to airplane mode!). Once you scan the QR code on your mobile it will ask you to access the microphone, as in this project we are using audio data, as shown in Figure 14. After the access is granted, we can see the classification results.
Furthermore, if you would like to extend the project, you can run it on different microcontrollers. Of course, for this project, our aim was to provide the insight of TinyML development on a smartphone.
This project provides insights into TinyML deployment on a smartphone. A deep learning model for snoring detection was trained, validated, and tested, and a prototype system comprising a listener module for snoring detection was demonstrated. For future work, extend the default dataset with your own data and background sounds, remember to retrain periodically, and test. You can set up unit tests under the Testing page to ensure that the model is still working as it is extended.
[1] T. H. Khan, "A deep learning model for snoring detection and vibration notification using a smart wearable gadget," Electronics, vol. 8, no. 9, art. 987, 2019.
[2] Khan, T., "A Deep Learning Model for Snoring Detection and Vibration Notification Using a Smart Wearable Gadget," Electronics 2019, 8, 987. https://doi.org/10.3390/electronics8090987
Using an Arduino Portenta H7 to listen for and classify the flow of water in a pipe.
Created By: Manivannan Sivan
Public Project Link:
Water is the world's most precious resource, yet it is also the one that is almost universally mismanaged. As a result, water shortages are becoming ever more common. In the case of water supply and distribution networks, these manifest themselves in the intermittent operation of the system. Not only is this detrimental to the structural condition of the pipes, but it can also adversely affect the quality of the water delivered to the customer's taps. Further, leakage often exceeds 50% of production. Not only does this have a significant economic impact, but an environmental one too. But recovering leakage has a cost: undertaking a hydraulic study of the network, creating a permanent monitoring system, and eliminating the leaks. So how low should leakage go, and how can a lower leakage level be maintained over time? This was the objective of the very innovative EU-funded PALM project recently completed in central Italy.
There is an increased carbon footprint from having pumps constantly running to make up for the water lost due to leakage. It is the increased pump use and pump maintenance/replacement costs that increase the CO2 released from the fossil fuels burned to support it. According to a study done by Von Sacken in 2001, water utilities are the largest user of electricity, accounting for 3% of total electricity consumption in the US. In addition, it is estimated that 2-3 billion kWh of electricity is expended pumping water due to leakage.
Costs, health, the environment, and infrastructure are just a few things that can come into play when water system leakage goes uncorrected.
More than 2 billion people globally live in countries with high water stress, per the 2018 statistics provided by the United Nations (UN). In order to tackle this problem, it is necessary to conserve and utilize water safely. Installation of proper water pipeline leak detection systems assist in specifying the leakages in installed water pipes, which ultimately avoids wasting water through cracks and holes. Therefore, the increasing scarcity of water is propelling the demand for water leak solutions, which in turn drives the market.
The global water pipeline leak detection systems market size is expected to reach $2,349.6 million in 2027, from $1,748.6 million in 2019, growing at a CAGR of 6.8% from 2020 to 2027. Water pipeline leak detection systems are utilized to determine the location of the leak in water transmission pipelines. Around 30% to 50% of water is lost through aging pipelines, which also contributes toward loss of revenue. Water pipeline leak detection systems are available for both underground and overground water pipelines to precisely locate and check the severity of pipeline leaks.
On the contrary, in recent years, pipeline leak detection systems have undergone various technological advancements by adoption of computerized systems and digital survey systems. The traditional acoustic detection sensors are upgraded with more efficient sound detection functions which has increased their efficiency. Introduction and implementation of such advanced technologies are likely to create lucrative opportunities for the growth of the water pipeline leak detection systems market during the forecast period.
In recent years, the adoption of acoustic-based pipe leakage detection has started increasing due to investment in R&D.
In this fast growing sector, TinyML-based systems will play a major role due to low power consumption and developing EdgeML models with more accuracy in predicting leakage detection.
My prototype is based on acoustic data collected with an Arduino Portenta H7, and a model trained using Edge Impulse. The Arduino Portenta Vision Shield is used because it contains two microphones (MP34DT05) which run at 16 MHz. The Vision Shield is placed on top of the pipe for data acquisition, with the microphones facing the pipe. This helps capture the noise of the water flowing.
In the data acquisition stage, the pipe is first recorded in "Idle" mode, where the tap is fully closed so no water flows, then with the tap slightly opened to simulate "leakage" mode. Finally, it is fully opened to simulate "water flow" mode.
In the pre-processing stage, the Window size is set to 2000ms and the Window increase is set to 500ms.
For the Neural Network configuration, I have used a couple of 1D-Conv layers followed by DNN layers.
The number of training cycles is set to 100 and the learning rate is set to 0.005. The accuracy obtained was 99.1%, with a loss of only 0.02. As the model is performing well at classifying the data, we can move on.
In Model testing, the trained model is tested with data and it is able to predict all 3 conditions we trained on with 100% accuracy.
Then, in the Deployment section, select Arduino Portenta H7 and download the firmware files to your computer.
Press the Reset button twice on the Portenta to change it to Flash mode. Then run the .bat file if you are on Windows, or the Mac or Linux commands if you are on those platforms.
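The downloaded firmware archive follows the usual Edge Impulse naming convention, so the flashing scripts should look something like this (verify against the archive contents):

```
flash_windows.bat      # Windows
./flash_mac.command    # macOS
./flash_linux.sh       # Linux
```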
The prototype demonstrated an acoustic method to predict leakage in a pipe. The model was able to determine whether the pipe is in Idle (no water flowing), flowing normally, or if there is a small flow, representing leakage in this case. The use case is simple enough to apply to any industry to monitor the leakages in pipe, though this is of course only a prototype project.
By integrating well-designed enclosures with higher quality microphones, the Arduino Portenta H7 will be ideal for industrial use-cases for pipe leakage detection.
An exploration into using machine learning to better monitor a patient coughing, to improve medical outcomes.
Created By: Eivind Holt
Public Project Link:
GitHub Repository:
This wearable device detects and reports user's coughs. This can be useful in treatment of patients suffering from chronic obstructive pulmonary disease, COPD, a group of diseases that cause airflow blockage and breathing-related problems. The increase in number and intensity of coughs can indicate ineffective treatment. Real-time monitoring enables caregivers to intervene at an early stage.
Existing methods of analyzing audio recordings greatly invade the privacy of the patient, caregivers, and peers. This proof-of-concept does not store any audio for more than a fraction of a second. The audio buffer never leaves the device; it is constantly overwritten as soon as the application has determined whether the small fragment of audio contains a cough or not. In fact, the hardware used is not capable of streaming audio over the low-energy network in question.
Further, the application is hard-coded to detect coughs or noise. To be able to detect new keywords, for instance "bomb", or "shopping", the device would have to be physically reprogrammed. Firmware Over-the-Air is not currently supported in this project. Each keyword consumes already constrained memory, limiting the practical amount of different keywords to a handful.
Compared to commercial voice assistants, such as Google Nest, Amazon Alexa or Apple Siri on dedicated devices or on a smartphone, this device works a bit differently. The aforementioned products are split into two modes: activation and interpretation. Activation runs continuously and locally on the device and is limited to recognizing "Hey Google" etc. This puts the device in the next mode, interpretation. In this mode an audio recording is made and transmitted to servers to be processed, which allows for greatly improved speech recognition. It also opens up secondary use, better known as targeted advertising. The device in this project only works in the activation mode.
Arduino Nano 33 BLE Sense
LiPo battery
JST battery connectors
Edge Impulse Studio
VS Code/Arduino IDE
Nordic Semiconductor nRF Cloud
nRF Bluetooth Low Energy sniffer
Nordic nRF52840 Dongle
Fusion 360
3D printer
Qoitech Otii Arc
A model was trained using 394 labeled audio samples of intense coughs, a total of 2 minutes and 34 seconds. An almost equal amount of audio samples of less intense coughs, sneezes, clearing of throat, speech and general sounds was also labeled, 253 samples, 2 minutes and 38 seconds. All samples were captured using the Arduino Nano, positioned at the intended spot for wear.
My coughs last around 200 milliseconds. I sampled 10 seconds of repeated coughing with short pauses, then split and trimmed the samples to remove silence.
The model was tested using data set aside and yielded great results. I used EON Tuner in Edge Impulse Studio to find optimal parameters for accuracy, performance and memory consumption.
An Arduino compatible library was built and used to perform continuous inference on the Arduino Nano 33 BLE Sense's audio input.
I followed some samples on how to use the generated Arduino libraries from Edge Impulse and how to perform inference on the audio input. If attempting to build my source code, make sure to include the /lib folder. I had to experiment a bit with parameters for the length of the audio window and slices. As each audio sample might start and end in any number of places for a given cough, each piece of audio is analyzed several times, together with the preceding and following adjoining slices. The result of the inference, the classification, is checked and triggers a cough-count increment if the probability is above 50%. An LED is flashed as an indicator; a sketch of this check is shown below.
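As a rough illustration (not the exact firmware code), the check could look like the snippet below. The generated library header name and the "cough" label are assumptions, and check_for_cough() is a hypothetical helper called after run_classifier_continuous() has filled the result struct:

```cpp
#include <Arduino.h>
#include <string.h>
#include <cough_detection_inferencing.h>  // hypothetical name of the generated Edge Impulse library header

static uint32_t cough_count = 0;          // running total, later advertised over BLE

// Hypothetical helper: call this after run_classifier_continuous() has filled `result`.
void check_for_cough(const ei_impulse_result_t &result) {
    for (size_t ix = 0; ix < EI_CLASSIFIER_LABEL_COUNT; ix++) {
        // "cough" is the assumed label name of the intense-cough class
        if (strcmp(result.classification[ix].label, "cough") == 0 &&
            result.classification[ix].value > 0.5f) {
            cough_count++;                    // increment the cough counter
            digitalWrite(LED_BUILTIN, HIGH);  // flash the onboard LED as an indicator
            delay(50);
            digitalWrite(LED_BUILTIN, LOW);
        }
    }
}
```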
In short, my application defines a custom BLE service with a characteristic of type unsigned integer and the behaviors Notify, Read, and Broadcast. Not very sophisticated, but enough for a demonstration. Any connected device will be able to subscribe to updates of the value; a minimal sketch follows.
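A minimal sketch of such a service using the ArduinoBLE library might look like this; the UUIDs, device name, and characteristic name are placeholders rather than the ones used in my firmware:

```cpp
#include <ArduinoBLE.h>

// Placeholder UUIDs - substitute your own custom service/characteristic UUIDs
BLEService coughService("181C");
BLEUnsignedIntCharacteristic coughCountChar("2A9E", BLERead | BLENotify | BLEBroadcast);

void setup() {
  if (!BLE.begin()) {
    while (1);                        // BLE stack failed to start
  }
  BLE.setLocalName("CoughMonitor");   // placeholder device name
  BLE.setAdvertisedService(coughService);
  coughService.addCharacteristic(coughCountChar);
  BLE.addService(coughService);
  coughCountChar.writeValue(0);       // initial cough count
  BLE.advertise();                    // wait for a central to connect and subscribe
}

void loop() {
  BLE.poll();                         // service BLE events
  // After each detected cough: coughCountChar.writeValue(cough_count);
}
```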
Next I used the nRF Android app on my phone as a gateway between the device and nRF Cloud.
I used lithium polymer batteries for their compact size and ease of recharging. I only had spare 500 mAh batteries available, as shipping options for assorted batteries by air are limited. To extend battery life I connected two in parallel by soldering 3 JST female connectors. Warning: this wiring is susceptible to short circuits and is only connected under supervision. This gives twice the capacity while keeping the voltage at the same level.
I made the mistake of assuming I would have to connect more components to the Arduino Nano via a protoboard. On a whim I ordered the Nano with pre-soldered headers. This only took up space and I had to undergo the tedious work of removing the headers by hand using a regular soldering iron. Sacrificing the headers by snipping them every other pin greatly eased the required finger acrobatics.
The only other thing I did was solder a female JST battery connector to the VIN and GND pins. This would serve as my battery connection, and subsequently the device's on/off toggle.
I wanted to make a prototype for demonstrating the concept for clinicians. It needed to contain and protect the electronics and batteries, while allowing sound waves to reach the microphone. I realized this would complicate making the enclosure watertight and quickly crossed that off the list. I also wanted to make a practical mechanism for securing the device on the wearer.
I used Autodesk Fusion 360 as CAD to design the enclosure. I always start by making rough digital replications of the hardware using calipers to take measurements on a sketch.
This gives the driving constraints and allows me to experiment with different hardware layouts without having to totally scrap alternatives.
While designing I constantly need to take into consideration the manufacturing method, in this case a resin-based SLA 3D printer. When drawing I have to decide on the orientation of the model during printing to avoid complicated overhangs, cupping, and hard-to-reach surfaces for removing support material. I also want to reduce the number of parts, to avoid unnecessary post-print work and bad fits.
Completed prints undergo an IPA wash to remove excess resin and finally post cure in a UV-light chamber. What remains is to snip off support material, sand any uneven surfaces and glue together parts. Now the device could finally be assembled and tested.
I ended up with a sort of a badge with a clip and a friction fit lid. It reminds me of a 1960's Star Trek communicator, not the worst thing to be compared against.
Battery life is limited to a few days. I am in the process of reimplementing the device using a Neural Decision Processor, NDP, that is able to perform the inference with a fraction of the energy a conventional MCU requires.
I tried to limit audio inference to run only when the Arduino Nano's accelerometer triggers due to some amount of movement (chest movement during a cough). I was disappointed to discover that the interrupt pin on the LSM9DS1 IMU is not connected to the MCU.
You might also have realized that the device will pick up coughs by bystanders, something I discovered when demonstrating the device to a large audience during a conference! Limiting activation to both movement and audio will sort this out.
When demonstrating the device to doctors and nurses I received a great suggestion. A COPD patient that stops taking their daily walk is a great source for concern. My device could be extended to perform monitoring of physical activity using accelerometer data and report aggregated daily activity.
It might be useful to support simple keywords so a patient could log events such as blood in cough, types of pain, self-medication etc.
I plan to move from BLE to LoRaWAN or NB-IoT for transmissions. This way patients won't have to worry about IT administration or infrastructure, it will just work. Please see my other projects at Hackster and element14 for demonstrations of these LPWAN technologies.
I have had the opportunity to demonstrate the device to clinicians both in person and at expositions and it has received praise, suggestions for further features and use in additional conditions. This project has also spawned several other ideas for wearables in e-health.
Identify a location by using sound and audio analysis, on a Syntiant TinyML board.
Created By: Swapnil Verma
Public Project Link:
Many times in an application, we want to identify the environment of a user, in order to automatically carry out certain tasks. For example, automatically switching on/off ANC (Automatic Noise Cancellation) in a headphone based on the user's environment. Or, in some applications leaving GPS turned on might be overkill and drain a battery's charge.
In situations like this, can we detect a user's local environment without the use of GPS?
We, humans, are good at understanding our environment just by using auditory sensory skills. I can identify a multitude of sound sources and also guess an environment just by listening. In this project, I am trying to replicate the same behaviour using a TinyML board with a microphone, running a machine-learning model.
As you might have guessed already, I am using Edge Impulse for building a TinyML model. Let's explore our training pipeline.
A good machine learning model starts with a high-quality dataset. I am using the public ESC-50 dataset to prepare my own dataset.
I have prepared a total of 7 classes denoting various locations, by combining some of the classes from the ESC-50 dataset. The classes I have prepared are:
Airport (aeroplane and helicopter sound)
Bathroom (brushing and toilet flush sound)
Construction (jackhammer sound)
Home (washing machine and vacuum cleaner sound)
Road (siren, car horn and engine sound)
Work (mouse click and keyboard typing sound)
Anomaly (quiet environment sound)
I have used only the ESC-50 dataset to prepare a new dataset for this project. The sound samples contained within any of these classes are not the only representation of that class. They can be improved by adding more sounds from different sources.
The first thing to do for training an ML model is Impulse design. It can also be thought of as a pipeline design from preprocessing to training.
In the pre-processing block, start with the default parameters. If required then change the parameters to suit your needs. In this project, the default parameters worked perfectly, so I just used them.
After adjusting the parameters, click on Generate features.
My model, trained for 1000 epochs with a 0.0002 learning rate, has 89.2% accuracy, which is not bad. This tab also shows the confusion matrix, which is one of the most useful tools for evaluating a model. This confusion matrix shows that the work class is the worst-performing class in our dataset.
The pre-processing and NN classifier tabs automatically adjusted themselves to 6 classes after the dataset was modified.
This will re-run the pre-processing block and learning block in one click, with the latest parameters.
After retraining the model with these parameters, the training accuracy is now 92.1%.
The Road class has the worst performance. Let's investigate why. Scroll to a sample which was classified incorrectly, click on the 3 dots to get a menu, and then click on Show classification.
This will show the result of each window of that sample. In the Raw data section, we can also play the audio corresponding to that window, giving us more insight into why the sample was incorrectly classified.
Let's test the model after deploying it on the Syntiant TinyML board. You can check it out in the below video.
To download a protective 3D-printable case for the Syntiant TinyML board, please follow the below link:
To clone this Edge Impulse project please follow the below link:
Keyword recognition and notification for AI patient assistance with Edge Impulse and the Arduino Nano 33 BLE Sense.
Created By:
Public Project Link:
When hospitals are busy it may not always be possible for staff to be close when help is needed, especially if the hospital is short staffed. To ensure that patients are looked after promptly, hospital staff need a way to be alerted when a patient is in discomfort or needs attention from a doctor or nurse.
A well known field of Artificial Intelligence is voice recognition. These machine learning and deep learning models are trained to recognize phrases or keywords, and combined with the Internet of Things can create fully autonomous systems that require no human interaction to operate.
As technology has advanced, it is now possible to run voice recognition solutions on low cost, resource constrained devices. This not only reduces costs considerably, but also opens up more possibilities for innovation. The purpose of this project is to show how a machine learning model can be deployed to a low cost IoT device (Arduino Nano 33 BLE SENSE), and used to notify staff when a patient needs their help.
The device will be able to detect three keywords: Doctor, Nurse, and Help. The device also acts as a BLE peripheral; BLE centrals/masters, such as a central server for example, could connect and listen for data coming from the device. The server could then process the incoming data and send a message to hospital staff or sound an alarm.
Your first step is to create a new project. From the project selection/creation you can create a new project.
Enter a project name, select Developer and click Create new project.
We are going to be creating a voice recognition system, so now we need to select Audio as the project type.
Once the dependencies are installed, connect your device to your computer and press the RESET button twice to enter into bootloader mode, the yellow LED should now be pulsating.
Once the firmware has been flashed you should see the output above, hit enter to close command prompt/terminal.
Open a new command prompt/terminal, and enter the following command:
edge-impulse-daemon
If you are already connected to an Edge Impulse project, use the following command:
edge-impulse-daemon --clean
Follow the instructions to log in to your Edge Impulse account.
Once complete head over to the devices tab of your project and you should see the connected device.
We are going to create our own dataset, using the built in microphone on the Arduino Nano 33 BLE Sense. We are going to collect data that will allow us to train a machine learning model that can detect the words/phrases Doctor, Nurse, and Help.
We will use the Record new data feature on Edge Impulse to record 15 sets of 10 utterances of each of our keywords, and then we will split them into individual samples.
Ensuring your device is connected to the Edge Impulse platform, head over to the Data Acquisition tab to continue.
In the Record new data window, make sure you have selected your Arduino Nano 33 BLE Sense, then select Built in microphone, set the label as Doctor, change the sample length to 20000 (20 seconds), and leave all the other settings as they are.
Here we are going to record the data for the word Doctor. Make sure the microphone is close to you, click Start sampling and record yourself saying Doctor ten times.
You will now see the uploaded data in the Collected data window; next we need to split the data into ten individual samples.
Click on the dots to the right of the sample and click on Split sample, which will bring up the sample split tool. Here you can move the windows until each of your samples is safely in a window. You can fine tune the splits by dragging the windows until you are happy, then click on Split.
You will see all of your samples now populated in the Collected data window. Now you need to repeat this action 14 more times for the Doctor class, resulting in 150 samples for the Doctor class. Once you have finished, repeat this for the remaining classes: Nurse and Help. You will end up with a dataset of 450 samples, 150 per class.
Now we have all of our main classes complete, but we still need a little more data. We need a Noise class that will help our model determine when nothing is being said, and we need an Unknown class, for things that our model may come up against that are not in the dataset.
For the Noise class we will mix silent samples and some other general noise samples. First of all, record 100 samples with no speaking and store them in a Noise class.
We need to split the dataset into test and training samples. To do this, head to the dashboard and scroll to the bottom of the page, then click on the Perform train/test split button.
Once you have done this, head back to the data acquisition tab and you will see that your data has been split.
Now we are going to create our network and train our model.
Head to the Create Impulse tab and change the window size to 2000ms. Next click Add processing block and select Audio (MFCC), then click Add learning block and select Classification (Keras).
Now click Save impulse.
Head over to the MFCC tab and click on the Save parameters button to save the MFCC block parameters.
If you are not automatically redirected to the Generate features tab, click on the MFCC tab and then click on Generate features and finally click on the Generate features button.
Your data should be nicely clustered and there should be as little mixing of the classes as possible. You should inspect the clusters and look for any data that is clustered incorrectly (You don't need to worry so much about the noise and unknown classes being mixed). If you find any data out of place, you can relabel or remove it. If you make any changes click Generate features again.
Now we are going to train our model. Click on the NN Classifier tab, then click Auto-balance dataset, Data augmentation, and then Start training.
Once training has completed, you will see the results displayed at the bottom of the page. Here we see that we have 99.2% accuracy. Let's test our model and see how it works on our test data.
Head over to the Model testing tab where you will see all of the unseen test data available. Click on Classify all and sit back as we test our model.
You will see the output of the testing in the output window, and once testing is complete you will see the results. In our case we can see that we have achieved 96.62% accuracy on the unseen data, and a high F-Score on all classes.
Now we need to test how the model works on our device. Use the Live classification feature to record some samples for classification. Your model should correctly identify the class for each sample.
Edge Impulse has a great new feature called Performance Calibration, or PerfCal. This feature allows you to run a test on your model and see how well it will perform in the real world. The system will create a set of post-processing configurations for you to choose from. These configurations help to minimize either false activations or false rejections.
Once you turn on perfcal, you will see a new tab in the menu called Performance calibration. Navigate to the perfcal page and you will be met with some configuration options.
Select the Noise class from the drop down, and check the Unknown class in the list of classes below, then click Run test and wait for the test to complete.
The system will provide a number of configs for you to choose from. Choose the one that best suits your needs and click Save selected config. This config will be deployed to your device once you download and install the library on your device.
We can use the versioning feature to save a copy of the existing network. To do so head over to the Versioning tab and click on the Create first version button.
This will create a snapshot of your existing model that we can come back to at any time.
Now we will deploy an Arduino library to our device that will allow us to run the model directly on our Arduino Nano 33 BLE Sense.
Head to the deployment tab and select Arduino Library then scroll to the bottom and click Build.
Note that the EON Compiler is selected by default which will reduce the amount of memory required for our model.
Once the library is built, you will be able to download it to a location of your choice.
Once you have downloaded the library, open up the Arduino IDE, click Sketch -> Include Library -> Add .ZIP Library..., navigate to the location of your library, add it, and then restart the IDE.
Open the IDE again and go to File -> Examples, scroll to the bottom of the list, go to AI_Patient_Assistance_inferencing -> nano_ble33_sense -> nano_ble33_sense_microphone.
Once the script is uploaded, open up serial monitor and you will see the output from the program. The green LED on your device will turn on when it is recording, and off when recording has ended.
Now you can test your program by saying any of the keywords when the green light is on. If a keyword is detected the red LED will turn on.
Now open AI_Patient_Assistance_inferencing -> nano_ble33_sense -> nano_ble33_sense_microphone_continuous, copy the contents of libraries/ai_patient_assistance/ai_patient_assistance_continuous.ino into the file and upload to your board.
Once the script is uploaded, open up serial monitor and you will see the output from the program. The red LED will blink when a classification is made.
This program acts as a BLE peripheral which basically advertises itself and waits for a central to connect to it before pushing data to it. In this case our central/master is a smart phone, but in the real world this would be a BLE enabled server that would be able to interact with a database, send SMS, or forward messages to other devices/applications using a machine to machine communication protocol such as MQTT.
When your BLE app connects to the program, the LED light will turn blue, once the app disconnects the LED will turn off.
Here we have created a simple but effective solution for detecting specific keywords that can be part of a larger automated patient assistance system. Using a fairly small dataset we have shown how the Edge Impulse platform is a useful tool in quickly creating and deploying deep learning models on edge devices.
You can train a network with your own keywords, or build off the model and training data provided in this tutorial. Ways to further improve the existing model could be:
Record more samples for training
Record samples from multiple people
For initial setup of the Portenta, follow the steps outlined here:
Using cough frequency and intensity as an indicator of COPD condition has been difficult to achieve outside hospital wards using existing technology. The main traditional approach consists of audio recording a patient at the ward, then using manual or software-based cough counting. While developing this new proof-of-concept no similar approaches were found.
A model was trained to distinguish intense coughs from other sounds. An Arduino Nano 33 BLE Sense was programmed to continuously feed microphone audio into an application. The application then runs inference on small audio fragments to determine the probability of each fragment containing a cough. If it does, a counter is incremented and this is securely advertised using Bluetooth Low Energy, BLE. Another BLE device, such as a smartphone or a USB dongle, can be paired with the device and re-transmit the event to a service on the internet. I have used nRF Cloud for this purpose. nRF Cloud exposes several APIs (REST web services, MQTT brokers) that enable the events to be integrated with other systems. With this as a basis it's possible to transform the event into an internationally recognized clinical message that can be routed into an Electronic Medical Record system, EMR. Popular standards include openEHR and HL7 FHIR.
Edge Impulse is the leading development platform for machine learning on edge devices, and it's free for use by developers. The documentation is some of the best I have experienced in my two decades as a professional developer. I also wish to recommend the book as a practical, project-based introduction to TinyML and Edge Impulse. I will highlight some particulars of my application. You may explore my Edge Impulse project.
I am the only source of the coughs; if this is to be used by anyone else, a significantly larger and more diverse dataset is needed. I have found several crowd-sourced cough datasets, thanks to efforts during the COVID pandemic. I started this project by making Python scripts that would filter, massage, and convert these samples. The quality of many of the samples was not suitable for my project; many 10-second samples only contained a single, weak cough. Very few were accurately labeled. Due to the amount of work required to trim all of these and to manually label each sample, I decided to produce my own. Labeled datasets of other sounds are also readily available.
The samples were split for training (81%), and 19% were put aside for testing. The training samples were used to extract audio features and create a neural network classifier. The NN architecture and parameters were mainly the result of experimentation. Articles and books gave conflicting and outdated advice, but were still useful for understanding the different steps. Edge Impulse Studio is perfect for this type of iterative experimentation, as it replaces a lot of custom tooling with a beautiful UI.
The Arduino ecosystem is wonderful for this kind of explorative prototyping. Development using Visual Studio Code or the Arduino IDE/web IDE was a breeze, and access to e.g. BLE APIs was intuitive. You may explore the source code.
If you are used to RPC or even REST types of communication paradigms, BLE will require a bit of reading and experimentation. The official ArduinoBLE docs give a great explanation of key concepts and sample code to get started. Nordic also has excellent learning resources.
I used an nRF52840 Dongle in conjunction with the nRF Connect for Desktop Bluetooth Low Energy app for initial BLE sniffing and development.
I didn't spend a whole lot of time profiling and optimizing this project, as I would be moving to different hardware in the next iteration. Remember, the current implementation is simultaneously buffering audio from the microphone and performing inference. The key to long battery life is 1) energy-efficient hardware and 2) as much down time (deep sleep) as possible. I did however make sure it could perform continuous inference for a few days. The Qoitech Otii Arc is an excellent tool for profiling projects like this. Please see my other projects at Hackster and element14 for more in-depth tutorials.
The model was printed using a Formlabs Form 3 SLA 3D printer, with rigid Formlabs Gray v4 resin. The process starts by exporting the models from Fusion 360. These are imported and arranged for printing using the PreForm software. There are many considerations in arranging the models for an optimal print; carefully oriented parts and support material placement can drastically reduce post-print work and improve strength and surface finish.
I work with research and innovation. I am a member of the Edge Impulse Expert Network. This project was made of my own accord and the views are my own.
This is an interesting problem and I decided to use equally interesting hardware in this project: the Syntiant TinyML board. This board is designed for voice, acoustic event detection, and sensor machine learning applications.
It is equipped with an ultra-low-power Syntiant NDP101 Neural Decision Processor, a SAMD21 Cortex-M0+ host processor, an onboard microphone, and a 6-axis motion sensor in an SD-card-sized package. This board is extremely small and perfect for my application.
The cherry on top is that it is fully supported by Edge Impulse.
The Data acquisition tab of the Edge Impulse Studio provides the facility to upload, visualise, and edit the dataset. This tab provides simple functions such as splitting data into training and testing, and a lot of advanced functions such as filtering, cropping, splitting data into multiple data items, and many more.
The dataset is an audio type, which is time-series data, and therefore the Impulse design tab automatically added the Time series data block first. Next is the processing block. Syntiant has prepared a pre-processing block called Audio (Syntiant) specifically for the NDP101 chip, which is similar to Audio (MFE) but performs some additional processing. Next is the learning block - a Classification block is perfect for this use case. The last block shows the Output features, i.e. the classes.
After pre-processing the dataset, it's time to train a model. In the NN Classifier tab, adjust the training cycles, learning rate, etc. For the Syntiant TinyML board, the neural network architecture is fixed. I usually start with the default parameters, and based on the performance of the network I adjust the parameters.
At this moment, I realised that the work class contains sound samples of only keyboard and mouse operation, assuming people only work on computers. This is a form of bias, and I had inadvertently included it in my machine-learning model. To correct my mistake, I disabled the work class from the training and test sets using the filters provided by the Data acquisition tab.
After updating the dataset, go to the NN Classifier tab and click on the "Train model" button.
To test the model, jump over to the Model testing tab and click on Classify all. It will automatically pre-process and perform inferencing on the set-aside data, using the last trained model. The testing performance of the model in my case is 84.91%.
To deploy the model to the target hardware (Syntiant TinyML in this case), go to the Deployment tab. The Syntiant TinyML board requires finding posterior parameters before building firmware. For other hardware, this is not required. Select the deployment option (library or firmware) and click Build to generate the output. The firmware is built and downloaded to your computer, and then you can flash it to the board.
Arduino Nano 33 BLE Sense
Edge Impulse
Head over to Edge Impulse and create your account or log in. Once logged in you will be taken to the project selection/creation page.
You need to install the required dependencies that will allow you to connect your device to the Edge Impulse platform. This process is documented in the Edge Impulse documentation and includes installing the Edge Impulse CLI and the Arduino CLI.
Now download the latest Edge Impulse firmware for the Arduino Nano 33 BLE Sense and unzip it, then double-click on the relevant script for your OS: flash_windows.bat, flash_mac.command, or flash_linux.sh.
Next, download the Edge Impulse keyword spotting dataset and extract the data. Navigate to the Noise directory and copy 50 random samples. Next go to the Data Acquisition tab and upload the new data into the Noise class. Finally, copy 100 samples from the unknown class and upload them to the Edge Impulse platform as an Unknown class.
Download this project's code from its repository. Copy the contents of libraries/ai_patient_assistance/ai_patient_assistance.ino into the file and upload to your board. This may take some time.
You can use a free BLE app to connect to your device and read the data published by it.
Constructing a wearable device that includes a Syntiant TinyML board to alert the user with haptic feedback if emergency vehicles or car horns are detected.
Created By: Solomon Githu
Public Project Link: https://studio.edgeimpulse.com/public/171255/latest
In Part 1 of this project, I demonstrated why wearable technology is on the rise and how TinyML is helping to advance the industry. I used the Syntiant TinyML board to run a Machine Learning model that monitors environmental sounds, and predicts when vehicle sounds are detected. Once the device detects the sound of a vehicle or its horn, it generates a vibration, similar to smartphones, that can be perceived by the wearer.
This project can be used, for example, to help people with hearing impairments navigate the streets safely. At the same time, this wearable device is ideal for people strolling through the streets with headphones on, listening to music or podcasts, while inadvertently neglecting the surrounding traffic!
Based on the outcomes of the initial undertaking, I decided to create a wearable device intended for hand placement. This wearable incorporates a protective casing that encompasses all the electronic components, along with wrist straps that allow it to be comfortably worn around the wrist. For the software components, I was satisfied with the model results so I reused the previous Edge Impulse project. For a detailed description on how the model was trained and deployed, please check Part 1 of the project's documentation.
This "Part 2" documentation will outline the process of constructing a similar wearable device. It will provide detailed instructions on how to assemble the device using the publicly available components.
Software components:
Arduino IDE
Hardware components:
3.7V LiPo battery. I used one with 500mAh
Veroboard/stripboard
1x 220 Ω resistor
1x 2N3904 transistor
1x 5.7 kΩ resistor
Some jumper wires and male header pins
Soldering iron and soldering wire
Super glue. Always be careful when handling glues!
In the Edge Impulse Studio, the Impulse was deployed as an optimized Syntiant NDP 101/120 library. This bundles all the signal processing blocks, configuration, and learning blocks into a single package. Afterwards, custom Arduino code was created to analyze the model's predictions. This code can be obtained from this GitHub repository.
The Arduino code turns GPIO 1 HIGH when ambulance, firetruck, or car siren/horn sounds are detected. GPIO 1 is then used to trigger a motor control circuit that creates a vibration. If you want to turn GPIO 2 or 3 high or low, you can use the commands OUT_2_HIGH(), OUT_2_LOW(), OUT_3_HIGH(), and OUT_3_LOW() respectively. These functions can be found in the syntiant.h file.
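As a hypothetical illustration of how a detection could drive the motor circuit: the OUT_1_HIGH()/OUT_1_LOW() helpers are assumed to exist in syntiant.h alongside the OUT_2/OUT_3 variants above, and vibrate_on_detection() is an illustrative helper rather than part of the original firmware.

```cpp
// Hypothetical helper: call with the label reported by the classifier.
void vibrate_on_detection(const char *label) {
    // Any class other than "unknown" is treated as a siren/horn detection
    if (strcmp(label, "unknown") != 0) {
        OUT_1_HIGH();   // GPIO 1 high -> transistor switches on -> motor vibrates
        delay(1000);    // keep vibrating for roughly one second
        OUT_1_LOW();    // GPIO 1 low -> motor off
    }
}
```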
Once the code is uploaded to the Syntiant TinyML board, we can use the Serial Monitor (or any other similar software) to see the logs being generated by the board. Open a terminal and select the COM Port of the Syntiant TinyML board with 115200 8-N-1 settings (in Arduino IDE, that is 115200 baud, Carriage return). Sounds of ambulance sirens, firetruck sirens, and car horns will turn the RGB LED red. For the "unknown" sounds, the RGB LED is off.
The wearable's components can be categorized into two parts: the electronic components and the 3D printed components.
The 3D printed component files can be downloaded from printables.com or thingiverse.com. The wearable casing is made up of two components: one holds the electrical components while the other is a cover. I 3D printed these with PLA material.
The other 3D printed components are flexible wrist straps. These are similar to the ones found on watches. I achieved the flexibility by printing them with TPU material. Note that if you do not have a good 3D printer you may need to widen the strap's holes after printing.
I then used super glue to attach the wrist straps to the case. Always be careful when handling glues!
Finally, the last 3D printed component is the wearable's dock/stand. This component is not important to the wearable's functionality. A device's dock/stand is just always cool! It keeps your device in place, adds style to your space, and saves you from the existential dread of your device being tangled in cables.
The wearable's electronic components include:
Syntiant TinyML board
3.7V LiPo battery - the wearable's case can hold a LiPo battery which has a maximum dimension of 38mm x 30mm x 5mm
Vibration motor module
Circuit board for controlling the vibration motor module - the wearable's case can hold a circuit board that has a maximum dimension of 34mm x 28mm x 5mm
The Syntiant TinyML board has a LiPo battery connector and copper pads that we can use to connect our battery to the board. I chose to solder some wires on the copper pads for the battery connection. Note that the "VBAT" copper pad is for the battery's positive terminal.
The next task is to setup the circuit board for controlling the vibration motor. This control circuit receives a signal from the Syntiant TinyML board GPIO and generates a signal that turns the vibration motor on or off. It can be easily soldered on a veroboard/stripboard with the following components:
1x 220 Ω resistor – one end connects to the Syntiant GPIO 1, the other end connects to the base of the transistor
1x 2N3904 transistor - the emitter pin is connected to the negative terminal of the battery
1x 5.7 kΩ resistor – one end connects to the positive terminal of the battery, the other end connects to the collector of the transistor
Below is a circuit layout for the wearable. The motor's control circuit is represented by the transistor and resistors on the breadboard. The slide switch is optional; the case, however, has a slot on the side for placing one.
Once the electronic parts have been assembled, they can be put in the wearable's case according to the layout in the image below.
Below are some images of the assembly that I obtained.
As I was working on the electronic components, I was not sure whether the vibrations from the motor would be noticeable on the wearable. Fortunately, the motor module works very well! The wearable's vibration strength is similar to a smartphone's vibration. This can be seen in the video below, showing how the motor vibrates from test code running on the Syntiant TinyML board.
Below is a video of the wearable on a person's hand. The Syntiant TinyML board detects an ambulance siren sound in the background and signals it to the person.
Below are some additional images of the wearable.
This environmental sensing wearable is one of the many solutions that TinyML offers. The Syntiant TinyML Board is a tiny board with a microphone and accelerometer, USB host microcontroller and an always-on Neural Decision Processor, featuring ultra low-power consumption, a fully connected neural network architecture, and fully supported by Edge Impulse. I have always been fascinated by this tiny board and this made it the perfect choice for this project!
RatPack is another fascinating wearable that has been created using the Syntiant TinyML board. The giant African pouched rat has been given this gear to enable it to communicate with its human handlers when it comes across a landmine or other interesting object. Please check out the documentation to learn more about this fascinating project. All of the wearable's required components are open source, and the documentation provides step-by-step instructions so you can easily create your own.
Training and deploying a Keyword Spotting machine learning model on the Nordic Thingy:53.
Created By: Nekhil R.
Public Project Link: https://studio.edgeimpulse.com/public/170378/latest
In any industry with safety requirements, workers need to be able to report accidents that occur in the workplace. In this project, we are developing a device where employees can simply speak into a microphone to report an accident. The device will be loaded with a machine-learning model that can recognize an accident-reporting keyword. This “keyword spotting” implementation can speed up the reporting process, or make it easier for employees who may not be able to write due to work conditions or literacy. We are only focused on the capability of the device to learn and then recognize our selected keyword here, but future work could include the ability to log events, record audio reports of the incident that has occurred, or other applications.
The development board used in this project is the Nordic Thingy:53. It allows you to create prototypes and proof of concept devices without building customized hardware. Its twin Arm Cortex-M33 processors provide enough processing power and memory to execute embedded machine learning (ML) models directly on the development platform.
The Nordic Thingy:53 can be linked to an Edge Impulse Studio account using the nRF Edge Impulse app. It enables you to download trained ML models to the Thingy:53 for deployment and local inferencing, and to wirelessly communicate sensor data to a mobile device through Bluetooth LE. It also supports a wired serial connection to a computer, for use with Edge Impulse Studio. To learn more about configuring the Thingy:53, you can check out the Docs here.
Data collection is one of the most important steps in a machine learning project. In this project, we used the nRF Edge Impulse app for data collection, making it easy to capture input data. The main keyword used here for reporting an accident is, literally, the word Accident itself. Below, you can see our data collection settings. We recorded 10 seconds of data at a 16 kHz frequency.
This is our keyword collected, along with some noise.
We were able to omit the noise by splitting the sample. So we got our exact keyword, only the piece that we wanted. To make the machine learning model more robust, we collected this same data (the word Accident) from more people. Different ages, different genders, and different talking speeds help make the data better.
In addition to the keyword we'll also need audio that is not our keyword. Background noise, a television ('noise' class), and humans saying other words ('unknown' class) help provide data for the other classes. A machine learning model requires a certain level of "uniformity" in the data it is exposed to, as otherwise it will not be able to learn effectively. The more diverse your data is, the better your machine learning algorithm will perform. So for the Unknown and Noise classes we also made use of this Edge Impulse dataset. To add this data to our project we used the direct Upload method by browsing from the local computer. More information on this dataset can be found here.
We ended up with about 24 minutes of data, which is split between Training and Testing:
19m 35s data for Training.
4m 47s data for Testing.
With the Edge Impulse Data explorer, you can represent your dataset visually, detect outliers, and label unlabeled data. This gives us a quick overview of our complete dataset. To learn more about the Data explorer, you can review the documentation here.
This is our dataset visualization that was generated using the Data explorer.
In this project we only care about the “Accident” keyword; “Unknown” and “Noise” are not useful for us. As you can see, some of our “Accident” keywords are located in the “Noise” cluster, and a few of them also reside in the “Unknown” cluster of data. Upon examining the ones residing in the Noise cluster a bit closer, we observed that some of them are noises that came about as a result of improper keyword splitting. The remaining items are audio where the spoken word is unclear or hard to discern, which is how they ended up in the Noise cluster. Here is one example we found that is indeed an “Accident” keyword but was misclassified into the Noise cluster.
So we deleted the extreme outlying “Accident” keywords that are misclassified and rebalanced the dataset, making it look much cleaner.
Now it is time to create an Impulse, which is the machine learning pipeline built by Edge Impulse.
For our Processing block, we used Audio (MFE), which is great for the keyword spotting and worked very well with our data.
This is the MFCC Feature generation details page, presented after clicking Save Impulse.
On the right side, we can see the Mel Cepstral Coefficients of the data, and on the left side the hyperparameters for generating the Mel Frequency Cepstral Coefficients. We are going to leave the default parameters, because they work very well for keyword spotting.
These are the Features generated for our data, shown in a similar format as the Data explorer, for quick visual reference.
With all our data processed, it's time to start training a neural network.
To build our neural network, we simply start by clicking Classifier on the left. The Neural Network default settings might work ok, but you can fine tune them if you need to. More information on the settings can be found in the documentation located here.
Here are our neural network settings and architecture, which work very well for our data.
By enabling Data augmentation, Edge Impulse will make small changes and variations to the dataset, making the model more resilient and improving overall performance by preventing overfitting. Here we used a 1-D convolutional architecture, which works great for keyword spotting. Let's take a look at the resulting performance to validate our model and ensure it's doing what we expect.
Our dataset is not really that large, and yet we got 94.4% accuracy, so this is pretty good and we can proceed with this model.
The next step in the process is to test the model on some new, unseen data before deploying it in the real world.
When we collected data, we set aside that 4 minutes and 47 seconds of data for Testing, which was not used in the creation and training of the model. Now it can be used for Testing, by simply clicking on Model Testing on the left.
After letting it run, it looks like we got around 87% accuracy. Not as high as our 94% training accuracy, but still very good.
To deploy our model onto the device, back in the mobile App you can simply go to the Deployment tab and click on the Build option. The App will start building our project, and then deploy the firmware directly to our Thingy:53 upon completion.
After successfully deploying the model to the device, we can start our inferencing by switching to the Inferencing tab in the App.
This video shows real-time inferencing in the Thingy:53.
This project showed how to teach a Thingy:53 to recognize a keyword, commonly known as “keyword spotting”. The process began with recording the word we are interested in recognizing, ensuring we have enough data for a quality dataset, building an Edge Impulse machine learning model, then deploying the model to the device. The keyword used in this project was “Accident”, though any word could be used. Further work on this topic could then add new capabilities beyond the keyword spotting to add voice recording, logging to a dashboard, or similar expanded functionality.
A snore-no-more device embedded in your pillow aims to help those who suffer from obstructive sleep apnea.
Created By: Naveen Kumar
Public Project Link: https://studio.edgeimpulse.com/public/226454/latest
It is estimated that more than half of men and over 40% of women in the United States snore, and up to 27% of children. While snoring can be a harmless, occasional occurrence, it can also indicate a serious underlying sleep-related breathing disorder. Snoring is caused by the vibration of tissues near the airway that flutter and produce noise as we breathe. Snoring often indicates obstructive sleep apnea, a breathing disorder that causes repeated lapses in breath due to a blocked or collapsed airway during sleep. Despite being unaware of their snoring, many people suffer from sleep apnea, leading to under-diagnosis. As part of my project, I have developed a non-invasive, low-powered edge device that monitors snoring and interrupts the user moderately through a haptic feedback mechanism.
Our system utilizes the Arduino Nicla Voice, which is designed with the Syntiant NDP120 Neural Decision Processor. This processor allows for embedded machine-learning models to be run directly on the device. Specifically designed for deep learning, including CNNs, RNNs, and fully connected networks, the Syntiant NDP120 is perfect for always-on applications with minimal power consumption. Its slim profile also makes it easily portable, which suits our needs.
There are several onboard sensors available on the Nicla Voice, but for this particular project, we will solely make use of the onboard PDM microphone. We are utilizing an Adafruit DRV2605L haptic motor controller and an ERM vibration motor to gently alert users without being intrusive. The haptic motor driver is connected to the Nicla Voice using an Eslov connector and communicates over I2C protocol. The haptic motor driver gets power from the VIN pin since the Eslov connector does not provide power.
To set this device up in Edge Impulse, we will need to install two command-line interfaces by following the instructions provided at the links below.
Please clone the Edge Impulse firmware for this specific development board.
To obtain audio firmware for Nicla Voice, kindly download it from the provided link:
https://cdn.edgeimpulse.com/firmware/arduino-nicla-voice-firmware.zip
To install the Arduino Core for the Nicla board and the pyserial package required to update the NDP120 chip, execute the commands below.
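The exact commands depend on your toolchain, but they are roughly as follows (the arduino:mbed_nicla core name and the use of pip3 are assumptions):

arduino-cli core install arduino:mbed_nicla

pip3 install pyserial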
We will need to register a free account at Edge Impulse to create a new project.
We have used a dataset available at https://www.kaggle.com/datasets/tareqkhanemu/snoring. Within the dataset, there are two separate directories; one for snoring and the other for non-snoring sounds. The first directory includes a total of 500 one-second snoring samples. Among these samples, 363 consist of snoring sounds created by children, adult men, and adult women without any added background noise. The remaining 137 samples include snoring sounds that have been mixed with non-snoring sounds. The second directory contains 500 one-second non-snoring samples. These samples include background noises that are typically present near someone who is snoring. The non-snoring samples are divided into ten categories, each containing 50 samples. The categories are baby crying, clock ticking, door opening and closing, gadget vibration motor, toilet flushing, emergency vehicle sirens, rain and thunderstorm, street car sounds, people talking, and background television news. The audio files are in the 16-bit WAVE audio format, with a 2-channel (stereo) configuration and a sampling rate of 48,000 Hz. However, we require a single-channel (mono) configuration at a sampling rate of 16,000 Hz, so we need to convert them accordingly using FFmpeg.
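As a sketch of the conversion (the file names here are placeholders), a single file can be converted to mono, 16 kHz audio with FFmpeg like this:

ffmpeg -i input.wav -ac 1 -ar 16000 output.wav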
The datasets are uploaded to the Edge Impulse Studio using the Edge Impulse Uploader. Please follow the instructions here to install the Edge Impulse CLI tools and execute the commands below.
To ensure accurate prediction, the Syntiant NDP chips necessitate a negative class that should not be predicted. For the datasets without snoring, the z_openset class label is utilized to ensure that it appears last in alphabetical order. By using the commands provided, the datasets are divided into Training and Testing samples. Access to the uploaded datasets can be found on the Data Acquisition page of the Edge Impulse Studio.
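For reference, an upload with the Edge Impulse CLI typically looks something like the following; the directory names and labels here are placeholders, and --category split lets the Studio divide the samples between Training and Testing:

edge-impulse-uploader --category split --label snoring snoring/*.wav

edge-impulse-uploader --category split --label z_openset non-snoring/*.wav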
Go to the Impulse Design > Create Impulse page and click on the Add a processing block and choose Audio (Syntiant) which computes log Mel-filterbank energy features from an audio signal specific to the Syntiant audio front-end. Also, on the same page click on the Add a learning block and choose Classification which learns patterns from data, and can apply these to new data. We opted for a default window size of 968 ms and a window increase of 32 ms, which allows for two complete frames within a second of the sample. Now click on the Save Impulse button.
Now go to the Impulse Design > Syntiant page and choose the Feature extractors parameter as log-bin (NDP120/200). The number of generated features has to be 1600, which corresponds to the Syntiant Neural Network input layer. To generate 1600 features, it is necessary to validate the following equation:
window size = 1000 x (39 x Frame stride + Frame length)
In our particular case, the window size is 968 ms, which is equivalent to 1000 x (39 x 0.024 + 0.032).
Clicking on the Save parameters button redirects us to another page where we should click on the Generate features button. It usually takes a couple of minutes to complete feature generation. We can see the visualization of the generated features in the Feature Explorer.
Next, go to the Impulse Design > Classifier page and choose the Neural Network settings as shown below.
We have conducted experiments and established the model architecture using a 2D CNN, which is outlined below.
To begin the training process, click on the Start training button and patiently wait for it to finish. Once completed, we can review the output of the training and the confusion matrix provided below. It's worth noting that the model has achieved an impressive accuracy rate of 99.4%.
To assess the model's performance on the test datasets, navigate to the Model testing page and select the Classify all button. We can be confident in the model's effectiveness on new, unseen data, with an accuracy rate of 94.62%.
When deploying the model on Nicla Voice, we must choose the Syntiant NDP120 library option on the Deployment page.
To build the firmware, it is necessary to configure the posterior parameters of the Syntiant NDP120 Neural Decision Processor by clicking the Find posterior parameters button.
To detect snoring events, we'll choose that class and disregard z_openset. Also, we are not using any calibration dataset. Then, we'll click Find Parameters to adjust the neural network's precision and recall and minimize the False Rejection Rate and False Activation Rate.
Now click on the Build button and in a few seconds the library bundle will be downloaded to the local computer. We can see the following file listings after unzipping the compressed bundle.
Now copy the model-parameters content into the src/model-parameters/ directory of the firmware source code that we cloned in the Setup Development Environment section.
We need to customize the firmware code in the firmware-arduino-nicla-voice/src/ei_syntiant_ndp120.cpp file. Add the following declarations near the top section of the file.
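As an example of what these declarations could look like (assuming the Adafruit DRV2605 library is used for the haptic controller, and with a buffer sized for the 10-result window described later):

```cpp
#include <Wire.h>
#include "Adafruit_DRV2605.h"        // haptic motor driver library (assumed)

Adafruit_DRV2605 drv;                // DRV2605L driver instance
static uint8_t recent_results[10];   // circular buffer of the last 10 classifications
static uint8_t result_idx = 0;       // write position in the circular buffer
```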
In the ei_setup() function, add the haptic feedback motor driver initialization code.
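A sketch of that initialization, again assuming the Adafruit DRV2605 library:

```cpp
// Inside ei_setup(): bring up I2C and configure the DRV2605L for internal triggering
Wire.begin();
drv.begin();                         // initialize the haptic driver
drv.selectLibrary(1);                // select an ERM waveform library
drv.setMode(DRV2605_MODE_INTTRIG);   // effects are fired by calling go()
```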
You can add custom logic to the main Arduino sketch by customizing the match_event() function as follows.
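The snippet below is a sketch of such logic, not the verbatim firmware code; the match_event() signature and the "snoring" label are assumptions based on the description in this write-up.

```cpp
void match_event(char *label) {
    // Record whether this window was classified as snoring
    recent_results[result_idx] = (strcmp(label, "snoring") == 0) ? 1 : 0;
    result_idx = (result_idx + 1) % 10;

    // Count snoring windows among the last 10 results
    uint8_t snore_windows = 0;
    for (uint8_t i = 0; i < 10; i++) {
        snore_windows += recent_results[i];
    }

    // Three or more snoring windows -> fire a haptic effect
    if (snore_windows >= 3) {
        drv.setWaveform(0, 47);      // effect 47: a strong buzz
        drv.setWaveform(1, 0);       // end of the waveform sequence
        drv.go();
    }
}
```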
To compile the firmware execute the command below.
Once we've compiled the Arduino firmware we need to follow the steps below to deploy the firmware to the device. It is assumed that all firmware and deployment libraries are in the same directory level.
Take the .elf file output by Arduino and rename it to firmware.ino.elf.
Replace the firmware.ino.elf in the default firmware with this renamed file.
Replace the ei_model.synpkg file in the default firmware by the one from the downloaded Syntiant library.
Now connect the Nicla Voice to the computer using a USB cable and execute the script (OS-specific) to flash the board.
We can monitor the inferencing results over the serial connection at a 115200 baud rate after resetting the board or power cycling.
When classifying audio, it is recommended to analyze multiple windows of data and calculate the average of the results. This approach can provide better accuracy compared to relying on individual results alone. To detect snoring events, the application maintains a circular buffer of the last 10 inferencing results. If three snoring events are detected, the haptic feedback motor will vibrate for a few seconds. During the demo, the motor vibration appears to be moderate. However, the user can still feel the vibration while using the pillow.
This project provides a practical solution to a seemingly humorous yet serious issue. The device is user-friendly and respects privacy by conducting inferencing at the edge. Although the model performs well, there is potential for enhancement through the inclusion of more refined training data, resulting in improved accuracy and robustness to further reduce the likelihood of false positives and negatives. Additionally, this project showcases the ability of a simple neural network to address complex problems, executed through effective digital signal processing on low-powered, resource-restricted devices such as Syntiant NDP120 Neural Decision Processor on the Nicla Voice.
An acoustic sensing project that uses audio classification on a Syntiant TinyML board to listen for keywords, and take action via GPIO.
Created By: Solomon Githu
Public Project Link: https://studio.edgeimpulse.com/public/111611/latest
The International Labor Organization estimates that there are over 1 million work-related fatalities each year and millions of workers suffer from workplace accidents. However, even as technology advancements have improved worker safety in many industries, some accidents involving workers and machines remain undetected as they occur, possibly even leading to fatalities. This is because of the limitations in Machine Safety Systems. Safety sensors, controllers, switches and other machine accessories have been able to provide safety measures during accidents but some events remain undetected by these systems.
Some accidents which are difficult to detect in industrial settings include:
Falling objects
Objects striking or falling on employees
Slips or falls of employees
Chemical burns or exposure
Workers caught in moving machine parts
Sound classification is one of the most widely used applications of machine learning. When in danger or scared, we humans respond with audible actions such as screaming, crying, or words such as "stop" or "help". This alerts other people that we are in trouble and can also give them instructions, such as stopping a machine or opening/closing a system. We can use sound classification to give hearing to machines and manufacturing setups so that they can be aware of the status of their environment.
TinyML has enabled us to bring machine learning models to low-cost and low-power microcontrollers. We will use Edge Impulse to develop a machine learning model which is capable of detecting accidents from workers' screams and cries. This event can then be used to trigger safety measures such as a machine/actuator stop, or to sound alarms.
The Syntiant TinyML Board is a tiny development board with a microphone and accelerometer, USB host microcontroller and an always-on Neural Decision Processor™, featuring ultra low-power consumption, a fully connected neural network architecture, and fully supported by Edge Impulse. Here are quick start tutorials for Windows and Mac.
You can find the public project here: Acoustic Sensing of Worker Accidents. To add this project into your account projects, click "Clone this project" at the top of the window. Next, go to the "Deploying to Syntiant TinyML Board" section below to see how you can deploy the model to the Syntiant TinyML board.
Alternatively, to create a similar project, follow the next steps after creating a new Edge Impulse project.
We want to create a model that can recognize both keywords and human sounds like cries and screams. For these, we have 4 classes in our model: stop, help, cry, and scream. In addition to these classes, we also need another class that is not part of our 4 keywords. We label this class as "unknown" and it contains sounds of people speaking, machines, and vehicles, among others. Each class contains audio samples of 1 second in length.
In total, we have 31 minutes of data for training and 8 minutes of data for testing. For the "unknown" class, we can use the Edge Impulse keyword spotting dataset, which can be obtained here. From this dataset we use the "noise" audio files.
The Impulse design is very unique as we are targeting the Syntiant TinyML board. Under 'Create Impulse' we set the following configurations:
Our window size is 968 ms, and the window increase is 484 ms. Click 'Add a processing block' and select Audio (Syntiant). Next, we add a learning block by clicking 'Add a learning block' and selecting Classification (Keras). Click 'Save Impulse' to use this configuration.
Next we go to our processing block configuration, Syntiant, and first click 'Save parameters'. The preset parameters will work well so we can use them in our case.
On the window 'Generate features', we click the "Generate features" button. Upon completion we see a 3D representation of our dataset. These are the Syntiant blocks that will be passed into the neural network.
Lastly, we need to configure our neural network. Start by clicking "NN Classifier". Here we set the number of training cycles to 80, with a learning rate of 0.0005. Edge Impulse automatically designs a default neural network architecture that works very well without requiring the parameters to be changed. However, if you wish to update some parameters, Data augmentation can improve your model accuracy. Try adding noise, masking time and frequency bands, and assess your model performance with each setting.
With the training cycles and learning rate set, click "Start training", and you will have a neural network when the task is complete. We get an accuracy of 94%, which is pretty good!
When training our model, we used 80% of the data in our dataset. The remaining 20% is used to test the accuracy of the model in classifying unseen data. We need to verify that our model has not overfit by testing it on new data. If your model performs poorly, then it means that it has overfit (memorized your dataset). This can be resolved by adding more data and/or reconfiguring the processing and learning blocks if needed. Tips for increasing performance can be found in this guide.
On the left bar, click "Model testing" then "classify all". Our current model has a performance of 91% which is pretty good and acceptable.
From the results we can see new data called "testing" which was obtained from the environment and sent to Edge Impulse. The Expected Outcome column shows which class the collected data belongs to. In all cases, our model classifies the sounds correctly, as seen in the Result column; it matches the Expected Outcome column.
To deploy our model to the Syntiant Board, first click "Deployment". Here, we will first deploy our model as firmware on the board. When our audible events (cry, scream, help, stop) are detected, the onboard RGB LED will turn on. When unknown sounds are detected, the onboard RGB LED will be off. This runs locally on the board without requiring an internet connection, and with minimal power consumption.
Under "Build Firmware" select Syntiant TinyML.
Next, we need to configure posterior parameters. These are used to tune the precision and recall of our Neural Network activations, to minimize False Rejection Rate and False Activation Rate. More information on posterior parameters can be found here: Responding to your voice - Syntiant - RC Commands, in "Deploying to your device" section.
Under "Configure posterior parameters" click "Find posterior parameters". Check all classes apart from "unknown", and for calibration dataset we use "No calibration (fastest)". After setting the configurations, click "Find parameters".
This will start a new task, so we have to wait until it is finished.
When the job is completed, close the popup window and then click the "Build" option to build our firmware. The firmware will be downloaded automatically when the build job completes. Once the firmware is downloaded, we first need to unzip it. Connect a Syntiant TinyML board to your computer using a USB cable. Next, open the unzipped folder and run the flashing script based on your Operating System.
We can connect to the board's firmware over Serial. To do this, open a terminal and select the COM Port of the Syntiant TinyML board with 115200 8-N-1 settings (in Arduino IDE, that is 115200 baud, Carriage return).
Sounds such as "stop", "help", "aaagh!" or crying will turn the RGB LED to red.
For the "unknown" sounds, the RGB LED is off. While configuring the posterior parameters, the detected classes that we selected are the ones which trigger the RGB LED lighting.
We can use our Machine Learning model as a safety feature for actuators, machines or other operations involving people and machines.
To do this we can build custom firmware for our Syntiant TinyML board that turns a GPIO pin HIGH or LOW based on the detected event. The GPIO pin can then be connected to a controller that runs an actuator or a system. The controller can then turn off the actuator or process when a signal is sent by the Syntiant TinyML board.
A custom firmware was then created to set GPIO 1 of the Syntiant TinyML Board HIGH (3.3V) whenever the alarming sounds are detected. GPIO 1 is next to the GND pin, so we can easily use a 2-pin header to connect our TinyML board to another device.
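As an illustration of that idea, here is a minimal sketch (not the exact linked firmware): it assumes the inferencing firmware calls an on_classification() handler with the winning label, and the Arduino pin constant standing in for GPIO 1 should be verified against the board's pin mapping.

```cpp
// Minimal sketch of the GPIO-alert idea (not the exact linked firmware).
// Assumes the inferencing firmware calls on_classification() with the
// winning label after each NDP prediction; ALERT_PIN stands in for GPIO 1.

const int ALERT_PIN = 1;   // "GPIO 1", next to GND on the Syntiant TinyML board
const char *ALARM_LABELS[] = { "stop", "help", "cry", "scream" };

void setup() {
  pinMode(ALERT_PIN, OUTPUT);
  digitalWrite(ALERT_PIN, LOW);    // idle: no alarm
}

// Called with the highest-scoring class label for each inference.
void on_classification(const char *label) {
  bool alarming = false;
  for (const char *candidate : ALARM_LABELS) {
    if (strcmp(label, candidate) == 0) {
      alarming = true;
      break;
    }
  }
  // HIGH (3.3 V) tells the external controller to stop its actuator.
  digitalWrite(ALERT_PIN, alarming ? HIGH : LOW);
}

void loop() {
  // audio capture and inference run in the NDP firmware
}
```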
Awesome! What's next now? Check out the custom firmware here and add intelligent sensing to your actuators and home automation devices!
I leveraged my TinyML solution and used it to add more sensing to my LoRaWAN actuator. I connected the Syntiant TinyML board to an Atmega and SX1276 based development board called the WaziAct. This board is designed to play as a production LoRa actuator node with an onboard relay which I often use to actuate pumps, solenoids, and electrical devices. I programmed the board to read the status of the pin connected to the Syntiant TinyML board, and when a signal is received it stops executing the main tasks. An alert is also sent to the gateway via LoRa while the main tasks remain halted. The Arduino code can be accessed here.
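A simplified sketch of that actuator-side logic is shown below; the pin numbers are placeholders and the LoRa alert itself is left as a comment, since the full implementation is in the Arduino code linked above.

```cpp
// Simplified actuator-side logic on the LoRa node (placeholder pins;
// the LoRa alert itself is handled in the full code linked above).

const int SENSE_PIN = 7;   // wired to GPIO 1 of the Syntiant TinyML board
const int RELAY_PIN = 8;   // relay driving the pump/solenoid

void setup() {
  pinMode(SENSE_PIN, INPUT);
  pinMode(RELAY_PIN, OUTPUT);
}

void loop() {
  if (digitalRead(SENSE_PIN) == HIGH) {
    digitalWrite(RELAY_PIN, LOW);          // halt the actuator immediately
    // ... send a LoRa alert to the gateway here ...
    while (digitalRead(SENSE_PIN) == HIGH) {
      delay(100);                          // stay halted while the alarm persists
    }
  } else {
    // normal actuation tasks run here
  }
}
```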
Below is a sneak peek of an indoor test… Now my "press a button" LoRaWAN actuations can run without causing harm such as turning on a faulty device, pouring water via a solenoid/pump in unsafe conditions, and other accidental events!
We have seen how we can use sounds to train and deploy our ML solution easily, and also run it locally on a development board. TinyML-based intelligent sensing, as shown here, is just one of the many solutions that TinyML offers.
With Edge Impulse, developing ML models and deploying them has always been easy. The Syntiant TinyML board was chosen for this project because it provides ultra-low power consumption, fully connected neural network architecture, has an onboard microphone, is physically small, and is also fully supported by Edge Impulse.
Audio classification with an Arduino Nano 33 BLE Sense to detect the sound of glass breaking.
Created By: Nekhil R.
Public Project Link: https://studio.edgeimpulse.com/public/149095/latest
GitHub Repository: https://github.com/CodersCafeTech/Vandalism-Detection
The direct cost of vandalism runs into billions of dollars annually in the United States alone. Breaking glass and defacing property are some of the more serious forms of vandalism. Conventional security techniques such as direct lighting and intruder alarms can be ineffective in some locations and cases, so here we explore another form of prevention. In this project, we are able to detect the sound of glass breaking, and can alert a user instantly about the event.
In this project, we only focus on glass breaking, however, this project can be applied to any other form of vandalism that also produces a unique sound.
The device will work as follows. Suppose a vandal tries to break glass, which of course produces a distinctive sound. The TinyML model running on the device can recognize the event using a microphone. Then the device will send an email notification to a registered user regarding the detection.
For this project we are using an Arduino Nano 33 BLE Sense. It's a 3.3V AI-enabled board in a very small form factor. It comes with a series of embedded sensors including an MP34DT05 Digital Microphone.
The ESP-01 is used for adding WiFi capability to the device, because the Arduino Nano 33 BLE Sense does not have any native WiFi capability. The WiFi is specifically used for sending email alerts. Serial communication is established between the Arduino and ESP-01, for transmitting the email.
One of the most important parts of any machine learning model is its dataset. Edge Impulse offers us two options to create our dataset: either uploading files directly, or recording data from the actual device itself. For this project we chose to record data with the device itself: as a prototype, the amount of data will be limited, and collecting it on the same hardware used for deployment can also improve accuracy. To get started connecting the Nano 33 BLE Sense to Edge Impulse, you can have a look at this tutorial.
In this scenario, we have only two classes: Glass Break and Noise. The glass breaking sounds we used come from various online resources, and the major noise datasets are from the Microsoft Scalable Noisy Speech Dataset (MS-SNSD). Apart from the MS-SNSD data, we also included natural noise recorded in the room.
The sound recording was done for 20 seconds at a 16 kHz sampling rate. Something to keep in mind is that you must keep the sampling rate the same between your training dataset and your deployment device. If you are training with 44.1 kHz sound, you need to downsample it to 16 kHz when you are ready to deploy to the Arduino.
We collected around 10 minutes of data and split it between a Training and a Test set. In the Training data we split the samples into 2-second segments, otherwise the inferencing will fail because the BLE Sense has a limited amount of memory to handle the data.
This is our Impulse, which is the machine learning pipeline termed by Edge Impulse.
Here we used MFE as the processing block, because it is well suited for non-voice audio. We have used the default parameters of the MFE block.
These are our Neural Network settings, which we found most suitable for our data. If you are tinkering with your own dataset, you might need to change these parameters a bit, and some exploration and testing could be required.
We enabled the Data augmentation feature, which randomly transforms data during training. This way we are able to run more training cycles without overfitting the data, and it also helps improve accuracy.
This is our Neural Network architecture.
We have used the default 1D convolutional layer, then we trained the model. We ended up with 97% accuracy, which is awesome. By looking at the confusion matrix it is clear that there is no sign of underfitting or overfitting.
Before deploying the model, it's a good practice to run the inference on the Test dataset that was set aside earlier. In the Model Testing, we got around 92% accuracy.
Let's look into some of the misclassifications, to better understand what is happening. In this case, the noise very well resembled the Glass Break sound, which is why it is misclassified:
In this next case, the sample contains both the Glass_Break sound and some noise, and the model actually handled it reasonably well. However, the majority of the sample was noise, which is why it was counted as a misclassification.
In these two cases shown below, again Noise was the major reason for misclassification:
Overall though, the model is performing very well and can be deployed in the real world.
For deploying the Impulse to the BLE Sense, we exported the model as an Arduino library from the Studio.
Then we added that library to the Arduino IDE. Next, we modified the example sketch that is provided, to build our application. You can find the code and all assets, including the circuit, in this GitHub Repository.
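To give a sense of what such a modification can look like, here is a minimal sketch of the post-inference handling, assuming the structure of the standard Edge Impulse microphone example: the header name, the "Glass Break" label, the 0.8 confidence threshold, and the single 'G' command byte sent to the ESP-01 over Serial1 are illustrative choices, not taken verbatim from the repository.

```cpp
// Post-inference handling only (a sketch): assumes the Edge Impulse
// microphone example has already filled `result`, and that Serial1.begin()
// was called in setup() for the link to the ESP-01.

#include <Vandalism_Detection_inferencing.h>   // illustrative name for the exported library

void handle_result(const ei_impulse_result_t &result) {
  for (size_t ix = 0; ix < EI_CLASSIFIER_LABEL_COUNT; ix++) {
    if (strcmp(result.classification[ix].label, "Glass Break") == 0 &&
        result.classification[ix].value > 0.8f) {
      Serial1.write('G');   // ask the ESP-01 to fire the email webhook
    }
  }
}
```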
For triggering the email we used the IFTTT service. To set up the email trigger upon any positive detections, please have a look at this tutorial.
Here is the application I have created:
All the components were fit inside this case, to make a tidy package:
Here are the results of some live testing, after the model is deployed to the device. You can check it out in the below video. The sound of the glass breaking is played on a speaker, and you can see the results of the inferencing and email being sent.
A proof-of-concept that uses an Arduino to listen for anomalies in the sound of a running motor.
Created By: Shebin Jose Jacob
Public Project Link: https://studio.edgeimpulse.com/public/162492/latest
Every manufacturing environment is equipped with machines. For a better-performing manufacturing unit, the health of machines plays a major role, and hence maintenance of the machines is important. There are three maintenance strategies, namely Preventive maintenance, Corrective maintenance, and Predictive maintenance.
If you want to find the best balance between preventing failures and avoiding over-maintenance, Predictive Maintenance (PdM) is the way to go. Equip your factory with relatively affordable sensors to track temperature, vibrations, and motion data, use predictive techniques to schedule maintenance when a failure is about to occur, and you'll see a nice reduction in operating costs.
In the newest era of technology, teaching computers to make sense of the acoustic world is now a hot research topic. So in this project, we use sound to do some predictive maintenance using an Arduino Nano 33 BLE Sense.
We use the Nano 33 BLE Sense to listen to the machine continuously. The MCU runs an ML model which is trained on two sets of acoustic anomalies and a normal operation mode. When the ML model identifies an anomaly, the operator is immediately notified and the machine may be shut down for maintenance after proper inspection. Thus, we can reduce both the possible damage caused and the downtime.
Nano 33 BLE Sense
LED
Edge Impulse
Arduino IDE
The hardware setup consists of a Nano 33 BLE Sense, which is placed beside an old AC motor.
If you haven't connected the device to the Edge Impulse dashboard, follow this tutorial to get it connected. After a successful connection, it should be present in the Devices tab.
Alternatively, recent versions of Google Chrome and Microsoft Edge can collect data directly from your development board, without the need for the Edge Impulse CLI. Follow this tutorial to learn more about it.
Clean data is the most important requirement to train a well-performing model. In our case, we have collected 4 classes of sound - two classes of anomalies, one normal operation class, and a noise class. Each sample is 2 seconds long. The raw data of these classes is visualised below.
If the data is not already split into training and testing datasets, split it in an 80:20 ratio, which forms a good dataset for model training.
An Impulse is the machine learning pipeline that takes raw data, uses signal processing to extract features, and then uses a learning block to classify new data.
Here we are using the Time Series data as the input block. Now, we have two choices for the processing block - MFCC and MFE. As we are dealing with non-vocal audio and MFE performs well with non-vocal audio, we have chosen MFE as our processing block. We have used Classification as our learning block since we have to learn patterns and apply them to new data to categorize the audio into one of the given 4 classes.
In the MFE tab, you can tweak the parameters if you're comfortable with audio processing; otherwise, leave the settings as they are and generate features.
Now that we have our Impulse designed, let's proceed to train the model. The settings we employed for model training are depicted in the picture. You can play about with the model training settings so that the trained model exhibits a higher level of accuracy, but be cautious of overfitting.
A whopping 94.7% accuracy is achieved by the trained model.
Let's now use some unknown data to test the model's functionality. To assess the model's performance, move on to Model Testing and Classify All.
We have got 95.07% testing accuracy, which is pretty awesome. Now let's test the model with some real-world data. Navigate to Live Classification and collect some data from the connected device.
We have collected some real-world data of Normal Operation Mode, Anomaly 1, and Anomaly 2 respectively, and all of them are correctly classified. So our model is ready for deployment.
For deployment, navigate to the Deployment tab, select Arduino Library and build the library. It will output a zip library, which can be added to Arduino IDE.
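As a rough sketch of how the exported library's output can drive the alert LED: the "Anomaly" label prefix, the 0.7 threshold, and LED_PIN below are assumptions based on this write-up, not the exact project code, and the exported library header is assumed to be included.

```cpp
// Alert logic run after each classification (a sketch, not the full example):
// picks the highest-scoring class from the Edge Impulse result struct and
// lights the LED when either anomaly class wins with enough confidence.
// pinMode(LED_PIN, OUTPUT) is assumed to be done in setup().

const int LED_PIN = 2;   // placeholder pin for the alert LED

void alert_from_result(const ei_impulse_result_t &result) {
  size_t best = 0;
  for (size_t ix = 1; ix < EI_CLASSIFIER_LABEL_COUNT; ix++) {
    if (result.classification[ix].value > result.classification[best].value) {
      best = ix;
    }
  }
  bool anomaly = (strncmp(result.classification[best].label, "Anomaly", 7) == 0) &&
                 (result.classification[best].value > 0.7f);
  digitalWrite(LED_PIN, anomaly ? HIGH : LOW);
}
```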
The Nano 33 BLE Sense, along with an LED, is enclosed in a 3D-printed case, which is our final product. The device is capable of identifying acoustic anomalies in a machine and alerts the user via the LED.
A proof-of-concept machine learning project for first responders, to detect the sound of a gunshot.
Created By: Swapnil Verma
Public Project Link: https://studio.edgeimpulse.com/public/133765/latest
On May 24, 2022, nineteen students and two teachers were fatally shot, and seventeen others were wounded at Robb Elementary School in Uvalde, Texas, United States[1]. An 18-year-old gunman entered the elementary school and started shooting kids and teachers with a semi-automatic rifle. The sad part is that it is not a one-off event. Gun violence including mass shootings is a real problem, especially in the USA.
Gun violence is a massive problem and I alone can not solve it, but I can definitely contribute an engineering solution toward hopefully minimizing casualties.
Here I am proposing a proof of concept to identify gun sounds using a low-cost system and inform emergency services as soon as possible. Using this system, emergency services can respond to a gun incident as quickly as possible thus hopefully minimizing casualties.
My low-cost proof of concept uses multiple microcontroller boards with microphones to capture sound. They use a TinyML algorithm prepared using Edge Impulse to detect any gunshot sound. Upon a positive detection, the system sends a notification to registered services via an MQTT broker.
To learn more about the working of the system, please check out the Software section.
The hardware I am using is:
Arduino Portenta H7
Arduino Nano BLE Sense
9V Battery
In the current hardware iteration, the Arduino Nano BLE Sense is powered by a 9V battery and the Portenta H7 is powered via a laptop, because I am also using the serial port on the Portenta H7 to debug the system.
The software is divided into 2 main modules:
Machine Learning
Communication
The machine learning module uses a tinyML algorithm prepared using Edge Impulse. This module is responsible for identifying gunshot sounds. It takes sound as input and outputs its classification.
One of the most important parts of any machine learning model is its dataset. In this project, I have used a combination of two different datasets. For the gunshot class, I used the Gunshot audio dataset and for the other class, I used the UrbanSound8K dataset from Kaggle.
Edge Impulse Studio's Data acquisition tab provides useful features to either record your own data or upload already-collected datasets. I used the upload feature.
While uploading you can also provide a label to the data. This was very helpful because I am using sounds from multiple origins in the other class. After uploading the data, I cleaned the data using the Studio's Crop and Split sample feature.
You can either divide data into test and train sets while uploading or do it at a later time. The Data acquisition tab shows the different classes and train/test split ratio for our convenience.
After preparing the dataset, we need to design an Impulse. The Edge Impulse documentation explains the Impulse design in great detail so please check out their documentation page to learn about Impulse design.
As you can see, I have selected MFCC as the preprocessing block, and Classification as the learning block. I have used the default parameters for the MFCC preprocessing. For training, I have slightly modified the default neural network architecture. I have used three 1D convolutional layers with 8, 16 and 24 neurons, respectively. The architecture is illustrated in the image below.
Modifying the neural network architecture in the Edge Impulse Studio is very easy. Just click on Add an extra layer at the bottom of the architecture and select any layer from the popup.
Or you can also do it from the Expert mode if you are feeling masochistic 😉.
I trained the model for 5000 iterations with a 0.0005 learning rate. My final model has 94.5% accuracy.
The Edge Impulse Studio's Model testing tab enables a developer to test the model immediately. It uses the data from the test block and performs the inference using the last trained model. My model had 91.3% accuracy on the test data.
One nice feature Edge Impulse provides is versioning. You can version your project (like Git) to store all data, configuration, intermediate results and final models. You can revert back to earlier versions of your impulse by importing a revision into a new project. I use this feature every time before changing the neural network architecture. That way I don't have to retrain or keep a manual record of the previous architecture.
After completing the training, it's time for deployment. The Deployment tab of the Edge Impulse Studio provides three main ways of deploying the model onto hardware: (a) by creating a library, (b) by building firmware, and (c) by running it on a computer or a mobile phone directly. I knew that I would need more functionality from my Arduino hardware apart from inferencing, so I created a library instead of building firmware.
Just select the type of library you want, and click the Build button at the bottom of the page. This will build a library and download it onto your computer. After downloading, it will also show a handy guide to include this library in your IDE.
The coolest part is that I don't need to retrain the model or do anything extra to deploy the same model onto multiple devices. The examples of the downloaded library will have example code for all the supported devices of the same family.
Just select an example as getting-started code, modify it according to your needs, and flash it. The Arduino Nano BLE Sense and Portenta H7 use the same model generated by Edge Impulse. I trained the model only once, agnostic of the hardware, and deployed it on multiple devices, which is a time saver.
Inferencing is the process of running a neural network model to generate output. The image below shows the inference pipeline.
The microphones in the Nano BLE Sense and Portenta H7 pick up the surrounding sound (stages 1 & 2). The sound data is then preprocessed using the MFCC block (stage 3). The preprocessed data is then sent to a Convolutional Neural Network block (stage 4) which classifies it into either the gunshot class or the other class (stage 5).
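Stages 3 to 5 map onto the exported Arduino library roughly as in the sketch below; the header name, the "gunshot" label string, the 0.8 threshold, and the audio buffer handling are simplified illustrations rather than the project's exact code.

```cpp
// Condensed sketch of stages 3-5 using the exported Arduino library
// (header name and buffer handling are simplified for illustration; the
// real firmware fills audio_buffer from the onboard microphone continuously).

#include <Gunshot_Detection_inferencing.h>   // illustrative name for the exported library

static float audio_buffer[EI_CLASSIFIER_RAW_SAMPLE_COUNT];

// Feeds slices of the captured audio window to the classifier.
static int get_audio_data(size_t offset, size_t length, float *out_ptr) {
  memcpy(out_ptr, audio_buffer + offset, length * sizeof(float));
  return 0;
}

bool is_gunshot() {
  signal_t signal;
  signal.total_length = EI_CLASSIFIER_RAW_SAMPLE_COUNT;
  signal.get_data = &get_audio_data;

  ei_impulse_result_t result;
  if (run_classifier(&signal, &result, false) != EI_IMPULSE_OK) {
    return false;   // preprocessing or inference failed
  }
  for (size_t ix = 0; ix < EI_CLASSIFIER_LABEL_COUNT; ix++) {
    if (strcmp(result.classification[ix].label, "gunshot") == 0 &&
        result.classification[ix].value > 0.8f) {
      return true;
    }
  }
  return false;
}
```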
To learn more about the project please follow the below link.
Project Link - https://studio.edgeimpulse.com/public/133765/latest
The output of the machine learning module is then processed before sending it to the cloud via the Communication module.
This module is responsible for sharing information between boards and sending positive predictions to the registered emergency services.
The Nano BLE Sense sends its inference to the Portenta H7 via BLE. The Portenta H7 then sends the positive output (i.e. detection of gunshot sound) of its inference and Nano BLE's inference to a subscriber via MQTT. I have used the cloudMQTT broker for this project.
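As an illustration of the Portenta-side publish, here is a minimal sketch using the common PubSubClient Arduino library; the broker host, credentials, and topic are placeholders, and the author's actual firmware (linked below) may use a different client.

```cpp
// Sketch of the notification path on the Portenta H7 using PubSubClient
// (placeholder broker host, credentials and topic; the project's actual
// firmware is in the linked repository and may differ).

#include <WiFi.h>
#include <PubSubClient.h>

WiFiClient wifi;
PubSubClient mqtt(wifi);

void setup() {
  WiFi.begin("your-ssid", "your-password");
  while (WiFi.status() != WL_CONNECTED) { delay(500); }
  mqtt.setServer("your-instance.cloudmqtt.com", 1883);
  mqtt.connect("portenta-gunshot", "mqtt-user", "mqtt-pass");
}

// Called when either the local model or the Nano BLE Sense (via BLE)
// reports a gunshot detection.
void notify_gunshot() {
  mqtt.publish("alerts/gunshot", "gunshot detected");
}

void loop() {
  mqtt.loop();   // keep the MQTT connection alive
  // ... BLE reception from the Nano BLE Sense and local inferencing go here ...
}
```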
To download the software please use the below link:
Software Link - https://github.com/sw4p/Gunshot_Detection
My testing setup and the results are illustrated in the video below. The system is connected to my laptop, which is also performing the screen recording. On the upper left side we have an Arduino serial window showing the output from the Portenta H7, and on the lower left we have an audio player. On the right-hand side we have cloudMQTT's WebSocket UI, which shows the incoming notifications via MQTT. The sound for this video is played and recorded using my laptop's speaker and microphone.
In the video above I am playing different categories of sound, and one of those categories is a gunshot. The system outputs the classification result to the Arduino serial port whenever it detects a sound from the other class, but does not send it to the receiver. The moment it detects a gunshot sound, it immediately sends a notification to the receiver via CloudMQTT.
[1] https://en.wikipedia.org/wiki/Robb_Elementary_School_shooting
[2] https://www.gunviolencearchive.org/
A prototype smart device that uses a Syntiant TinyML board to alert the wearer with haptic feedback if emergency vehicles or car horns are detected.
Created By: Solomon Githu
Public Project Link: https://studio.edgeimpulse.com/public/171255/latest
An electronic device that is intended to function on a user's body is considered wearable technology. The largest categories of wearables are smartwatches and hearables, which have experienced the fastest growth in recent years. Steve Roddy, former Vice President of Product Marketing for Arm's Machine Learning Group, once said that "TinyML deployments are powering a huge growth in ML deployment, greatly accelerating the use of ML in all manner of devices and making those devices better, smarter, and more responsive to human interaction". TinyML enables running Machine Learning on resource-constrained devices like wearables.
Sound classification is one of the most widely used applications of Machine Learning. A new use case for wearables is an environmental audio monitor for individuals with hearing disabilities. This is a wearable device with a computer that can listen to environmental sounds and classify them. In this project, I focused on giving tactile feedback when vehicle sounds are detected. The Machine Learning model can detect ambulance and firetruck sirens as well as cars honking. When these vehicles are detected, the device gives a vibration pulse which can be felt by the person wearing it. This use case can be revolutionary for people with hearing impairments, including those who are deaf. To keep people safe from injury, the device can inform them when there is a car, ambulance, or firetruck nearby so that they can identify it and move out of the way.
I used Edge Impulse Platform to train my model and deploy it to the Syntiant TinyML board. This is a tiny development board with a microphone and accelerometer, USB host microcontroller and an always-on Neural Decision Processor™, featuring ultra low-power consumption, a fully connected neural network architecture, and supported by Edge Impulse.
You can find the public Edge Impulse project here: Environmental Audio Monitor. To add this project into your Account projects, click "Clone" at the top of the window. Next, go to "Deploying to Syntiant TinyML board" section to see how you can deploy the model to the Syntiant TinyML board.
I first searched for open datasets of ambulance siren, firetruck siren, car horns and traffic sounds. I used the Kaggle dataset of Emergency Vehicle Siren Sounds and the Isolated urban sound database for the key sounds. From these datasets, I created the classes "ambulance_firetruck" and "car_horn".
In addition to the key events that I wanted to be detected, I also needed another class that is not part of them. I labelled this class as "unknown" and it has sounds of traffic, people speaking, machines, and vehicles, among others. Each class has 1 second of audio sounds.
In total, I had 20 minutes of data for training and 5 minutes of data for testing. For part of the "unknown" class, I used the Edge Impulse keywords dataset. From this dataset, I used the "noise" audio files.
The Impulse design was very unique as I was targeting the Syntiant TinyML board. Under "Create Impulse" I set the following configurations:
The window size is 968 ms and the window increase is 484 ms. I then added the "Audio (Syntiant)" processing block and the "Classification" learning block. For a detailed explanation of the Impulse design for Syntiant TinyML audio classification, check out the Edge Impulse documentation.
The next step was to extract Features from the training data. This is done by the Syntiant processing block. On the Parameters page, I used the default Log Mel filterbank energy features and they worked very well. The Feature explorer is one of the most fun options in Edge Impulse. In the feature explorer, all data in your dataset are visualized in one graph. The axes are the output of the signal processing process and they can let you quickly validate whether your data separates nicely. I was satisfied with how my features separated for each class. This enabled me to proceed to the next step, training the model.
Under "Classifier" I set the number of training cycle as 100 with a learning rate of 0.0005. Edge Impulse automatically designs a default Neural Network architecture that works very well without requiring the parameters to be changed. However, if you wish to update some parameters, Data Augmentation can improve your model accuracy. Try adding noise, masking time and frequency bands and inspect your model performance with each setting.
I then clicked “Start training” and waited for a few minutes for the training to be complete. Upon completion of the training process, I got an accuracy of 97.6%, which is pretty good!
When training the model, I used 80% of the data in the dataset. The remaining 20% is used to test the accuracy of the model in classifying unseen data. We need to verify that our model has not overfit by testing it on new data. If your model performs poorly, then it means that it has overfit (memorized your dataset). This can be resolved by adding more data and/or reconfiguring the processing and learning blocks if needed. Tips for increasing performance can be found in this guide.
On the left bar, we click "Model testing" then "Classify all". The current model has a performance of 97.8% which is pretty good and acceptable.
From the test data, we can see the first sample has a length of 3 seconds. I recorded this in a living room which had a computer playing siren sounds and at the same time a television was playing a movie. In each timestamp of 1 second, we can see that the model was able to predict the ambulance_firetruck class. I took this as an acceptable performance of the model and proceeded to deploy it to the Syntiant TinyML board.
To deploy our model to the Syntiant Board, first click "Deployment" on the left side panel. Here, we will first deploy our model as firmware for the board. When our audible events (ambulance_firetruck and car_horn) are detected, the onboard RGB LED will turn on. When "unknown" sounds are detected, the onboard RGB LED will be off. This firmware runs locally on the board without requiring internet connectivity and with minimal power consumption.
Under "Build Firmware" select Syntiant TinyML.
Next, we need to configure posterior parameters. These are used to tune the precision and recall of our Neural Network activations, to minimize False Rejection Rate and False Activation Rate. More information on posterior parameters can be found here: Responding to your voice - Syntiant - RC Commands.
Under "Configure posterior parameters" click "Find posterior parameters". Check all classes apart from "unknown", and for calibration dataset we use "No calibration (fastest)". After setting the configurations, click "Find parameters".
This will start a new task, so we have to wait until it is finished.
When the job is completed, close the popup window and then click the "Build" option to build our firmware. The firmware will be downloaded automatically when the build job completes.
Once the firmware is downloaded, we first need to unzip it. Connect a Syntiant TinyML board to your computer using a USB cable. Next, open the unzipped folder and run the flashing script based on your Operating System.
We can connect to the board's firmware over Serial. To do this, open a terminal and select the COM Port of the Syntiant TinyML board with 115200 8-N-1 settings (in Arduino IDE, that is 115200 baud, Carriage return). Sounds of ambulance sirens, firetruck sirens, and car horns will turn the RGB LED red.
For the "unknown" sounds, the RGB LED is off. The classes we selected while configuring the posterior parameters are the ones that trigger the RGB LED.
After testing the model on the Syntiant TinyML board and finding that it works great, I proceeded to create a demo of the smart wearable of this project.
This involved connecting a vibration motor to GPIO 1 of the Syntiant TinyML board. When the classes "ambulance_firetruck" and "car_horn" are detected, the GPIO 1 on the board is set HIGH and this causes the vibration motor to vibrate for 1500 milliseconds. Vibration motors are mostly used to give haptic feedback in mobile phones and video game controllers. They are the components that make your phone vibrate.
Since we cannot connect a motor directly to the GPIO pins, I used the 5V pad on the Syntiant TinyML board to power the vibration motor through a transistor that is switched by GPIO 1.
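A minimal sketch of the haptic pulse logic described above is shown here; the Arduino pin number standing in for GPIO 1 is a placeholder (verify the mapping for the Syntiant TinyML board core), and pinMode(MOTOR_PIN, OUTPUT) is assumed to be called in setup().

```cpp
// Non-blocking 1500 ms haptic pulse on the pin driving the transistor.
// MOTOR_PIN is a placeholder for GPIO 1 on the Syntiant TinyML board.

const int MOTOR_PIN = 1;
unsigned long pulse_started_at = 0;
bool pulsing = false;

// Call when "ambulance_firetruck" or "car_horn" is detected.
void start_vibration() {
  digitalWrite(MOTOR_PIN, HIGH);
  pulse_started_at = millis();
  pulsing = true;
}

// Call from loop() so the motor is switched off after 1500 ms.
void update_vibration() {
  if (pulsing && millis() - pulse_started_at >= 1500) {
    digitalWrite(MOTOR_PIN, LOW);
    pulsing = false;
  }
}
```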
In the future, these components can be packaged safely into a wrist wearable. The Syntiant TinyML board has a 3.7V LiPo battery connector which will enable the wearable to be used anywhere. For this demo, I used the USB connector as the power source for all components.
You can find the Arduino code for this use case in the GitHub repository syntiant-tinyml-firmware-environment-hearing-aider. The repository has the instructions on how to install the required libraries and upload the Arduino code to the Syntiant TinyML board.
The image below shows the annotation of the Syntiant TinyML board. GPIO 1, GND and the 5V pad on the bottom side are used for this smart wearable.
This environmental audio monitor wearable is one of the many solutions that TinyML offers. Future work could include detecting other sounds such as motorbikes, construction equipment, or alarms, among others.
With Edge Impulse, developing ML models and deploying them has always been easy. The Syntiant TinyML board was chosen to deploy our model because it provides ultra-low power consumption, a fully connected neural network architecture, and an onboard microphone, comes in a tiny form factor, and is also fully supported by Edge Impulse.
A sample project demonstrating how to use the Nordic Thingy:53 and the Edge Impulse App to perform environmental noise classification.
Created By: Attila Tokes
Public Project Link:
Noise pollution can be a significant problem, especially in densely populated urban areas. It can have negative effects on both humans and wildlife. Noise pollution is also often caused by power-hungry activities, such as industrial processes, construction, flights, etc.
A Noise Pollution Monitoring device built on top of the Nordic Thingy:53 development kit, with smart classification capabilities using Edge Impulse, can be a good way to monitor this phenomenon in urban areas. Using a set of such Noise Pollution Monitoring devices, the noise / environmental pollution of a city can be monitored. Based on the measured data, actions can be taken to improve the situation. Activities causing noise pollution tend to also have high energy consumption, so replacing these activities with more efficient solutions can reduce their energy footprint.
In this project I will demonstrate how a low power device like the Nordic Thingy:53 can be used in conjunction with an edge machine learning platform like Edge Impulse to build a smart noise / environmental pollution monitor. The PDM microphone of the Nordic Thingy:53 will be used to capture environmental noise. A digital signal processing (DSP) and inference pipeline built using Edge Impulse will be used to classify audio samples of known activities like construction works, traffic and others.
The Nordic Thingy:53 comes with pre-installed firmware that allows us to easily create machine learning projects with Edge Impulse.
To get started with the app we will need to create an Edge Impulse account, and a project:
After this we should be able to detect the Thingy:53 in the Devices tab. The Thingy will show up as a device named EdgeImpulse.
Going to the Inference tab we can try out the pre-installed demo app, which uses the accelerometer data to detect 4 types of movement.
The first step of building a machine learning model is to collect some training data.
For this proof-of-concept, I decided to go with 4 classes of sounds:
Silence - a silent room
Nature - sound of birds, rain, etc.
Construction - sounds from a construction site
Traffic - urban traffic sounds
As the source of the audio samples I used a number of YouTube videos, listed in the Resources section.
The audio samples can be collected from the Data tab of the nRF Edge Impulse app:
The audio samples are automatically uploaded to the Edge Impulse Studio project, and should show up in the Data Acquisition tab:
By default all the samples will be put in the Train set. We also need a couple of samples for verification, so we will need to run a Train / Test split:
After this we should have approximately 80% of samples in the Train set, and 20% in the Test set:
Having the audio data, we can start building a machine learning model.
In an Edge Impulse project, the machine learning pipeline is called an Impulse. An Impulse includes the pre-processing, feature extraction and inference steps needed to classify, in our case, audio data.
For this project I went with the following design:
Time Series Data Input - with 1 second windows @ 16kHz
Audio (MFE) Extractor - this is the recommended feature extractor for non-voice audio
NN / Keras Classifier - a neural network classifier
Output with 4 Classes - Silence, Nature, Traffic, Construction
The Impulse blocks were trained mostly with the default settings. The feature extraction block looks as follows:
This is followed by the classification block:
The resulting model is surprisingly good:
Most of the test samples were correctly classified. We only have a couple of mismatches for the Traffic / Construction and Silence / Nature classes. This is however expected, as these sounds can be pretty similar.
Building and deploying an embedded application that includes machine learning used to involve quite a few steps. With the Thingy:53 and Edge Impulse, this becomes much easier.
We just need to go to the Deployment tab, and hit Deploy. The model will automatically start building:
A couple of minutes later the model is built and deployed on our Thingy:53:
The Deployment we did earlier should have uploaded firmware with the new model to the Thingy:53. Hitting Start in the Inference tab will start live classification on the device.
I tested the application out with new audio samples for each class:
In future versions this project could be extended to also include features like:
Noise level / decibel measurement
Cloud connectivity via Bluetooth Mesh / Thread
Solar panel charging
A network of such monitoring devices could be used to monitor the noise / environmental pollution in a city. Based on the collected data, high-impact / polluting activities can be identified and replaced with better alternatives.
Sound Sources:
Use the Avnet RaSynBoard to listen to the sound of a pump, in order to determine compressor speed and system health.
Created By: David Tischler
Public Project Link:
Most industrial settings such as pumping facilities, heating, ventilation, and air conditioning (HVAC) machinery, datacenter infrastructure rooms, manufacturing and heavy industry sites will have proprietary, expensive, critical machinery to maintain. On-site workers with experience near the machinery or equipment are generally quick to identify when a machine doesn't seem to "sound right", "feel right", or "look right". Previous experience helps them to understand the "normal" state of the equipment, and this intuition and early warning can allow for scheduled repairs or proper planning for downtime. However, facilities that don't have 24/7 staffing or on-site workers could be more prone to equipment failures, as there is no one to observe and take action on warning signs such as irregular sounds, movements, or visual indicators.
Predictive Maintenance using machine learning aims to solve this problem, by identifying and acting upon anomalies in machinery health.
We'll use the onboard microphone of the RaSynBoard to create a machine learning model that understands a "low-speed" sound made by a small pump, and a "high-speed" noise from the pump. For the purposes of this demo, we will consider the "high-speed" noise to be a bad sign, indicating there is a problem.
Reasons for a pump to speed up and attempt to push more liquid could vary, depending on the use-case:
Liquid cooling, perhaps temperatures are rising and automation has increased the pump flow to try to extract more heat from the system.
Flow has been restricted, and the pump is trying to achieve the same total volume of throughput but through a smaller diameter.
Liquid levels are rising, and the pump is trying to reduce the volume of water in a tank, lake, or other holding area.
Getting started, we will need to collect audio samples from the pump, running at both low-speed and at high-speed, as well as a variety of “other” background noises that will form an “Unknown” classification, helping us filter out random everyday noises that occur. To collect the data, we’ll first need to prepare the RaSynBoard.
With the board connected to the computer, inside the extracted .zip file, run the flash_linux.sh or flash_win.bat file depending upon whether you are on Linux or Windows, and the Edge Impulse firmware will be flashed to the board. When completed, you can disconnect the RaSynBoard from the computer.
Next, insert the SD Card into your laptop or desktop, and copy the config.ini, mcu_fw_120.synpkg, dsp_firmware.synpkg, and ei_model.synpkg files from the unzipped firmware directory to the SD Card. Upon completion, eject the SD Card from your computer.
Return the Jumper to pins 1-2, and insert the SD Card into the RaSynBoard. Power it back on, and reconnect it to the computer once again.
Now that the RaSynBoard is running Edge Impulse firmware, you can use the edge-impulse-daemon command (which got installed as part of the CLI installation) on your desktop or laptop to connect the RaSynBoard to Edge Impulse. The board will then show up in the "Devices" page, and over on the Data Acquisition page you can use the drop-down menus to choose your device, choose the Microphone, enter a sample length of 1000 milliseconds (1 second), and provide a label for the type of data you are collecting. Once ready you will simply click "Start sampling". (The IMU is also available, for projects that might make use of accelerometer data).
With the pump running in what we'll call our "normal" condition, low-speed, I've collected about 50 audio samples with the RaSynBoard placed on top of the pump motor, each 1 second in length. My label is entered as low-speed, but you can use another term if you prefer. You'll want to keep the location of the RaSynBoard consistent, so that the data used to train the model is representative of what will be experienced once the model is built and deployed back to the board.
Next, I've increased the speed of the pump from 10% to 85%, which still gives some excess pumping capacity, but could be a cause for concern in those use-cases outlined above. The same process is followed, with individual audio samples collected and labeled as high-speed; about 50 samples should be enough.
Once a dataset has been collected, a machine learning model can be built that identifies or recognizes these same sounds in the field. So in this project, we should be able to identify and classify the sound of our pump running at low-speed, and the sound of our pump running at high-speed.
On the left, click on “Impulse design”, and in the first block, “Time Series Data” should already be selected. Add a Processing block, and choose Audio (Syntiant). Then add a Learning block, and choose Classification. Finally, the Output features should already be selected, thus you can click Save Impulse.
Next, on the left navigation, choose Syntiant, to configure the Processing block. This is where Features will be extracted, and you will see a Spectrogram visualization of the processed features for each data sample (chosen from a drop-down menu at the top-right of the raw data). You should be able to leave the default MFE configuration alone, and simply click on Save parameters at the bottom. However, each option has a help tip, should you need to fine-tune these settings. On the next page, click on Generate features to create a Feature explorer that will plot a graph, allowing you to examine how your Features cluster.
Choose Classifier on the left navigation to proceed to the neural network settings for the Learning block, where you can choose the number of training cycles (epochs), learning rate, and other settings (again the defaults will likely work fine, but help text can guide any fine-tuning you may want to perform). Once you are ready, you can click on Start training to begin the model build process.
Upon completion, you will see another visual representation, this time evaluating the validation performance of your model and the Accuracy against a validation set.
To deploy this newly-built audio classification machine learning model onto the RaSynBoard, there are a few options. The quickest way to test if your model is functional and determine accuracy in the real world, is to simply choose a ready-made firmware binary that can be flashed and run immediately on the RaSynBoard. However, when building a full-featured application intended for production use, you’ll instead want to choose either the Syntiant NDP120 library or Custom Block export options, as you can then leverage the machine learning model in your application code as necessary, depending on your use-case or product.
For now though, I will just use the binary download as an easy way to test out the model.
On the left navigation, click on Deployment, and type “RaSynBoard” in the search box. From the options, choose RaSynBoard and then click on the Find posterior parameters button, which will enable you to choose the keywords or sounds from your labels that you want to detect. Choose “low-speed” and “high-speed”, then click on Find parameters.
Next, click on Build, and the firmware will be generated and downloaded to your computer. Once downloaded, unzip the file, and we’ll follow a similar method as earlier.
Power down the RaSynBoard if it is still running, remove the SD Card from the board and insert it into your laptop or desktop, and copy the config.ini, mcu_fw_120.synpkg, dsp_firmware.synpkg, and ei_model.synpkg files from the unzipped download to the SD Card. Upon completion, eject the SD Card from your computer and return it to the RaSynBoard.
We can now power the RaSynBoard back on; the board will boot up and automatically start running the model. To see the results, we need to attach a serial console to view the output of any inference results. Using a standard UART adapter, connect Ground, TX, and RX to pins 2, 4, and 6 on the I/O Board, as shown here:
Then in a terminal, you will see the output of the model running. I have placed the RaSynBoard back on the pump, set the speed to low, and sure enough, the model is able to predict the pump is running at low-speed. Increasing the compressor power to 85%, the RaSynBoard now recognizes that the pump is running at high-speed.
Using machine learning and an embedded development kit, we were able to successfully identify and classify pump behavior by listening to the sound of the compressor. This demonstration validated the approach as feasible, and when wrapped into a larger application and alerting system, an audio classification model could be used for remote infrastructure facilities, factory equipment, or building HVAC equipment that is not continually monitored by workers or other human presence. The Renesas RA6 MCU combined with the Syntiant NDP120 neural decision processor in the Avnet RaSynBoard creates a low-power, cost-effective solution for predictive maintenance or intervention as needed, prior to a failure or accident occurring.
End-to-end synthetic data pipeline for the creation of a portable LED product equipped with keyword spotting capabilities. The project serves as a comprehensive guide for development of any KWS product.
Created By: Samuel Alexander
Public Project Link:
GitHub Project Link:
In the era of smart devices and voice-controlled technology, developing effective keyword spotting (KWS) systems is crucial for enhancing user experience and interaction. This documentation provides a comprehensive guide for creating a portable LED product with KWS capabilities, using Edge Impulse's end-to-end synthetic data pipeline. Synthetic voice/speech data, generated artificially rather than collected from real-world recordings, offers a scalable and cost-effective solution for training machine learning models. By leveraging AI text-to-speech technologies, we can create diverse and high-quality datasets tailored specifically for our KWS applications. This guide not only serves as a blueprint for building a responsive LED product but also lays the groundwork for a wide range of voice-activated devices, such as cameras that start recording on command, alarms that snooze with a keyword, or garage doors that respond to voice prompts.
Traditional methods of training keyword spotting models often rely on extensive datasets of human speech, which can be time-consuming and expensive to collect. Moreover, ensuring diversity and representation in these datasets can be challenging, leading to models that may not perform well across different accents, languages, and speaking environments. Synthetic data addresses these challenges by providing a controlled and flexible means of generating speech data. Using AI text-to-speech technology, we can produce vast amounts of speech data with varied voices, tones, and inflections, all tailored to the specific keywords we want our models to detect.
This approach opens up numerous possibilities for product development. For instance, a smart LED light can be designed to turn on or off in response to specific voice commands, enhancing convenience and accessibility. A camera can be programmed to start recording or take a group photo when a designated keyword is spoken, making it easier to capture moments without physical interaction. Similarly, an alarm system can be configured to snooze with a simple voice command, streamlining the user experience. By utilizing synthetic data, developers can create robust and versatile KWS models that power these innovative applications, ultimately leading to more intuitive and responsive smart devices.
This project outlines the creation of a keyword spotting (KWS) product using Edge Impulse's synthetic data pipeline. The process involves generating synthetic voice data with OpenAI's Whisper text-to-speech model via Edge Impulse Studio and training the KWS model using Syntiant's audio processing blocks for the NDP120 on the Arduino Nicla Voice. The phrases 'illuminate' and 'extinguish' will be generated and used for training the model.
After training, the model is deployed onto the Arduino Nicla Voice hardware. A custom PCB and casing are designed to incorporate LED lights and power circuitry, ensuring portability and ease of use. This guide serves as a practical resource for developers looking to implement KWS functionality in voice-activated devices, demonstrating the efficiency of synthetic data in creating responsive and versatile products.
The Arduino Nicla Voice is an ideal choice for this project due to its use of the Syntiant NDP120, which offers great power efficiency for always-on listening. This efficiency allows the NDP120 to continuously monitor for keywords while consuming minimal power, making it perfect for battery-powered applications. Upon detecting a keyword, the NDP120 can notify the secondary microcontroller, Nordic Semiconductor nRF52832, which can then be programmed to control the lighting system. The compact size of the Nicla Voice also makes it easy to integrate into a small case with a battery. Furthermore, the Nicla Voice's standardized footprint simplifies the prototyping process, allowing for the easy creation of a custom PCB module with LED circuitry that can be easily connected.
Arduino Nicla Voice (or other Edge Impulse supported MCU with mic)
PCB and SMD components (parts breakdown explained later)
Edge Impulse CLI
Arduino IDE
OpenAI API account
Once you have your secret key, you can navigate to your Edge Impulse organization page and enter your API key. Please also note that this Text to Speech Data Generation feature is only available for Edge Impulse Enterprise accounts.
Now that we have the environment configured, and our OpenAI API saved in the Edge Impulse Studio, we are ready to start a new project and begin generating some synthetic voice data.
On your project's page select Data acquisition --> Data sources --> + Add new data source --> Transformation block --> Whisper Synthetic Voice Generator --> Fill out the details as follows:
Phrase: illuminate
We need to generate speech for the words "illuminate" and "extinguish", we can start with illuminate for this 'action' first and then set up another data source 'action' for extinguish after we finish configuring this one.
Label: illuminate
Match the label name to the voice sample we want to generate.
Number of samples: 10
We can start with generating 10 samples to test creating this action, if everything works feel free to generate an action that generates more than 10 samples. Once created you can also run the action multiple times.
Voice: random
We want to create diversity of voice and accent in our dataset, so choose random.
Model: tts-1
I tested tts-1 and tts-1-hd, I think the quality difference is negligible for this case, but feel free to select either one. Note that tts-1-hd will cost you more OpenAI credits.
Speed: 0.6, 0.7, 0.8, 0.9, 1.0, 1.1, 1.2
We want to vary the speed of the voice pronouncing the word we want to generate. 0.6 means 60% of its original speed, and 0.6 - 1.2 should give enough range.
Now you can run the action. If successful, the tts voice generation should begin and it may take a few minutes to complete. If the job failed you should be notified and you can recheck if the API key is entered correctly, then you can retry again.
Once satisfied with all the data generated, perform a Train / Test split into approximately 80/20 ratio.
The Impulse design values are chosen for optimal keyword spotting performance. The 968 ms window size captures enough audio for accurate detection, while the 500 ms window increase balances responsiveness and efficiency. The 16000 Hz frequency is standard for capturing human voice, ensuring quality without excessive data. Using the Audio (Syntiant) block leverages the NDP120's capabilities for efficient digital signal processing. The Classification block distinguishes between commands, with output classes "extinguish," "illuminate," and "z_openset" allowing for control of the lighting system and handling unknown inputs.
Window size: 968 ms
Window increase: 500 ms
Frequency: 16000 Hz
Audio (Syntiant)
Classification
Output: extinguish, illuminate, z_openset
Under Classifier, set the learning rate to 0.0005 and change the architecture preset to use Dense Network.
Our audio classifier gives an accuracy of 93.8%, which is satisfactory. We could continue tuning the hyperparameters and try using some data augmentation, but for the purpose of this demonstration we are satisfied with the current result and can move to the deployment phase.
Now the AI model is ready to be deployed to the Arduino Nicla Voice. Let's select the Arduino Nicla Voice deployment.
Please note that we want to flash the firmware that you have just built, instead of the default audio firmware for the Nicla Voice's NDP120.
Once the code is uploaded you can verify that everything works by saying the words 'Illuminate' and 'Extinguish' out loud. When the keyword 'Illuminate' is detected, the blue built-in LED will blink, and when the keyword 'Extinguish' is detected, the red built-in LED will blink.
Next we will design and manufacture the PCB socket which holds the LED and power circuitry which can turn on when 'Illuminate' is detected and turn off when 'Extinguish' is detected.
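Before moving to the hardware, here is a minimal sketch of how the detected keywords can be mapped to the LED power circuit: the on_match() entry point and LED_CTRL_PIN are illustrative assumptions rather than the exact firmware, which actually receives the detections from the NDP120.

```cpp
// Sketch of the keyword-to-LED mapping on the Nicla Voice host MCU.
// on_match() is assumed to be called with the winning label whenever the
// NDP120 reports a detection; LED_CTRL_PIN is a placeholder for the GPIO
// that drives the LED power circuit on the custom PCB.

const int LED_CTRL_PIN = 9;   // placeholder GPIO wired to the LED driver

void setup() {
  pinMode(LED_CTRL_PIN, OUTPUT);
  digitalWrite(LED_CTRL_PIN, LOW);   // lights off at boot
}

void on_match(const char *label) {
  if (strcmp(label, "illuminate") == 0) {
    digitalWrite(LED_CTRL_PIN, HIGH);   // lights on
  } else if (strcmp(label, "extinguish") == 0) {
    digitalWrite(LED_CTRL_PIN, LOW);    // lights off
  }
  // "z_openset" (unknown sounds) is intentionally ignored
}

void loop() {
  // keyword detection runs on the NDP120; on_match() fires on events
}
```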
The schematic and PCB are designed using KiCAD. A single-sided aluminum PCB was selected for this project due to its excellent thermal conductivity, which helps dissipate heat generated by the LEDs and other components, ensuring reliable performance and longevity. The design of this PCB is simple enough to make it possible to route using one side only.
The schematic, PCB, and Gerber (manufacturing) files are accessible on the project's GitHub page.
Synthetic data has demonstrated its value in the development of voice-activated products like Lumo Voice. Its customizable nature allows for the inclusion of diverse accents, languages, tones, and inflections, resulting in robust keyword spotting models that perform well across different speaking styles. Unlike traditional data collection, which requires gathering numerous samples from various individuals, synthetic data offers a more efficient and scalable approach, enabling the rapid generation of high-quality datasets tailored to specific needs. This flexibility, combined with Edge Impulse's incredible streamlined workflow, has made it easier than ever to build and deploy small, efficient models on edge devices. With Edge Impulse, we could quickly generate synthetic data, train, and optimize our models, making it a powerful tool for creating responsive and versatile voice-controlled devices like Lumo Voice.
Take an existing audio classification model built for the Thunderboard Sense 2, and prepare it for use on the SiLabs xG24 board.
Created By: Pratyush Mallick
Public Project:
This project focuses on how to port an existing audio recognition project built with a SiLabs Thunderboard Sense 2 to the newer SiLabs xG24 Dev Kit. For demonstration purposes, we will be porting Mani's project, an Edge Impulse based TinyML model that predicts various vehicle failures such as faulty drive shafts and brake-pad noises. Check out his work for more information.
The audio sensor on the Thunderboard Sense 2 and the xG24 Dev Kit are the same (TDK InvenSense ICS-43434), so ideally we're not required to collect any new data using the xG24 Dev Kit for the model to work properly. Had the audio sensor been a different model, it would most likely be necessary to capture a new dataset.
However, note that the xG24 has two microphones, placed at the edges of the board.
In this project, I am going to walk you through how you can clone Mani's Public Edge Impulse project for the Thunderboard Sense 2 board, build it for the xG24, test it out, and then deploy to the newer SiLabs xG24 device instead.
Before you proceed further, there are a few software packages you need to install.
Click on the "Clone" button at top-right corner of the page.
That will bring you to the below popup tab. Enter a name for your clone project, and click on the "Clone project" button.
This action will copy all the collected data, generated features, and model parameters into your own Edge Impulse Studio. You can verify this by looking at the project name you entered earlier.
Now if you navigate to "Create impulse" from the left menu, you will see how the model was created originally.
As you can see, the model was created with audio data sampled at 16 kHz. As mentioned, because the microphones on both boards are the same, we're not required to collect any additional data from the new board.
However, if you do want to collect some data from the xG24, then you will need to flash the base firmware and then use the edge-impulse-daemon to connect the device to the Studio.
You can follow the guide below to go through the process, if you are interested in adding more data samples to your cloned project:
With the default values of Window Size (10 s) and Window Increase (500 ms), the processing block will throw an error, as shown below:
This is because some of the features in Edge Impulse's processing blocks have been updated since this project was created, so you need to update some of the parameters in the Time Series block, such as Window Size and Window Increase, or increase the frame stride parameter in the MFE processing block. This is what my updated window parameters look like:
Next, navigate to the "Classification" tab from the left menu, and click on "Start training".
Alternatively, you can also collect more data as mentioned above, or add new recognized sounds with other audio classes, then begin your training.
When you are done training, navigate to the "Live Classification" page from the left menu. This feature of Edge Impulse comes in handy when migrating projects to different boards.
Rather than deploying the model and then testing it on the hardware, with this feature we can collect audio data from the hardware immediately and run the model in the Studio on the collected data. This saves time and effort up front.
For Edge Impulse supported boards we can directly download the base Edge Impulse firmware, and then directly record audio (or other) data from the target device.
Once done, you can select the device name, choose "Microphone" as the sensor, and set the sample length and sampling frequency (ideally equal to the collected samples).
When you are done retraining, navigate to the "Deployment" tab from the left menu, select "SiLabs xG24 Dev Kit" under "Build firmware", then click on the "Build" button at the bottom of the page.
This will build your model and download a .zip file containing a .hex file and instructions.
With the Thunderboard Sense 2, deploying firmware could be done by directly dragging and dropping files onto the "USB Driver TB004" volume when the device was connected in flash mode to a host PC. For the xG24, however, we have to use Simplicity Commander to upload the firmware to the board. First connect the xG24 board to the PC and make note of the board's COM port; ideally it will be identified by the PC as a J-Link UART port.

Now open the Simplicity Commander tool and connect the board. Once connected, select the "Flash" option on the left, then select the downloaded .hex file and flash it to the board.
To start inferencing, run the following command in your terminal:
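This is most likely the Edge Impulse CLI's impulse runner (shown here for reference; consult the Edge Impulse docs if your setup needs additional flags):

```
edge-impulse-run-impulse
```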
Note that this is a newer command supported by the Edge Impulse CLI, so you may need to update your edge-impulse-cli version to get it running and avoid a package mismatch as shown below:
Now your model should be running, recognizing the same audio data and performing inferencing on the newer xG24 Dev Kit hardware, with little to no modification to the data or the model architecture.
This highlights the platform-agnostic nature of Edge Impulse, and was possible in this case because the audio sensor on both the Thunderboard and the xG24 is the same. However, you would need to do your own due diligence when migrating projects built with other sensor data, such as humidity/temperature or the light sensor, as those do vary between the boards.
One final note is that in this project, the xG24 is roughly 2x as fast as the Thunderboard Sense 2 in running the DSP, and 8x faster in running the inference:
Hopefully this makes upgrading your SiLabs projects easier!
Perform local keyword spotting to control a relay and turn devices on or off, with voice commands and an Arduino Nicla Voice.
Created By: Jallson Suryo
Public Project Link:
GitHub Repo:
Can you imagine Amazon Alexa without the cloud?
For most services, adding voice means adding an internet connection, which means extra expense, privacy and security concerns, and the need to install an app (or use a web interface) for everything in your home. Another problem is the time delay between a voice command being given, the command being sent to a cloud server, and the response coming back to the device for execution, creating a poor user experience.
Our answer is a power plug which can be controlled using voice commands, with no connection to the internet. A machine learning model embedded in a microcontroller is trained to recognize several commands, which are then passed to relays that turn the power on or off at each socket according to the issued command, instantly. Practicality, privacy, and cost-effectiveness are the goals of this Non-IoT Voice Controlled Power Plug project.
This project takes advantage of Edge Impulse's Syntiant audio processing block that extracts time and frequency features from a signal, specific to the Syntiant NDP120 accelerator included in the Nicla Voice. The NDP120 is ideal for always-on, low-power speech recognition applications with the “find posterior parameters” feature that will only react to the specified keywords.
Devices with an embedded ML model will accept voice commands but won't need a Wi-Fi or Bluetooth connection. All processing is done locally on the device, so you can directly tell a lamp, air conditioner, or TV to turn on or off without Alexa, Siri, or any digital assistant speaker/hub.
This project will use relays and a power strip connected to various appliances such as a lamp, fan, TV, etc. An Arduino Nicla Voice with an embedded ML model, trained to recognize keywords such as one, two, three, four, on, and off, is the center of the decision process. From the Nicla Voice we use the I2C protocol, connected to an Arduino Pro Micro, to carry out the voice commands and forward them to the relays which control the power sockets.
Arduino Nicla Voice (with Syntiant NDP120)
Any microcontroller or Arduino (I use Pro Micro)
5V Relay (4pcs)
Breadboard
Cable jumper
Cable for 110/220V
Powerstrip (4 sockets)
Edge Impulse Studio (Enterprise account or Free Trial for more than 4 hours of training data)
Arduino IDE
Terminal
Before we start, we need to install the Arduino CLI and Edge Impulse tooling on our computer.
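As a reference, these are the usual install commands from the Edge Impulse CLI and Arduino CLI documentation (assuming Node.js is already installed; check the official guides for your OS and any prerequisites):

```
# Edge Impulse CLI tooling (requires a recent Node.js)
npm install -g edge-impulse-cli

# Arduino CLI via the official install script (a package manager also works)
curl -fsSL https://raw.githubusercontent.com/arduino/arduino-cli/master/install.sh | sh
```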
Click on a data sample that was collected, then click on the 3 dots to open the menu, and finally choose Split sample. Set the segment length to 1000 ms (1 second), or add segments manually, then click Split. Repeat this process until all samples are labeled in 1 second intervals. Make sure the amount of data for one, two, three, four, on, off and unknown is fairly balanced, and that the ratio between Training and Test data is around 80/20.
Choose Create Impulse, set Window size to 968ms, then add an Audio (Syntiant) Processing block, and choose Classifier for the Learning block, then Save Impulse. In the Syntiant parameters, choose log-bin (NDP120/200) then click Save. Set the training to around 50 cycles with 0.0005 Learning rate, and choose Dense Network with Dropout rate around 0.5, then click Start training. It will take a short while, but you can see the progress of the training on the right. If the results show a figure of around 80% accuracy upon completion, then we can most likely proceed to the next stage.
Now we can test the model in Live classification, or choose Model testing to test with the data that was set aside earlier (the 20% test split), and click Classify all. If the result is quite good -- again around 80% accuracy -- then we can move to the next step: Deployment.
Because there are two MCUs in this solution, two separate applications are needed:
Finally, we succeeded in making this Non-IoT Voice Controlled Power Plug idea a reality and implementing it in a home appliance setting. I believe that in the future this kind of non-IoT smart home system will be widely implemented, and could be built into every home appliance with specific keywords. Privacy, security, practicality, and energy efficiency can all be addressed for a more sustainable future.
Building an audio classification wearable that can differentiate between the sound of a dog bark, howl, and environmental noise, trained entirely on synthetic data from ElevenLabs.
Created By: Solomon Githu
Public Project Link:
GitHub Repository:
It's said that a dog is man's best friend, and it is no secret that dogs are incredibly loyal animals. They are very effective when it comes to securing premises, and are also able to sense when things are not right, whether with a person or with a situation. Some examples of dogs used for security and assistance are guidance for people with visual impairments, detection of explosives and drugs, search and rescue missions, and enhancing security at various places. Worker safety aims to foster a practice of ensuring a safe working environment by providing safe equipment and implementing safety guidelines that enable workers to be productive and efficient in their jobs. In this case, dogs are usually deployed to patrol and monitor areas around the workplace. One of the reasons is that dogs have an extraordinary sense of smell, vision and hearing, making them exceptional at detecting threats that may go unnoticed by humans or other security systems. However, workers may not always be able to interpret a dog's barks in time. The workers may not be knowledgeable about how dogs react, or they may be focused on their tasks and fail to hear a dog. Failure to detect a dog's bark may lead to accidents, injuries or even fatalities.
Machine listening refers to the ability of computers to understand audio signals similarly to how humans hear and understand various sounds. Recently, labeling of acoustic events has emerged as an active topic covering a wide range of applications. This is because by analyzing animal sounds, AI can identify species more accurately and efficiently than ever before and provide unique insights into the behaviors and habitats of animals without disturbing them. Barking and other dog vocalizations have acoustic properties related to their emotions, physiological reactions, attitudes, or other internal states. Historically, humans have relied on intuition and experience to interpret these signals from dogs. We have learned that a low growl often precedes aggression, while a high-pitched bark might indicate excitement or distress. Through this experience, we can train AI models to recognize dog sounds, and those who work with the animals, like security guards, maintenance staff, and even delivery people, can use that insight.
The AI model only needs to be trained to recognize the sounds one seeks to monitor, based on recordings of those sounds. However, creating an audio dataset of animal sounds is quite challenging; in this case, we do not want to disturb dogs, or other animals, to provoke reactions like barking. Fortunately, generative AI is currently at the forefront of AI technology. Over the past decade, we have witnessed significant advancements in synthetic audio generation. From sound effects to songs, with just a simple prompt we can now use computers to generate dog sounds and in turn use that data to train another AI model.
This project aims to develop a smart prototype wearable that can be used by workers to receive alerts from security dogs. In workplaces and even residential areas, dog sounds are common, but we often overlook them, assuming there is no real threat. We hear the sounds but don't actively listen to the warnings dogs may be giving. Additionally, workers at a site may be too far to hear the dogs, and in some cases, protective ear muffs further block out environmental sounds.
Sound classification is one of the most widely used applications of Machine Learning. This project involves developing a smart wearable device that is able to detect dog sounds, specifically barking and howling. When these sounds are detected, the wearable communicates the dog's state by displaying a message on a screen. This wearable can be useful to workers by alerting them to take precautionary measures. A security worker may be alerted to a potential threat that a dog identified but that they did not manage to see. A postal delivery person can also be alerted to an aggressive dog that may be intending to attack them, as it may perceive the delivery person as a threat.
To train a Machine Learning model for this task, the project uses generative AI for synthetic data creation. The reason why I chose this is because we cannot distress a dog so that we can obtain reactions like barking or howling. I also wanted to explore how generative AI can be used for synthetic data generation. Ideally, when training Machine Learning models, we want the data to be a good representation of how it would also look when the model is deployed (inference).
Canine security refers to the use of trained security dogs and expert dog handlers to detect and protect against threats. The effectiveness of dogs lies in their unique abilities. Animals, especially dogs, have a keen sense of smell and excellent hearing. As a result, dogs are the ideal animal to assist security guards in their duties and to provide security to workplaces and homesteads. At the same time, according to the American Veterinary Medical Association, more than 4.5 million people are bitten by dogs each year in the US. And while anyone can suffer a dog bite, delivery people are especially vulnerable. Statistics released by the US Postal Service show that 5,800 of its employees were attacked by dogs in the U.S. in 2020.
According to Sam Basso, a professional dog trainer, clients frequently admit during his sessions that they have more to learn about their dogs. While humans have been able to understand how dogs act, the average person still has more to learn to better understand dogs. Professional dog handlers can train owners, but this comes at a great cost, and not everyone is ready to take the classes. To address these issues, we can utilize AI to develop a device that can detect specific dog sounds such as barking, and alert workers so that they can follow up on the situation the dog is experiencing. In the case of delivery persons, an alert can inform them of a nearby aggressive dog.
Audio classification is a fascinating field with numerous applications, from speech recognition to sound event detection. Training AI models has become easier by using pre-trained networks. The transfer learning approach uses a pretrained model which is already trained using a large amount of data. This approach can significantly reduce the amount of labeled data required for training, it also reduces the training time and resources, and improves the efficiency of the learning process, especially in cases where there is limited data available.
Finally, for deployment, this project requires a low-cost, small, and powerful device that can run optimized Machine Learning models. The wearable also requires the ability to connect to an OLED display using general-purpose input/output (GPIO) pins. Power management is another important consideration for a wearable: the ability to easily connect a small battery, achieve low power consumption, and support battery charging would be great. In this case, the deployment uses the XIAO ESP32S3 development board owing to its small form factor, high performance, and lithium battery charge management capability.
Software components:
Edge Impulse Studio account
ElevenLabs account
Arduino IDE
Hardware components:
A personal computer
SSD1306 OLED display
3.7V lithium battery. In my case, I used a 500mAh battery.
Some jumper wires and male header pins
Soldering iron and soldering wire
Super glue. Always be careful when handling glues!
Once we have accounts on both ElevenLabs and Edge Impulse, we can get started with data creation. First, create a project (with a Professional or Enterprise account) on Edge Impulse Studio. On the dashboard, navigate to "Data acquisition" and then "Synthetic data". Here, we need to fill in the form with our ElevenLabs API key as well as parameters for the data generation, such as the prompt, label, number of samples to be generated, length of each sample, frequency of the generated audio files, and the prompt influence parameter.
To get an API key from ElevenLabs, first login to your account. Next, on the "home" page that opens after logging in, click "My Account" followed by "API Keys". This will open a new panel that enables managing the account API Keys. We need to click "Create API Key" and then give a name to the key, although the naming does not matter. Next, we click the "Create" button and this will generate an API key (a string of characters) that we need to copy to our Edge Impulse project in the "ElevenLabs.io API Key" field.
For this demonstration project, I initially worked with 3 classes of sounds: dog barking, dog howling, and environment sounds (city streets, construction sites and people talking). The prompts I used to generate sound for each class were "dog howling", "dog barking" and "environmental sounds (e.g., city streets, construction sites, people talking)", with the labels dog_howling, dog_barking and environment respectively. For each prompt, I used a prompt influence of 0.6 (this generated the best sounds), a "Number of samples" of 6, a "Minimum length (seconds)" of 1, a "Frequency (Hz)" of 16000 and "Upload to category" set to training. With this configuration, clicking the "Generate data" button in Edge Impulse Studio generates 6 audio samples of 1 second each for one class. To generate sound for another class, we simply change the prompt and leave the other fields unchanged. I used this configuration to generate around 39 minutes of audio consisting of dogs barking, dogs howling and environment (e.g., city streets, construction sites, people talking) sounds.
In Machine Learning, it is always a challenge to train models effectively. Bias can be introduced by various factors and it can be very difficult to even identify that this problem exists. In our case, since the device will be continuously recording environmental sound and classifying it, we also need to consider that there will not always be a dog barking, dog howling, people talking, or city sounds present. The environment can also be calm, with low noise or other sounds that the generative AI model failed to include in the environment class. Identifying this is key to fixing the bias of the environment sounds class (e.g., city streets, construction sites, people talking).
Finally, after using the ElevenLabs integration and uploading my noise sound recordings, I had around 36 minutes of sound data for both training and testing. In general, the more representative data we have, the better the model will perform. For this demonstration project, I found the dataset size to be adequate.
We then click the "Perform train / test split" button in the interface that opens. Another dialog opens asking if we are sure about rebalancing the dataset; we click "Yes, perform train / test split" and finally type "perform split" in the next window, as prompted, to confirm.
By using the Autotune feature on Edge Impulse, the platform updated the processing block to use frame length of 0.025, frame stride of 0.01, filter number of 41, set the Lowest band edge of Mel filters (in Hz) to 80 and set the Noise floor (dB) as -91.
The last step is to train our model. We click "Classifier" which is our learning block that is using a Convolution Neural Network (CNN) model. After various experiments, I settled with 100 training cycles, a learning rate of 0.0005 and a 2D Convolution architecture. Finally, we can click the "Save & train" button to train our first model. After the training process was complete, the model had an accuracy of 98% and a loss of 0.04.
When training our model, we used 80% of the data in our dataset. The remaining 20% is used to test the accuracy of the model in classifying unseen data. We need to verify that our model has not overfit, by testing it on new data. To test our model, we first click "Model testing" then "Classify all". Our current model has an accuracy of 99%, which is pretty good!
At last, we have a simple Machine Learning model that can detect dog sounds! However, how do we know if this configuration is the most effective? We can experiment with three other processing blocks: Spectrogram, raw data processing and MFCC. To be specific, the difference between the Impulses in this project is the processing block. To add another Impulse, we click the current Impulse (Impulse #1) followed by "Create new impulse".
In this project, we now have four Impulses. The Experiments feature not only allows us to setup different Machine Learning processes, but it also allows us to deploy any Impulse. The MFE, Spectrogram and MFCC Impulses seem to perform well according to the model training and testing. I decided to skip deploying the Raw data Impulse since using raw data as the model input does not seem to yield good performance in this use case.
We can then deploy the second Impulse, which uses the Spectrogram pre-processing algorithm. The steps for the deployment are similar: we select Arduino library, enable the EON Compiler, select Quantized (int8) and download the Arduino library. To speed up compilation and use cached files in the Arduino IDE, we can simply unzip the second Impulse's Arduino library and copy the model-parameters and tflite-model folders over to the first Impulse's Arduino library folder, overwriting the existing files with the updated model parameters. Unfortunately, this model is not able to run on the ESP32S3 board and we get the error "failed to allocate tensor arena". This error means that we have run out of RAM on the ESP32S3.
Lastly, I experimented with deploying the MFCC Impulse. This algorithm works best for speech recognition but the model training and testing show that it performs well for detecting dog sounds. Following similar steps, I deployed the fourth Impulse using the EON Compiler and Quantized (int8) model optimizations. Surprisingly, this Impulse (using the MFCC processing algorithm) delivers the best performance even compared to the MFE pre-processing block. The Digital Signal Processing (DSP) takes approximately 285ms, with classification taking about 15ms. For detecting dog sounds, this Impulse accurately identifies with great confidence, demonstrating the positive impact of a DSP block on model performance!
Based on the experiments, I chose to continue with the fourth Impulse due to its accuracy and reduced latency.
A solid gadget needs a solid case! We are close, so it's time to put our wearable together.
The other 3D printed components are two flexible wrist straps. These are similar to the ones found on watches. I achieved the flexibility by printing them with TPU material. Note that if you do not have a good 3D printer you may need to widen the strap's holes after printing. I used super glue to attach the wrist straps to the case. Always be careful when handling glues!
A cool part is the wearable's dock/stand. This component is not important to the wearable's functionality, but a device's dock/stand is just always cool! It keeps your device in place, adds style to your space, and saves you from the fear of your device being tangled in cables.
The wearable's electronic components include:
Seeed Studio XIAO ESP32S3 (Sense) development board with the camera detached
SSD1306 OLED display
3.7V lithium battery. In my case, I used a 500mAh battery.
Some jumper wires and male header pins
The XIAO ESP32S3 board has LiPo battery connector copper pads that we can use to solder wires for the battery connection. Note that the negative terminal is the copper pad closest to the USB port, and the positive terminal is the copper pad further away from the USB port.
Once the electronic parts have been assembled, they can be put in the wearable's case according to the layout in the image below. Side vents on the case allow the onboard digital microphone to capture surrounding sounds effectively and they also help cool the ESP32S3.
Below is an image of my wearable after assembling the components.
To test the wearable, I used a television to play YouTube videos of construction sites and dog sounds. At first, I expected the model not to perform well, since the YouTube playback did not sound the same as the audio generated by ElevenLabs; in Machine Learning, we aim to train the model on data that is representative of what it will see during deployment. However, the model, using the MFCC algorithm, performed well and was able to accurately detect dog sounds, though sometimes barks were classified as howls and vice versa.
Let’s now put on our safety hats, or get packages to deliver, and put the TinyML wearable to test.
This low-cost and low-power environmental sensing wearable is one of the many solutions that embedded AI has to offer. The presence of security dogs provides a sense of security and an unmatched source of environmental feedback to us humans. However, there is a great need to also understand how these intelligent animals operate so that we can understand and treat them better. The task at hand was quite complicated: capture sounds without disturbing dogs, train a dog sound detection model, and optimize the model to run on a microcontroller. However, by utilizing the emerging technology of synthetic data generation and the powerful tools offered by the Edge Impulse platform, we have managed to train and deploy a custom Machine Learning model that can help workers.
The mobile app is used to interact with Thingy:53. The app also integrates with the embedded machine learning platform.
Edge Impulse Project:
Nordic Thingy:53:
Edge Impulse Documentation:
Construction:
Nature:
Traffic:
*Image credit on Unsplash.
To demonstrate the concept, we'll use the , which is a low-cost, small device that contains a , and a to accelerate machine learning inferencing. The RaSynBoard is available to , and makes prototyping audio classification, accelerometer and motion detection projects simple.
The RaSynBoard comes with default firmware from Avnet out-of-the-box, so it is ready to use and quick to get started with for machine learning exploration. But we'll need to add Edge Impulse firmware to the board instead, in order to interface with the Edge Impulse Studio or API. To flash the board with the necessary firmware, you will need two Renesas flashing applications installed, as well as the Edge Impulse CLI. Links to the needed software, as well as the firmware that needs to be flashed, are located in the documentation here: . Download and install each of the three applications, then download the firmware .zip file and unzip it.
Once you have the Renesas bits installed, and the firmware downloaded and extracted on your laptop or desktop, you can flash the RaSynBoard by removing the SD Card, removing the Jumper from pins 1-2, and then connecting a USB-C cable from your computer to the USB-C port on the I/O Board. This is shown in detail in the excellent .
To create an OpenAI API secret key, start by visiting the . If you don't have an account, sign up; otherwise, log in. Once logged in, navigate to the API section by clicking on your profile icon or the navigation menu and selecting "API" or "API Keys." In the API section, click on "Create New Key" or a similar button to generate a new API key. You may be prompted to name your API key for easy identification. After naming it, generate the key and it will be displayed to you. Copy the key immediately and store it securely, as it might not be visible again once you navigate away from the page.
You can now use this API key in your applications to authenticate and access OpenAI services; for this project, we will use the API key to generate synthetic voice data via Edge Impulse's transformation blocks. Ensure you keep your API key secret and do not expose it in client-side code or public repositories. You can manage your keys (regenerate, delete, or rename) in the API section of your OpenAI account. For more detailed instructions or troubleshooting, refer to the or the help section on the OpenAI website.
After building our model, we'll get the new firmware. Follow this guide for more detail on how to flash the audio firmware:
Once we have flashed the firmware, we can upload the Arduino code using the Arduino IDE. You can find the code on my GitHub repository here:
Edge Impulse CLI - Follow to install the necessary tooling to interact with the Edge Impulse Studio and also run inference on the board.
Simplicity Studio 5 - Follow to install the IDE
Simplicity Commander - Follow to install the software. This will be required to flash firmware to the xG24 board.
If you don't have an Edge Impulse account, sign up for free and log into . Then visit the below to get started.
If you added some new data and are not sure of the model design, then the EON Tuner can come to the rescue. You just have to select the target device as SiLabs EFR32MG24 (Cortex-M33 78MHz) and configure your desired parameters, then the EON Tuner will come up with suggested architectures which you can use.
You can refer to the previously mentioned official Docs link to get the latest firmware and connect the xG24 to the Edge Impulse Studio:
Alternatively, you can use to collect data, if you don't want to install any tools.
You can follow this guide to get everything installed.
Open in a browser, and sign in, or create a new account if you do not have one. Click on New project, then in Data acquisition, click on the Upload Data icon to upload .wav files (e.g. from Kaggle, the Google Speech Commands Dataset, etc.). Other ways to collect data are from devices such as a connected smartphone with a QR code link, or a connected Nicla Voice with the Edge Impulse audio firmware flashed to it. For ease of labelling, when collecting or uploading data, fill in the name according to the desired label, for example one, two, three, on, off, or zzz for words or sounds that can be ignored.
Note: With over 4 hours of audio data, multiple classes and higher performance settings to build the model, this project uses an for more capable and faster results.
For a Syntiant NDP device like the Nicla Voice, we can configure the posterior parameters (in this case, tick all labels except zzz). To run your Impulse locally on the Arduino Nicla Voice, select the Nicla Voice in the Deployment tab, then click Build. The binary firmware will start building and automatically download to your computer once it is complete, and a video with instructions on how to flash the firmware will pop up. Flash the firmware as instructed. Once complete, you can run this model in a Terminal for live classification.
Upload this code to the Arduino Nicla Voice using the Arduino IDE. This code will override the existing application code on the Nicla Voice's MCU, but not the machine learning model on the NDP120. The code simply sends a byte via I2C every time a keyword is detected; the value of the byte depends on the keyword detected.
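The exact sketch is in the GitHub repo linked above; as a rough illustration of the sender side only (the I2C address 0x08, the 1-6 byte mapping, and the poll_keyword() helper are assumptions for illustration, not the project's actual API), it could look something like this:

```cpp
// Rough illustration of the I2C sender side (not the exact repository code).
#include <Wire.h>

const uint8_t RECEIVER_ADDR = 0x08;   // assumed address of the Pro Micro

// Hypothetical stand-in for the NDP120 keyword match event: returns 0 when no
// keyword was heard, otherwise a command code 1..6 (1-4 = socket, 5 = on, 6 = off).
// The real sketch hooks into the Nicla Voice match callback instead.
uint8_t poll_keyword() {
  return 0;
}

void sendCommand(uint8_t code) {
  Wire.beginTransmission(RECEIVER_ADDR);
  Wire.write(code);                   // one byte per detected keyword
  Wire.endTransmission();
}

void setup() {
  Wire.begin();                       // join the I2C bus as master
}

void loop() {
  uint8_t code = poll_keyword();
  if (code != 0) {
    sendCommand(code);
  }
  delay(10);
}
```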
Upload this code to the Pro Micro using the Arduino IDE. This application receives the incoming byte via I2C and switches the relays on or off based on the value of the data received from the Nicla Voice.
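Correspondingly, a minimal sketch of the receiver side could look like the following (again assuming address 0x08, four relays on pins 4-7 driven active-LOW, and the same 1-6 byte mapping; the real wiring and mapping are defined in the repository code):

```cpp
// Rough illustration of the I2C receiver side on the Pro Micro.
#include <Wire.h>

const uint8_t RELAY_PINS[4] = {4, 5, 6, 7};
volatile uint8_t lastCommand = 0;
int selectedSocket = -1;

void onReceive(int numBytes) {
  while (Wire.available()) {
    lastCommand = Wire.read();        // keep the most recent command byte
  }
}

void setup() {
  for (uint8_t i = 0; i < 4; i++) {
    pinMode(RELAY_PINS[i], OUTPUT);
    digitalWrite(RELAY_PINS[i], HIGH);   // all relays off at boot (active LOW)
  }
  Wire.begin(0x08);                      // join the I2C bus as a slave
  Wire.onReceive(onReceive);
}

void loop() {
  uint8_t cmd = lastCommand;
  lastCommand = 0;
  if (cmd >= 1 && cmd <= 4) {
    selectedSocket = cmd - 1;            // "one".."four" pick a socket
  } else if (cmd == 5 && selectedSocket >= 0) {
    digitalWrite(RELAY_PINS[selectedSocket], LOW);   // "on"
  } else if (cmd == 6 && selectedSocket >= 0) {
    digitalWrite(RELAY_PINS[selectedSocket], HIGH);  // "off"
  }
}
```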
With the recent advancements in embedded systems and the Internet of Things (IoT), there is growing potential to integrate Machine Learning models on resource-constrained devices. In our case, we want a lightweight device that we can easily wear on our wrists while still achieving smart acoustic sensing. Steve Roddy, former Vice President of Product Marketing for Arm's Machine Learning Group, once said that "TinyML deployments are powering a huge growth in ML deployment, greatly accelerating the use of ML in all manner of devices and making those devices better, smarter, and more responsive to human interaction". Tiny Machine Learning (TinyML) enables running Machine Learning on small, low-cost, low-power, resource-constrained devices like wearables. Many people have not heard of TinyML, but we use it every day in devices such as smart home assistants. According to an , there are already 3 billion devices that are able to run Machine Learning models.
We will use TinyML to deploy a sound classification model on the . This tiny 21mm x 17.8mm development board integrates a camera sensor, digital microphone, SD card, 8MB PSRAM and 8MB Flash. With its embedded Machine Learning computing power, this development board is a great tool for getting started with intelligent voice and vision AI solutions. We will use the onboard digital microphone to capture environmental sounds, and an optimized Machine Learning model will run on the ESP32-S3R8 Xtensa LX7 dual-core processor. The TinyML model will classify sound as either noise, dog barking, or dog howling. The classification results will then be displayed on an OLED screen. The XIAO ESP32S3 board is a good fit for this project due to its high-performance processor, wireless communication capabilities, and low power consumption.
As embedded hardware advances, software developments are also emerging to enable TinyML. We will use the for this project, and indeed it is a leading Edge AI platform! I chose Edge Impulse because it simplifies development and deployment. The platform supports integrating generative AI tools for synthetic data acquisition, the ability to maintain several machine learning pipelines simultaneously with the deployment performance of each shown, and the ability to optimize Machine Learning models, enabling them to run even on microcontrollers with limited flash and RAM. The experience of using the Edge Impulse platform for this project made the workflow easy, and it also enabled the deployment, since the model optimization achieved 27% less RAM and 42% less flash (ROM) usage. This documentation will cover everything from preparing the dataset, to training the model, to deploying it to the XIAO ESP32S3!
You can find the public Edge Impulse project here: . To add this project into your Edge Impulse account, click "Clone this project" at the top of the page. Next, go to the section "Deploying the Impulses to XIAO ESP32S3" for steps on deploying the model to the XIAO ESP32S3 development board.
Training a model requires setting up various configurations, such as data processing formats, model type, and training parameters. As developers, we experiment with different configurations and track their performance in terms of processing time, accuracy, classification speed, Flash and RAM usage. To facilitate this process, Edge Impulse offers the feature. This enables us to create multiple Machine Learning pipelines (Impulses) and easily view the performance metrics for all pipelines, helping us quickly understand how each configuration performs and identify the best one.
development board with the camera detached
. Available to download on Printables.com
To collect the data to be used in this project, we will use the Synthetic data generation tool on the platform. At the time of writing this documentation in October 2024, Edge Impulse has integrated three Generative AI platforms for synthetic data generation: Dall-E to generate images, Whisper for creating human speech elements, and ElevenLabs to generate audio sound effects. In our project, we will use since it is great for generating non-voice audio samples. There is an amazing that demonstrates how to use the integrated ElevenLabs audio generation feature with Edge Impulse. If we were instead capturing sounds from the environment, Edge Impulse also supports collecting data from such as uploading files, using APIs, smartphone/computers, and even connecting development boards directly to your project so that you can fetch data from sensors.
The first step was to create a free account on ElevenLabs. You can do this by signing up with an email address and a password. However, note that with the current the free account gives 10,000 credits which can be used to generate around 10 minutes of audio per month. Edge Impulse's synthetic audio generation feature is offered in the Professional and Enterprise packages, but users can access the Enterprise package with a that doesn't require a credit card.
In generative AI, prompts act as inputs to the AI. These inputs are used to prompt the generative AI model to generate the desired response which can be text, images, video, sound, code and more. The goal of the prompt is to give the AI model enough information so that it can generate a response that is relevant to the prompt. For example, if we want ChatGPT to generate an invitation message we can simply ask it to "Generate an invitation message". However, if we were to add more details such as the time, venue, what kind of event is it (wedding, birthday, conference, workshop etc.), targeted audience, speakers; these can improve the quality of the response we get from ChatGPT. ElevenLabs have created a and it also describes other parameters that they have enabled so that users can get more relevant responses.
However, after experimenting with various models, I noticed significant bias in the dog barking class, leading the models to classify any unheard sounds as dog barks (in other words, the models were overpredicting the dog bark class). In this case, I created another class, noise, consisting of 10-minute recordings from quiet environments with conversations, silence, and low machine sounds like a refrigerator and a fan. I uploaded the recordings to the Edge Impulse project and used the to extract 1 second audio samples from the recordings. After several experiments, I observed that the model actually performed best when I had only 3 classes: dog barking, dog howling and noise. Therefore, I disabled the environment class audio files in the dataset, and this class was ignored in the pre-processing, model training and deployment.
Finally, once we have the dataset prepared, we need to split it for Training and Testing. The popular rule is an 80/20 split, meaning 80% of the dataset is used for model training while 20% is used for model testing. In an Edge Impulse Studio project, we can click the red triangle with an exclamation mark (as shown in the image below), which opens an interface that suggests splitting our dataset.
After collecting data for our project, we can now train a Machine Learning model for the required sound classification task. To do this, on Edge Impulse we need to create an Impulse. An Impulse is a configuration that defines the input data type, the data pre-processing algorithm, and the Machine Learning model training. In our project, we are aiming to train an efficient sound classification model that can "fit" inside a microcontroller (the ESP32S3). In this case, there are a great number of parameters and algorithms that we need to choose carefully. One of the great features of the Edge Impulse platform is the powerful tooling that simplifies the development and deployment of Machine Learning. Edge Impulse recently released the Experiments feature, which allows projects to contain multiple Impulses, where each Impulse can contain either the same combination of blocks or a different combination. This allows us to view the performance of various types of learning and processing blocks, while using the same input training and testing datasets.
First, for the model input I used a window size of 1 second, a window increase of 500 ms (milliseconds), a frequency of 16,000 Hz, and enabled the "Zero-pad data" option so that samples shorter than 1 second are filled with zeros. Since we are targeting deployment on a resource-constrained device, one way of reducing the amount of data being processed is to reduce the length of the audio windows being taken. Next, we need to define the audio pre-processing method. This operation is important since it extracts the meaningful features from the raw audio files. These features are then used as inputs for the Machine Learning model. The preprocessing includes steps such as converting the audio files into a spectrogram, normalizing the audio, removing noise, and feature extraction. There are various pre-processing algorithms used for sound data, such as MFE, MFCC, spectrograms, working with raw data, and more. Choosing the best pre-processing algorithm for sound classification is essential because the quality and relevance of the input features directly impact the model's ability to learn and classify sounds accurately. The learning block will be the same throughout this sound classification project, but you can experiment with other pre-processing algorithms to identify the best performing one.
In our Edge Impulse project, we create the first Impulse. In this case, I first used Audio (MFE) as the processing block and Classification as the learning block. Similarly to the Spectrogram block, the Audio MFE processing extracts time and frequency features from a signal. However, this algorithm uses a non-linear scale in the frequency domain, called the Mel scale. It performs well on audio data, mostly for non-voice recognition use cases where the sounds to be classified can be distinguished by the human ear. After saving this configuration, we click the "Save Impulse" button.
Next, we need to configure the processing block, MFE. Click "MFE" and we are presented with various parameters that we can set, such as frame length, frame stride, filter number, FFT length, low frequency, high frequency and the Noise floor (dB) for normalization. Selecting appropriate parameters for the digital signal processing (DSP) can be a troubling and time-consuming task, even for experienced digital signal processing engineers. To simplify this process, Edge Impulse supports autotuning of the processing parameters. To ensure that I get the best pre-processing, I used this feature by clicking the "Autotune parameters" button. In this setup, we could also reduce the inference time by changing the FFT length value to, say, 256.
After configuring the processing parameters, we can generate features from the dataset. Still on the MFE page, we click the "Generate features" tab and finally the "Generate features" button. The feature generation process will take some time depending on the size of the data. When this process is finished, the Feature explorer will plot the features. Note that the features are the output of the processing block, not the raw data itself. In our case, we can see that there is a good separation of the classes, which indicates that simpler Machine Learning (ML) models can be used with greater accuracy.
This will create a new Impulse instance. The steps to configure this Impulse are the same, with the only difference being that we select Spectrogram as the processing block. The Spectrogram processing block extracts time and frequency features from a signal. It performs well on audio data for non-voice recognition use cases, or on any sensor data with continuous frequencies. Once this Impulse has been saved, we again use the autotuning feature, generate features and train a neural network with the same configuration as the first Impulse. After this process was completed, the features generated with the Spectrogram were not as well separated as with the MFE used in the first Impulse, specifically the features for the dog barking and howling sounds. The model training accuracy was 98% and the loss was 0.15. Finally, after testing the model on unseen data, the performance was also impressive, with 98% accuracy.
Next, I experimented with using the raw audio files as inputs to the model. The Raw data block generates windows from data samples without any specific signal processing. It is great for signals that have already been pre-processed, or if you just need to feed your data into the Neural Network block. The steps to configure this Impulse are the same as the first two, with the only difference being that we select Raw data as the processing block and use dense layers for the neural network architecture. After this process was completed, on the Feature explorer we can see that the audio data are not separated as well as with the first two processing blocks. The model training accuracy was 33% and the loss was 12.48. After testing the model on unseen data, the performance was also poor, with an accuracy of 39%.
Finally, I experimented with MFCC processing. The Audio (MFCC) processing block extracts coefficients from an audio signal. Similarly to the Audio MFE block, it uses a non-linear scale called the Mel scale. It is the reference block for speech recognition, but I also wanted to try it on a non-human voice use case. The steps to configure this Impulse are the same as the first three, with the only difference being that we select Audio (MFCC) as the processing block. After this process was completed, on the Feature explorer we can see that the audio data are separated, but not as well as with the MFE and Spectrogram pre-processing. The training accuracy was 97% and the loss was 0.06. Testing the model on unseen data, the performance was again impressive, with an accuracy of 96%.
Edge Impulse documents how to use the XIAO ESP32S3. We will deploy an Impulse as an Arduino library - a single package containing the signal processing blocks, configuration and learning blocks. You can include this package (Arduino library) in your own sketches to run the Impulse locally on microcontrollers.
To deploy the first Impulse to the XIAO ESP32S3 board, first we ensure that it is the current Impulse and then click "Deployment". In the "Search deployment options" field we select Arduino library. Since memory and CPU clock rate are limited for our deployment, we can optimize the model so that it can utilize the available resources on the ESP32S3 (or simply, so that it can fit and manage to run on the ESP32S3). Model optimization often involves a trade-off whereby we decide whether to trade model accuracy for improved performance, or reduce the model's memory (RAM) usage. Edge Impulse has made model optimization very easy with just a click. Currently we get two optimizations: the EON Compiler (which gives the same accuracy but uses 27% less RAM and 42% less ROM) and TensorFlow Lite. The EON Compiler is a powerful tool, included in Edge Impulse, that compiles machine learning models into highly efficient and hardware-optimized C++ source code. It supports a wide variety of neural networks trained in TensorFlow or PyTorch, and a large selection of classical ML models trained in scikit-learn, LightGBM or XGBoost. The EON Compiler also runs far more models than other inferencing engines, while saving up to 65% of RAM usage. TensorFlow Lite (TFLite) is an open-source machine learning framework that optimizes models for performance and efficiency, making them able to run on resource-constrained devices. To enable model optimization, I selected the EON Compiler and Quantized (int8).
Next, we need to add the downloaded .zip library to the Arduino IDE and use the esp32_microphone example code. The deployment steps are also documented in the XIAO ESP32S3 . Once we open the esp32_microphone sketch, we need to change the I2S library, update the microphone functions, and enable the ESP NN accelerator as described by MJRoBot (Marcelo Rovai) in . You can also obtain the complete updated code in this . Before uploading the code, we can follow the to install the ESP32 board package in the Arduino IDE and then select the XIAO ESP32S3 board for uploading. With the XIAO ESP32S3 board still connected to the computer, we can open the Serial Monitor and see the inference results. We can see that the Digital Signal Processing (DSP) takes around 475ms (milliseconds) and the model takes around 90ms to classify the sound, which is very impressive. However, when I played YouTube videos of dog sounds in front of the XIAO ESP32S3, like , the model did not correctly classify dog barks; most of the confidence was on noise. Although this appears to be an issue, it may actually stem from the difference in sound quality between training and inference: the test using synthetic data performed well, but deployment performance was not the same. In this case, the sounds captured during inference have noise, the volume of the dog sounds is different, and overall the recordings are not as clear as the dataset samples.
The wearable's components can be categorized into two parts: the electronic components and the 3D printed components. The 3D printed component files can be downloaded from . The wearable has a casing which is made up of two components: one holds the electrical components while the other is a cover. I 3D printed the housing and cover with PLA material.
The next task is to solder female jumper wires to the XIAO ESP32S3 I2C and power pins. These wires will then be connected to the SSD1306 OLED display. I chose to solder the wires directly to the board instead of using jumper wires on the board's pins since this will make the design more compact and reduce the height of the wearable. The pin list of the XIAO ESP32S3 board can be found .
After assembling the wearable, we can connect to the XIAO ESP32S3 board using the USB-C slot on the case to program it, and charge the LiPo battery! We can get the inference code from this GitHub repository . This Arduino sketch loads the model and runs continuous inference while printing the results via Serial. After a successful run of the inference code, I updated the code and added further processing of the inference results to display images on the OLED according to the predicted class with the highest confidence. This updated code can also be found in the GitHub repository: .
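The repository has the full sketch; the core idea of mapping the top prediction to an OLED message can be sketched roughly as follows (this assumes the Adafruit SSD1306 library and the ei_impulse_result_t structure filled in by the Edge Impulse run_classifier() call; the display size, I2C address and helper name are placeholders):

```cpp
// NOTE: include the exported Edge Impulse library header first
// (named <your_project>_inferencing.h), which defines ei_impulse_result_t
// and EI_CLASSIFIER_LABEL_COUNT.
#include <Adafruit_GFX.h>
#include <Adafruit_SSD1306.h>

// 128x64 I2C OLED; call display.begin(SSD1306_SWITCHCAPVCC, 0x3C) in setup().
Adafruit_SSD1306 display(128, 64, &Wire, -1);

// Show the label with the highest confidence after run_classifier() has filled 'result'.
void showTopClass(const ei_impulse_result_t &result) {
  size_t best = 0;
  for (size_t i = 1; i < EI_CLASSIFIER_LABEL_COUNT; i++) {
    if (result.classification[i].value > result.classification[best].value) {
      best = i;
    }
  }
  display.clearDisplay();
  display.setTextSize(2);
  display.setTextColor(SSD1306_WHITE);
  display.setCursor(0, 0);
  display.println(result.classification[best].label);   // e.g. "dog_barking"
  display.setTextSize(1);
  display.print("confidence: ");
  display.print(result.classification[best].value, 2);
  display.display();
}
```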
At last, our dog sound detection wearable is ready. We have successfully trained, tested, optimized, and deployed a Machine Learning model on the XIAO ESP32S3 Sense board. Once the wearable is powered on, the ESP32S3 board continuously samples 1 second of sound and predicts whether it has heard dog sounds or noise. Note that there is a latency of around 300 milliseconds (285ms for Digital Signal Processing and 15ms for classification) between sampling and the inference results. Some sounds may not be captured in time, since other parts of the sketch also need to execute. In this case, to achieve a smaller latency, we can target other hardware, such as the , which features an always-on sensor and speech recognition processor, the .
The new Experiments feature of Edge Impulse is a powerful tool and it comes in very handy in the Machine Learning development cycle. There are numerous configurations that we can use to make the model more accurate and reduce hardware utilization on edge devices. In my experiments, I tried other configuration combinations and chose to present the best and worst performing ones in this documentation. Are you tired of trying out various Impulse configurations and deployment experiments? Well, Edge Impulse offers yet another powerful tool, the EON Tuner. This tool helps you find and select the best embedded machine learning model for your application within the constraints of your target device. The EON Tuner analyzes your input data, potential signal processing blocks, and neural network architectures, and gives you an overview of possible model architectures that will fit your chosen device's latency and memory requirements. First, make sure you have data in your Edge Impulse project. Next, select the "Experiments" tab and finally the "EON Tuner" tab. On the page, configure your target device and your application budget, and then click the "New run" button.
You can find the public Edge Impulse project here: . This includes the deployed Edge Impulse library together with inference code and OLED usage functions. A future work on this project would be to include other alert features such as sending SMS messages or including a vibration motor such that the wearable can vibrate when dog sounds are detected. This vibration can then be felt by the user, in case of headphone or earplug usage in certain situations or environments.
Bill of materials:

| LCSC Part # | Manufacturer Part # | Manufacturer | Package | Description | Qty | Unit Price | Total |
|---|---|---|---|---|---|---|---|
| C176224 | QR1206F5R10P05Z | Ever Ohms Tech | 1206 | 250mW Thick Film Resistor, 200V, ±1%, ±400ppm/°C, 5.1Ω, 1206 Chip Resistor - Surface Mount, ROHS | 50 | 0.0156 | 0.78 |
| C516126 | HL-AM-2835H421W-S1-08-HR5(R9) (2800K-3100K)(SDCM<6,R9>50) | HONGLITRONIC (Hongli Zhihui) | SMD2835 | 60mA, 3000K, foggy yellow lens, -40°C~+85°C, positive stick, white, 120°, 306mW, 3.4V, SMD2835 LED Indication - Discrete, ROHS | 50 | 0.0144 | 0.72 |
| C2589 | IRLML2502TRPBF | Infineon Technologies | SOT-23 | 20V, 4.2A, 1.25W, 45mΩ@4.5V,4.2A, 1.2V@250uA, 1 N-Channel SOT-23 MOSFET, ROHS | 5 | 0.1838 | 0.92 |
| C5440143 | CS3225X7R476K160NRL | Samwha Capacitor | 1210 | 16V, 47uF, X7R, ±10%, 1210 Multilayer Ceramic Capacitor (MLCC - SMD/SMT), ROHS | 5 | 0.0765 | 0.38 |
| C153338 | FCR1206J100RP05Z | Ever Ohms Tech | 1206 | 250mW Safety Resistor, 200V, ±5%, 100Ω, 1206 Chip Resistor - Surface Mount, ROHS | 10 | 0.0541 | 0.54 |
Collect audio data for your machine learning model on a Raspberry Pi Pico.
Created By: Alex Wulff
Public Project Link: https://studio.edgeimpulse.com/public/117150/latest
Keyword spotting is an important use case for embedded machine learning. You can build a voice-activated system using nothing but a simple microcontroller! Since this use case is so important, Edge Impulse has great documentation on best practices for keyword spotting.
Edge Impulse also now supports the Raspberry Pi Pico. This is fantastic because, at $4, Pico is a very capable and low-cost platform. And now, with machine learning, it can power tons of projects. The only problem is that Edge Impulse's firmware for Pico only supports direct data collection at a low sample rate. Audio waveforms vary relatively quickly—therefore, we need some way of collecting data at a rapid sample rate and getting it into Edge Impulse. That's the purpose of this project!
This project is not just useful for audio; any Pico project that needs a higher sample rate can use this same code.
Most machine learning applications benefit from using training data collected from the system on which you will perform inferencing. In the case of embedded keyword spotting, things are no different. Subtle variations between device microphones and noise signatures can render a model trained on one system useless on another. Therefore, we want to collect audio data for training using the very circuit that we'll use to perform inferencing.
The Raspberry Pi Pico can actually collect data at an extremely high sample rate. In my testing, you can achieve sample rates of up to 500 kHz. Pretty neat! For audio, of course, a much lower sample rate is sufficient. CD-quality audio is sampled at 44.1 kHz (a bit more than twice the maximum frequency human ears can hear, as required by the Nyquist sampling theorem), but for keyword spotting we can get away with something as low as 4 kHz. Lower sample rates are generally better because they decrease the computational burden on our inferencing platform; if we go too low, we start to sacrifice audio fidelity and spotting keywords becomes impossible.
So — Pico is collecting 4,000 samples every second. How do we actually save these off into a useful format? If we had an SD card hooked up to Pico we could simply write the data to that. I want this project to be entirely self-contained, however, so that idea is out. Pico does have 2 MB of onboard flash, but this would fill up relatively quickly and it's possible to have some messy side effects like overwriting your program code.
This leaves us with one option — the serial port. We can dump data over the serial port where it can be read and saved off by the host computer. In my testing, we can only push out a few kB per second of data this way, but it is sufficient for audio data. There's also the added complication that most convenient serial interfaces are text-based. We can't simply send raw bytes over serial, as these may be interpreted as control characters such as line returns that won't actually be saved off. Instead, we can encode the data into a text format such as base-64 and then decode it later.
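To make the encoding step concrete, here is a minimal, self-contained sketch of the idea (not the project's actual pico_daq.cpp; the buffer contents and the print_base64() helper are purely illustrative). It takes a buffer of raw samples, encodes the bytes as base-64 text, and prints the text over USB serial using the Pico SDK's stdio.

```cpp
// Minimal sketch: encode raw sample bytes as base-64 text and print them
// over USB serial. Illustrative only; the real pico_daq.cpp differs.
#include <cstdint>
#include <cstdio>
#include "pico/stdlib.h"

static const char B64[] =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

// Encode `len` bytes from `data` as base-64 and print the text to stdout.
static void print_base64(const uint8_t *data, size_t len) {
    for (size_t i = 0; i < len; i += 3) {
        uint32_t chunk = data[i] << 16;
        if (i + 1 < len) chunk |= data[i + 1] << 8;
        if (i + 2 < len) chunk |= data[i + 2];
        putchar(B64[(chunk >> 18) & 0x3F]);
        putchar(B64[(chunk >> 12) & 0x3F]);
        putchar(i + 1 < len ? B64[(chunk >> 6) & 0x3F] : '=');
        putchar(i + 2 < len ? B64[chunk & 0x3F] : '=');
    }
}

int main() {
    stdio_init_all();            // route printf/putchar to USB serial
    uint16_t samples[256] = {0}; // pretend the ADC filled this buffer
    while (true) {
        // In the real project the ADC fills this buffer at 4 kHz; here we only
        // show how the raw bytes get turned into printable text.
        print_base64(reinterpret_cast<uint8_t *>(samples), sizeof(samples));
        printf("\n");
        sleep_ms(64); // 256 samples at 4 kHz is roughly 64 ms
    }
}
```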
Another program can convert the raw bytes into a usable audio file once the data is saved off on the computer.
All you need for this project is a microphone for data collection (and a Pico, of course)! Check out the circuit above for how to hook the microphone up to Pico. I recommend this microphone from Adafruit. It has automatic gain control, which means that as you get further from the microphone the module will automatically turn up the gain so the audio doesn't get too quiet. This is essential for a project where you might be a varying distance away from the microphone.
You can find all the code for this project here. It compiles using the standard Pico CMake and make procedure. If you're not sure how this process works, look through Raspberry Pi's tutorials for getting set up with the Pico C++ SDK and building/flashing your code.
The main code is in pico_daq.cpp. The program starts the Pico's ADC sampling routine, normalizes the data and converts it to floats, converts the float data to base-64, and then prints this out over the serial console to the host computer.
This is where we encounter our first problem — at 4 kHz, with four bytes per floating point value, the serial port cannot write data as fast as we collect it. The way I handle this problem in this code is by dropping chunks of samples. While we're busy sending the rest of the data that could not be sent during the sampling window, the ADC is not collecting data. This leads to some jumps within the final audio file, but for the purposes of collecting data for keyword spotting it is sufficient.
Compile this code and flash it to your Pico.
On Linux and macOS systems, saving serial data is relatively easy with the screen utility. You can install screen using your package manager in Linux, or using homebrew on macOS. Most Windows serial console clients also give you some means of saving off data from the serial console, but I won't provide instructions for these.
Before using screen, you'll first need to identify the device name of your Pico. On macOS, this will be something like /dev/tty.usbmodem.... On Linux, this will be something like /dev/ttyACM.... You can use lsusb and dmesg to help you figure out what device handle your Pico is.
With the code running on your Pico, the following command will open the serial port and save off the data (make sure you replace /dev/... with your Pico's device handle): screen -L /dev/tty.usbmodem1301
Go ahead and start speaking one of your keywords for a period of time. Base-64 characters should constantly flash across your screen. Follow best practices listed in the Edge Impulse tutorial for creating datasets for keyword spotting as you collect your data.
To exit screen when you're done collecting data, type Control-A and then press k. You will now have the raw base-64 data saved in a file called screenlog.0.
You can rename this file to something that indicates what it actually contains (“{keyword}.raw” is a good choice), and keep collecting others.
Edge Impulse doesn't know how to interpret data in base-64 format. What we need is a way to convert this base-64 data into a .wav file that you can import directly into Edge Impulse using its data uploader. I made a Python 3 program to do just that, which you can find here. This code requires the scipy, numpy, and pydub packages, which you can install via pip.
Replace infile with the path to the base-64 data output by screen, and outfile with the desired output path of your .wav file. You should be ready to run the Python program!
Once the program is done, you'll have a finished audio file. You can play this with any media player. Give it a try — if all goes well, you should be able to hear your voice!
You can now follow the rest of Edge Impulse's tutorial for training your model. Everything else, including all the tips, applies! Make sure you have enough training data, have a balanced dataset, use enough classes, etc.
For instructions on how to deploy a keyword spotting model on Pico, check out my project that uses Pico as the brains of a voice-activated lighting controller. In that project I use this code to collect data, and then show how to use the deployed model to control your projects!
Build a voice-activated LED light strip controller on the cheap with a Raspberry Pi Pico.
Created By: Alex Wulff
Public Project Link: https://studio.edgeimpulse.com/public/117150/latest
LED light strips, sometimes called “Neopixels” in the maker community, are the perfect choice for interior lighting. They’re easy to install, inexpensive, and are completely programmable. Simply connect the light strips to a microcontroller, write some code to make whatever lighting pattern you want, and you’re good to go!
Unfortunately for me, however, I’m lazy. I have no desire to get up and change the pattern if I get bored or turn it off if I want to go to bed. I could buy some Wi-Fi connected light strip on Amazon, but these are expensive and fiddling with some app on my phone is not peak laziness. Some light strips come with remotes, but I know that I’d lose the remote within days of having it.
What if we could use the $4 Raspberry Pi Pico to control lighting strips with voice commands? With Edge Impulse, we can! With just the Pico, a cheap microphone, and a power supply, you can make a voice-controlled lighting system.
The Pico is actually an excellent microcontroller for machine learning projects. First, it’s extremely inexpensive, so I don’t feel bad about leaving them inside a project instead of cannibalizing it when I’m done. Second, the Pico is an incredibly capable machine. It runs at a base frequency of 125 MHz so it can execute most machine learning pipelines in only a few hundred milliseconds, and it features a whole host of other peripherals like PIO and a second core. Third, writing code for Pico is an extremely pleasant experience. Pico supports normal C++, CMake, and Make, so development is very familiar. Actually programming the board itself is also as simple as possible.
Perhaps most important of all is that Pico is extremely well documented. It's difficult to overstate how nice it is to have a well-documented board; this saves countless hours of troubleshooting, from tasks as simple as setting up your development environment to experimenting with advanced features like PIO.
I firmly believe that the Raspberry Pi Pico offers the most bang for your buck out of all the microcontrollers currently on the market. Additionally, with its new $6 Wi-Fi-equipped cousin, I can envision Pico becoming a formidable contender in a crowded field of microcontrollers.
You’ll only need a few components to make this project work:
Raspberry Pi Pico
LED Light Strip
5V Power Supply
Breadboard
Most LED strips labelled as “addressable” or “Neopixel” will work. The main thing you’re looking for is a strip based on the WS2812 driver. You can find these on Amazon, Adafruit, Sparkfun, etc. in varying sizes and configurations.
Also ensure that you select a power supply that can deliver a suitable amount of current to your project. 60 LEDs can draw up to 2A at full brightness. This power supply should do the trick for most projects.
Assembling this project should take less than 5 minutes on a breadboard. One important thing to note is that you should not tie the microphone to the 5V output from the power supply—instead, make sure it's connected to the Pico's 3V regulator. With the lights attached and running, the 5V power supply is very noisy.
The Edge Impulse Studio only supports collecting data from Pico at a relatively low sample rate, so we need to use some custom code to collect training data. See this tutorial for information on how to collect data from the Pico's ADC, upload it to Edge Impulse, and train a machine learning model on it.
You’ll need two keywords for this project. I used “start” and “stop”. When the lights are off, “start” will turn them on. When the lights are on and you say “start” again, the lights will cycle to a new lighting mode. Saying “stop” when the lights are on will turn them off. It took me a few tries to get my model working well, so don’t get discouraged if your system doesn’t work well at first!
The code for this project really pushes the Raspberry Pi Pico to its limits. I use a variety of Pico-specific features, including direct memory access (DMA), PIO state machines, and multicore processing. First, we'll start with the things you need to modify to make your project work. Start by cloning the awulff-pico-playground repository, and navigate to the pico-light-voice folder. This is the main folder for this project.
Check the indices of the keywords used in your model, and make sure they match the if (ix == ...) checks in main.cpp. Your "start" and "stop" keywords should match what's in the code.
Adjust thresh in main.cpp to control the sensitivity of the keyword detection.
Change NUM_LIGHTS in lights.cpp to the number of LEDs in your strip.
Now let’s take a quick tour through the code so you can learn how to modify it to fit your needs.
The base of this code is built on top of Edge Impulse’s standalone inferencing starting point. You can follow the instructions here for setting up your development environment and downloading the Pico SDK. Additionally, once you’re finished with your voice model, copy over the folders from your deployed C++ library into the main project folder as described in this project’s README.
Let's start in the main code loop. You can find this in source/main.cpp.
Input Parameters
At the top of the file, you can find parameters for the model inputs and sample rates. A CLOCK_DIV of 12,000 gives a sample rate of 4 kHz. This is low for audio, but still intelligible to a machine learning model. This value needs to match the sample rate that you used to collect data in the data collection tutorial! We trained the model to respond to sample lengths of 4,000 samples (NSAMP), which works out to be exactly 1 second of data.
For more information on ADC sampling and DMA with Pico, check out my tutorial on this subject.
In this code I also do something called continuous inferencing, which can improve the performance of the keyword detection drastically by passing in overlapping buffers of samples into the model. All the math takes under 250ms to execute, so I maintain a sliding buffer of data and run the model on it every quarter of a second. Edge Impulse doesn’t support continuous inferencing natively on Pico, so I implemented a form of it in software.
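To illustrate that sliding-buffer idea, here is a simplified sketch rather than the project's exact code. It assumes 1,000-sample ADC chunks feeding a 4,000-sample model window; NSAMP, INSIZE, and on_new_chunk() are names chosen for this example, while run_classifier() and signal_t come from the deployed Edge Impulse C++ library.

```cpp
// Sliding-buffer sketch (simplified; the real main.cpp differs): keep the most
// recent 4,000 samples and re-run the classifier every time a new 1,000-sample
// chunk arrives, i.e. four times per second at 4 kHz.
#include <cstdint>
#include <cstring>
#include "edge-impulse-sdk/classifier/ei_run_classifier.h"

#define NSAMP  1000   // samples per ADC collection window (0.25 s at 4 kHz)
#define INSIZE 4000   // model input length (1 s at 4 kHz)

static float features[INSIZE];

// Callback the Edge Impulse SDK uses to read slices of the feature buffer.
static int get_feature_data(size_t offset, size_t length, float *out) {
    memcpy(out, features + offset, length * sizeof(float));
    return 0;
}

// Call this once per completed ADC window with the newest raw samples.
void on_new_chunk(const uint16_t *chunk) {
    // Shift the window left by one chunk, then append the new samples,
    // centered and scaled roughly into the signed 16-bit range described above.
    memmove(features, features + NSAMP, (INSIZE - NSAMP) * sizeof(float));
    for (int i = 0; i < NSAMP; i++) {
        features[INSIZE - NSAMP + i] = ((float)chunk[i] - 2048.0f) * 16.0f;
    }

    signal_t signal;
    signal.total_length = INSIZE;
    signal.get_data = &get_feature_data;

    ei_impulse_result_t result;
    if (run_classifier(&signal, &result, false) == EI_IMPULSE_OK) {
        // result.classification[ix].value now holds each keyword's confidence.
    }
}
```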
Setup Code
All the code in main() before the while (true) portion gets executed once. This is all setup code used to configure the model and sampling.
Main Code Loop
We take advantage of Pico's many hardware features to form a very efficient data processing pipeline, which you can find in the while (true) portion of the code. The basic operation of this is as follows:
The code starts the ADC sampling routine on Pico. This tells the ADC to start running, and fill up the memory location we passed to the function with samples. All of this happens in the background and does not hang the main loop, so while we’re sampling we can also be running the machine learning code.
As soon as the ADC setup is finished, we move samples from the raw sample buffer into the feature buffer for the model, in the form the model expects (floats normalized to the range -2^15 to 2^15). The very first time this function runs, the buffer will be filled with garbage (since the ADC hasn't populated it yet), but every subsequent time it runs it will be filled with actual audio data.
Next we begin executing the machine learning routine on the populated feature buffer.
Based on the output from the model, we send a state update to the other core that handles the lighting. One keyword is used to turn the lights on and cycle through different lighting modes, and the other keyword will turn the lights off.
We now block and wait for the ADC sampling to finish: dma_channel_wait_for_finish_blocking(dma_chan)
As soon as sampling is done we move samples into an intermediate buffer. We want to minimize the amount of time here spent not sampling, so we don’t do any processing here and just leave that for while the ADC is busy.
We then loop and do it all again!
The execution time of your model is very important for this application. As the code is configured, we run the model four times a second. The ADC is set to collect 1000 samples per shot (NSAMP); every time this collection is done, we shift samples out of the ADC buffer and into an intermediate buffer with past audio data.
The green LED on the Pico is configured to be on while the Pico is busy executing the ML code and off while the Pico is waiting for the ADC. Thus, you can use the duty cycle of the flashing to get a handle on how close your model is to the 250 ms inferencing limit. You are losing data if the light stays on continuously—if this is the case, try reducing the execution time of your model, or increase NSAMP so each sampling window is longer and the model has more time to finish.
We execute code to control the lighting on Pico's second core while the first handles the sampling and machine learning. You can find the code for this in source/lights.cpp.
The second core executes a simple state machine based on input from the first core. Let's look at the core1_entry() function to see how this works.
Setup Code
The first part of this code sets up the light strip. The Adafruit_NeoPixel library API should be the same as that which you find in other tutorials online, with some caveats. See the library's GitHub page for more information.
Lighting Loop
The lights operate in a relatively simple manner. If we get the "off" keyword, the lights will stay off. When we get the "on" keyword, the lights will turn on in whichever lighting mode was active when they were last turned off. When we get the "on" keyword while the lights are already on, the code will cycle to a new lighting state. Add new lighting modes as you see fit, and make sure to update NUM_STATES to reflect how many states you'd like to use.
If your lighting code is relatively computationally intensive, make sure you periodically check for state updates (like I do in the rainbow mode) to ensure that your lighting system is responsive.
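Here is a compact, self-contained sketch of that state machine (the helper names are hypothetical; the real lights.cpp drives an Adafruit_NeoPixel-style strip rather than printing). update_state() is a stand-in for the FIFO check described in the next section, returning 0 for the "stop" keyword, 1 for "start", and -1 when there is no new command.

```cpp
// Sketch of the lighting state machine described above (illustrative only).
#include <cstdio>
#include "pico/stdlib.h"

#define NUM_STATES 3

// Stand-in for the real update_state(): -1 means "no new command".
static int update_state() { return -1; }

int main() {
    int  mode = 0;            // which lighting pattern to show when on
    bool lights_on = false;

    while (true) {
        int cmd = update_state();
        if (cmd == 0) {                        // "stop": turn everything off
            lights_on = false;
        } else if (cmd == 1) {                 // "start": turn on, or cycle modes
            if (!lights_on) lights_on = true;  // resume the last active mode
            else            mode = (mode + 1) % NUM_STATES;
        }

        if (lights_on) {
            printf("running lighting mode %d\n", mode); // real code animates LEDs here
        }
        sleep_ms(10);
    }
}
```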
Multicore Communications
The update_state() function handles the communication between the two cores. Pico implements this communication using two FIFO queues — we can use these as a bi-directional pipe to send information back and forth between the cores. From the lighting core we tell the sampling core that we're ready for data using multicore_fifo_push_blocking(0). If the sampling core sees that the lighting core is ready for a state update, and it has a state update to give, it will send this update to the lighting core. Once the lighting core receives an update it will change the state accordingly.
This code initially looks a little complicated, but it’s a relatively simple way to synchronize two independent processing cores. Read the Raspberry Pi documentation on FIFO and multicore processing if you’re stuck.
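For reference, here is a minimal illustration of that handshake using the Pico SDK's multicore API (the project's real code is organized differently, but these are the calls involved):

```cpp
// Minimal two-core FIFO handshake sketch using the Pico SDK multicore API.
#include "pico/stdlib.h"
#include "pico/multicore.h"

// Core 1: lighting. Tell core 0 we're ready, then block until a state arrives.
void core1_entry() {
    while (true) {
        multicore_fifo_push_blocking(0);                  // "ready for a state update"
        uint32_t new_state = multicore_fifo_pop_blocking();
        (void)new_state;                                  // apply the lighting state here
    }
}

int main() {
    stdio_init_all();
    multicore_launch_core1(core1_entry);

    uint32_t state = 0;
    while (true) {
        // ... sampling and inference run here, updating `state` ...
        if (multicore_fifo_rvalid()) {                    // core 1 said it's ready
            multicore_fifo_pop_blocking();                // consume the "ready" token
            multicore_fifo_push_blocking(state);          // send the latest state
        }
        sleep_ms(250);
    }
}
```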
You should be able to build this code using standard CMake tools. Follow the instructions here to compile the project using CMake and make, and flash it using the .uf2 file. If your code is failing to build, you might not have copied all the folders you needed from your deployed C++ Edge Impulse model, or the path to the Pico SDK might not be set correctly.
Machine learning can be very tough. Any deviation from an environment similar to your training data can cause your model to stop working entirely. Here is a small selection of the many mistakes I made along the way while making this project:
Using a different sample rate in my training data vs. the actual data collected by this code
Accidentally passing in buffers of uint16_ts to the model when it was expecting floats
Consuming too much RAM on the Pico, which causes it to silently break things instead of erroring out (keep NSAMP/INSIZE small!)
Not rate-limiting the FIFO buffers and causing them to fill up with stale state data
Switching the index of the keywords used in my model
Chances are, you’ll encounter some issues too. Spend some time to understand the code and you’ll have a much easier time debugging.
That’s all! I hope you enjoyed this tutorial. If you found it helpful, check out my writing, my website, and some of my other projects.
Use TinyML to listen for the sound of trucks amongst forest noise, using a Nordic Thingy:53.
Created By: Zalmotek
Public Project Link:
https://studio.edgeimpulse.com/studio/138770
Illegal logging is a major environmental issue worldwide. It has been estimated that it accounts for up to 30% of the global timber trade, and is responsible for the loss of billions of dollars worth of valuable timber each year. When timber is exploited illegally, governments lose much-needed money, particularly in developing countries. In addition to this, illegal logging severely impacts biodiversity and it can lead to soil erosion, decreased water quality, and habitat loss for wildlife. Furthermore, illegal logging is frequently associated with organized crime groups and can serve as a source of funding for rebel or terrorist groups.
Due to the vastness of forested regions, it is difficult to identify unauthorised logging activities, which frequently occur in isolated and difficult-to-reach locations, and traditional approaches, such as ground patrols, are often ineffective.
One way to combat this problem is through the use of AI algorithms that can be deployed on battery-powered devices, such as sensors near the forest on roads frequented by the trucks transporting the wood. Machine learning algorithms are well suited for this task as they can be trained to recognize the characteristic sounds made by logging trucks. When deployed on the roads near forests, these sensors can provide a real-time alert when a logging activity is detected, allowing law enforcement to quickly respond.
One challenge posed by this approach is that sensors must be able to distinguish between different types of logging truck noises and the background noise in the forest. Another challenge is that the devices must be ruggedized to withstand the harsh environment of the forest. We will address both of these challenges by using the Nordic Thingy:53, a multi-sensor prototyping platform for wireless IoT and embedded machine learning, which will be used to train a ML algorithm, and is encased in a tough polymer casing that can withstand drops and impact.
Our approach to this problem is to create an IoT system based on the Nordic Thingy:53 platform that will run a machine learning model trained using the Edge Impulse platform that can detect the sound of timber trucks.
The Nordic Thingy:53 is a versatile, low-power device that is well suited for this application. Its two Arm Cortex-M33 processors' computing capability and memory capacity allow it to execute embedded machine learning (ML) models directly on the device. It features a microphone for audio input in addition to several other integrated sensors, such as an accelerometer, gyroscope, and magnetometer, as well as sensors for temperature, humidity, air quality, and light level. The Thingy can be powered by a rechargeable Li-Po battery with a 1350 mAh capacity that can be charged via USB-C, making it ideal for use in remote locations.
USB-C cable
Edge Impulse account
Edge Impulse CLI
Nordic nRF Edge Impulse App
Our choice of edge computing hardware for this use case is the Nordic Thingy:53, which is based on Nordic Semiconductor's flagship dual-core wireless SoC, the nRF5340. The SoC's Arm Cortex-M33 CPU application core ensures that the Thingy:53 can handle the heavy computational workloads of embedded machine learning without interfering with the wireless communication. The application core is clocked at 128 MHz for maximum speed, with 1 MB of flash storage and 512 KB RAM to fit your programs. Wireless communication is handled independently by another Arm Cortex-M33 core clocked at 64 MHz for more power-efficient operation, without using any computing resources from the application core. The Bluetooth Low Energy (LE) radio provides firmware updates and communication through Bluetooth LE, as well as additional protocols such as Bluetooth mesh, Thread, Zigbee, and proprietary 2.4 GHz protocols.
Let's start by creating an Edge Impulse project. Select Developer as your project type, click Create a new project, and give it a meaningful name.
New Thingy:53 devices will function with the Nordic nRF Edge Impulse iPhone and Android apps, as well as with the Edge Impulse Studio right out of the box.
Before connecting it to the Edge Impulse project, the firmware of the Thingy:53 must be updated. Download the nRF Programmer mobile application and launch it. You will be prompted with a number of available samples.
Then, go to Devices -> Connect a new device in your Edge Impulse project, choose Use Your Computer, and allow access to your microphone.
Select the Edge Impulse application, select the version of the sample from the drop-down menu and tap Download.
When that is done, tap Install. A list with the nearby devices will appear and you must select your development board from the list. Once that is done, the upload process will begin.
With the firmware updated, connect the Thingy:53 board to a computer that has the edge-impulse-cli suite installed, turn it on, launch a terminal and run:
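edge-impulse-daemon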
You will be required to provide your username and password before choosing the project to which you want to attach the device.
Once you select the project and the connection is successful, the board will show up in the Devices tab of your project.
We will use a publicly available truck noise dataset and a forest sound dataset, as well as the Edge Impulse platform to train and deploy a model that can distinguish between the two types of sounds. In order to upload the sound dataset to Edge Impulse, we’ll have to split it into smaller samples (in our case the samples are 3 seconds long), and you can do so using the following command line instruction:
Make sure to replace DaytimeForest_NatureAmbience.wav with the name of your file.
Now go to Data acquisition > Upload data on Edge Impulse and upload your samples, making sure to label them accordingly. Our two labels are EngineSounds and Background. The difference between the two classes should be clearly observed in the sound waveform, as seen in the following pictures:
Now that the data is available, it's time to create the Impulse. The functional block of the Edge Impulse ecosystem is called an "Impulse", and it fundamentally describes a collection of blocks through which data flows, starting from the ingestion phase and going all the way to outputting the features.
The setup is rather straightforward for this use case. We will be using a 2000ms window size, with a window increase of 200ms at an acquisition frequency of 100Hz. For the processing block we will be using an Audio (MFE) block and for the Learning block, we will be employing a basic Classification (Keras) block.
The Audio MFE (Mel-filterbank energy) processing block extracts signal time and frequency information. A mel filter bank can be used to break down an audio signal into discrete frequency bands on the mel frequency scale, simulating the nonlinear human perception of sound. It works effectively with audio data, primarily for non-voice recognition applications when the sounds to be categorised may be recognized by the human ear. You can read more about how this block works here.
On the right side of the page, a spectrogram displays the MFE block's output for a sample of audio. The MFE block converts an audio window into a data table, with each row representing a frequency range and each column representing a time period. The value contained within each cell reflects the amplitude of its related frequency range during that time period. The spectrogram depicts each cell as a colored block, with the intensity varying according to the amplitude.
A spectrogram's patterns reveal information about the sort of sound it represents. In our case, the spectrogram below depicts a pattern characteristic of forest background noise:
This spectrogram depicts a pattern characteristic of logging trucks engine sounds and the differences between this spectrogram and the above one can be easily observed:
You can use the default values for configuring the MFE block, as they work well for a wide range of applications. Click on Save parameters and you'll be taken to the feature generation page. After you click on Generate features you'll be able to visualise them in the Feature explorer. Generally, if the features are well separated into clusters, the ML model will be able to easily distinguish between the classes.
The next step in developing our machine learning algorithm is configuring the NN classifier block. There are various parameters that can be changed: the Number of training cycles, the Learning rate, the Validation set size, and whether to enable the Auto-balance dataset function. They allow you to control the number of epochs to train the NN for, how fast it learns, and the percentage of samples from the training dataset used for validation. Underneath, the architecture of the NN is described. For the moment, leave everything as is and press Start training.
The training will be assigned to a cluster and, when the process ends, the training performance tab will be displayed. Here you can evaluate the Accuracy and the Loss of the model, along with a tabulated view of the correct and incorrect responses the model gave when fed the previously acquired dataset.
Moreover, you can see the Data explorer that offers an intuitive representation of the classification and underneath it, the predicted on-device performance of the NN.
To quickly test the performance of your NN, navigate to the Model testing tab and click on Classify all. This evaluates how well the model performs on data it has never seen before, and is a great way to check that the model has not overfit the training data.
On the Deployment page, you will notice a menu that allows you to opt in to the EON Compiler. For now, click Build and wait for the process to end. Once it's done, download the .hex file and follow the steps in the video that shows up to upload it to the Thingy:53 board.
With the impulse uploaded, connect the board to your computer, launch a terminal and issue the following command to see the results of the inferencing:
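edge-impulse-run-impulse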
Another way of deploying the model on the edge is using the Nordic nRF Edge Impulse App for iPhone or Android:
Download and install the app for your Android/iOS device.
Launch it and log in with your edgeimpulse.com credentials.
Select your Illegal Logging Detection project from the list
Now deploy your device in an area that you want to monitor and receive the notifications of passing trucks on your phone. In our next section we will explore mesh network capabilities and connectivity options.
The Nordic Thingy:53 is equipped with a dual-core Bluetooth 5.3 SoC supporting Bluetooth Low Energy, Bluetooth mesh, NFC, Thread and Zigbee, which makes it a great choice for creating edge applications that use Bluetooth communication as an output. In this case, the Edge Impulse platform allows its users to deploy their Impulse as a library containing all the signal processing blocks, learning blocks, configurations, and SDK required to integrate the ML model in your own unique application.
In order to deploy the model as a sensor in the forest, a mesh network can be used to establish connections between various sensors, called nodes. Bluetooth mesh networks are well suited for applications that require a large coverage area. The data collected by the sensors can be transmitted wirelessly to a central location, from which an alert can be sent. Having a Bluetooth mesh network in place is more efficient than having to physically retrieve the sensor data. Furthermore, this network topology provides redundancy and resistance to failure as all nodes are interconnected and any node can act as a relay if necessary. Consequently, using a Bluetooth mesh network is an efficient way to wirelessly collect sensor data over a large coverage area.
Though it is often overlooked, illegal logging is a significant global problem. It results in the loss of valuable timber each year, and contributes to deforestation and climate change. Fortunately, machine learning algorithms offer a promising solution to this problem. By providing real-time monitoring, these algorithms have the potential to significantly reduce the amount of valuable timber lost each year to illegal logging, and the Nordic Thingy:53 is a powerful tool to achieve this. With this system in place, we can help to preserve our forests and ensure that they are managed in a sustainable way.
If you need assistance in deploying your own solutions or more information about the tutorial above please reach out to us!
Using captured audio and the SiLabs xG24 to determine if a room is occupied or empty.
Created By: Zalmotek
Public Project Link: https://studio.edgeimpulse.com/public/101280/latest
Occupancy is an important issue in Building Management Systems: based on sensor data you can automatically control lights, temperature, or ventilation systems, saving energy and optimizing usage by providing room availability in real time without the hassle of having each room checked by a person.
An interesting fact is that lighting use constitutes about 20% of the total energy consumption in commercial buildings. Heating or cooling, depending on the season, can also be automated based on usage and human presence.
There are quite a few sensor-based solutions to detect human presence in a room, and while the simplest one, a video camera, comes to mind first, cameras are probably the least used in real environments due to their privacy issues (avoiding recording video is a must) and added complexity. Usually, the sensors used in this application are infrared, ultrasonic, microwave, or other technologies that decide if people are present in a room.
Another challenge in managing a commercial building is scheduling rooms based on availability. People are already accustomed to Calendly and other similar tools to set up availability for one's preferred time to meet but adding a real floorplan in the mix could save the trouble of mailing back and forth to confirm a location.
SiLabs have launched the new EFR32MG24 Wireless SoCs and they are full of interesting sensors and features making them a very good one-stop-shop for an all-around development board for mesh IoT wireless connectivity using Matter, OpenThread, and Zigbee protocols for smart home, lighting, and building automation products or any other use case you see fit to this combination of sensors and connectivity.
The sensors present on board are an accelerometer, a microphone, environmental sensors comprising temperature, humidity, and air pressure, a Hall sensor, an inertial and an interactional sensor. So we have quite an array of possibilities to choose from.
Our board carries the EFR32MG24B310F1536IM48 part number, meaning it is part of the Mighty Gecko 24 family of ICs by SiLabs; it has a high-speed/high-accuracy IADC, a Matrix Vector Processor (MVP), 10 dBm PA transmit power, and 1536 kB of flash memory, can function between -40 and +125 degrees Celsius, and has 48 pins.
With key features like high performance 2.4 GHz RF, low current consumption, an AI/ML hardware accelerator, and Secure Vault, IoT device makers can create smart, robust, and energy-efficient products that are secure from remote and local cyber-attacks. An ARM Cortex®-M33 running up to 78 MHz and up to 1.5 MB of Flash and 256 kB of RAM provides resources for demanding applications while leaving room for future growth. Target applications include gateways and hubs, sensors, switches, door locks, LED bulbs, luminaires, location services, predictive maintenance, glass break detection, wake-word detection, and more.
For this application, we have decided to use the microphones with which the xG24 DevKit comes equipped. To be more precise, we will be capturing sound from the room in 1-second windows, run it through a signal processing block and decide, by using a TinyML model, whether the room is occupied or not. We want to capture the Sound of Silence :) if we may.
A very important mention concerning privacy is that we will use the microphones only as a source for sound level, not recording any voices or conversations after the model is deployed.
EFR32MG24 Dev kit (USB cable included)
A CR2032 battery
A 3D printed enclosure (optional)
Simplicity Commander - a utility that provides command line and GUI access to the debug features of EFM32 devices. It enables us to flash the firmware on the device.
The Edge Impulse CLI - A suite of tools that will enable you to control the xG24 Kit without being connected to the internet and ultimately, collect raw data and trigger in-system inferences
The base firmware image provided by Edge Impulse - enables you to connect your SiLabs kit to your project and do data acquisition straight from the online platform.
Since all sensors are present on the development board there is not that much to do on the hardware side: you will use the USB cable to program the board, and afterward, to test it, you can use a CR2032 battery to supply power. Battery life will vary based on the use case, how often you read the sensors, and how often you send data to the cloud.
Since it will be mounted in a room where you want to detect the presence of persons we decided to create a 3D enclosure so it protects the development board and keeps it nice and tidy. While the whole action takes place indoors, there are still some accidents that happen on a conference table, like liquid spillage, that might damage the board. In this case, the 3D printed case offers an extra level of protection by elevating the board above the table-top level.
First of all install both Simplicity Commander and the Edge Impulse CLI depending on your OS, by following the official documentation.
Use a micro-USB cable to connect the development board to your PC and launch Simplicity Commander. You will be met with a screen containing various information regarding your development board like Chip Type, Flash Size, and more.
Make sure you have the Edge Impulse firmware downloaded and head over to the Flash panel of Simplicity Commander.
Download the base firmware image provided by Edge Impulse for this board and select the connected Kit in the dropdown menu on the top-left corner of the window, then hit Browse, select the Firmware image and click Flash to load the firmware on the DevKit.
With the custom firmware in place, we have everything we need to start creating our TinyML model.
First up, let’s create an Edge Impulse project. Log in to your free account, click on Create new project, give it a recognizable name and click on Create New Project.
Navigate to the Dashboard tab, and then to the Keys page. Here, you will find the API key of your project that we will employ to connect the xG24 Devkit to our project. If the API key appears shortened, try to zoom out a bit so you are able to completely copy it.
Connect the xG24 Kit to the computer, launch a terminal and run:
edge-impulse-daemon --api-key <my project api key>
In the future, if you wish to change the project that your development board connects to, run the same command with a different API key.
Now, if you navigate to the Devices tab, you will see your device listed, with a green dot signaling that it is online.
Once the device is properly attributed to the Edge Impulse project, it’s time to navigate to the Data Acquisition tab.
On the right side of the screen, you will notice the Record new data panel. Leave the settings at their default values, fill in the Label field with a recognisable name, and start sampling. Keeping in mind that neural networks need plenty of data, record at least 3 minutes of data for each defined class.
In the testing phase of the model we are building, we will need some samples in the Test data as well, so do keep in mind to record some. An ideal Train/ Test split would be 85% - 15%.
With the data in place, let’s start building our Impulse. You could look at an Impulse like the functional Block of the Edge Impulse ecosystem, and it represents an ensemble of blocks through which data flows.
An impulse is made out of 4 levels: The input level, the signal processing level, the Learning level, and the output level.
At the Input level of an impulse, you can define the window size, or to put it simply, the size of data you wish to perform signal processing and classification on. Make sure the Frequency matches the recording frequency used in the Data Acquisition phase and that Zero-pad data is checked.
The Signal processing level is made out of one or more processing blocks that enable you to extract meaningful features from your data. Due to the fact that the model we are training is supposed to run on the edge, we must identify the most relevant features and use them in the training process. There are many processing blocks available that allow you to extract frequency and power characteristics of a signal, extract spectrograms from audio signals using Mel-filterbank energy features or flatten an axis into a single value and more, depending on your specific use case. If needs be, Edge Impulse also allows its users to create their own custom processing blocks.
The Learning level is where the magic happens. This is the point where the model training takes place. Edge Impulse provides various predefined learning blocks like Classification (Keras), Anomaly Detection (K-Means), Object Detection (FOMO), and many others.
In the Output level, you can see the 2 features your Impulse will return after running the data through the previous levels.
To wrap it up, for our use case we have decided to go with a 1 second window, Audio (MFE) as our processing block, and a Classification (Keras) Neural Network. With everything in place, click on Save Impulse and move over to the MFE tab that just appeared under the Impulse Design menu.
This block uses a non-linear frequency scale called the Mel scale. What makes the Mel scale unique is that it is based on human perception of frequency, which makes it a useful tool for representing signals in the frequency domain, as it corresponds more closely to how humans perceive sound. Being roughly logarithmic, the Mel scale compresses the range of frequencies it covers, and this can be helpful when working with signals that contain a large range of frequencies, making patterns more easily visible.
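For reference, one commonly used mapping from a frequency f in hertz to the mel scale m (the exact constants vary slightly between implementations) is:

m = 2595 \cdot \log_{10}\left(1 + \frac{f}{700}\right)

so equal steps in mels correspond to progressively larger steps in hertz as the frequency increases.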
At this point, tweak the parameters with a simple principle in sight: similar results for similar data. In our case, we have reduced the filter number from 40 to 20 for best results. Once you are happy with the DSP results, click on Save parameters and you will be directed to the Generate Features tab.
After you click on the Generate Features button, the Feature explorer will be presented to you. Here you can explore your data in a visual way and quickly validate if your data separates nicely. If you are not happy with the results, navigate back to the Parameters page and modify them some more.
What you are aiming to see in the Feature explorer are clearly defined clusters, with the lowest number possible of misclassified data points.
The NN Classifier tab, under the Impulse Design menu, allows us to configure various parameters that influence the training process of the neural network. For the moment, it suffices to leave the Training settings at their default values. You will notice in the Audio training options menu that a Data Augmentation option may be checked. Fundamentally, what Data Augmentation does is artificially increase the amount of training data to improve the classifier's accuracy, avoid overfitting, and reduce the number of training cycles required. Check it, leave the settings as they come, and click on Start Training.
Once the training is done, you will be presented with the training output. What we are striving to achieve is an Accuracy of over 95%. The Confusion matrix right underneath displays in a tabular form the correct and incorrect responses given by our model that was fed the data set previously acquired. In our case, you can see that if a room is crowded there is a 4.76% chance that it will be classified as an empty room.
You can also see this visually in the Feature explorer, with the misclassified CrowdedRoom points (represented with red dots) being placed near the EmptyRoom cluster.
The best way to test out our model is to navigate to the Live Classification tab and start gathering some new samples. Make sure the sampling Frequency is the same as the one used in the Data Acquisition phase and click on Start Sampling.
This is a great way to validate your model with data that was captured with the same device you intend to deploy it on.
In this last step, we will be taking the trained and optimized model and deploying it back onto the device used for data acquisition. What we achieve by this is decreased latency and power consumption, while also being able to perform the inference without an internet connection.
The SiLabs xG24 Dev Kit is fully supported by Edge Impulse. What this means is that, if you navigate to the Deployment tab, you will notice that in the “Build Firmware” section you can select the board and click Build.
What this does is build the binary that we will upload on the development board in the same way we uploaded the base firmware at the beginning of the tutorial.
Connect the board to your computer, launch Simplicity Commander, select the board, navigate to the flash menu, carefully select the binary file and press Flash.
Restart the board, launch a Terminal and run:
edge-impulse-run-impulse
If everything went smoothly, you should see something like this, confirming the fact that you have deployed the model correctly and that the inference is running smoothly.
Edge Impulse offers its users the possibility to export the model as a C++ library that contains all the signal processing blocks, learning blocks, configurations, and SDK needed to integrate the model in your own custom application. Moreover, in the case of the xG24 devkit, it also provides the Simplicity Studio Component file.
By understanding occupancy patterns, building managers can make informed decisions that will improve the comfort, safety, and efficiency of their buildings.
The xG24 DevKit is quite a powerhouse with the number of sensors present on it and many other use cases are possible. The recipe presented above can be used to quickly adapt to other environmental metrics you want to keep an eye on by training models on Edge Impulse.
If you need assistance in deploying your own solutions or more information about the tutorial above please reach out to us!
A smart device that detects running faucets using a machine learning model, and sends alert messages over a cellular network.
Created By: Naveen Kumar
Public Project Link: https://studio.edgeimpulse.com/public/119084/latest
GitHub Repository: https://github.com/metanav/running_faucet_detection_blues_wireless
Poor memory is only one of the many unpleasant experiences that accompany old age and these problems can have far-reaching implications on the comfort and security of seniors. Dementia is one of the most common neurological problems associated with the elderly. Imagine a case of seniors leaving the faucet on. The kind of water damage that might ensue is simply unimaginable. Not to mention lots of safety concerns such as electrocution and drowning. Also, sometimes kids or even adults forget to stop the faucet after use. It also adds up to your monthly water usage bills. According to the US EPA, leaving a faucet on for just five minutes wastes ten gallons of water. In this project, I have built a proof-of-concept of an AIoT (Artificial intelligence of things) device that can detect running faucets using a microphone and send an alert notification message.
This project requires a low-powered, reliable, and widely available yet cost-effective cellular network radio to send alert messages to the phone and cloud. I will be using a Blues Wireless Notecard (for Cellular connectivity) and a Blues Wireless Notecarrier-B, a carrier board for the Notecard.
Although the Notecard is capable as a standalone device for tracking purposes, we need to run Tensorflow Lite model inferencing using Edge Impulse, so we will be using a Seeed XIAO nRF52840 Sense as a host MCU. The slim profile of the Notecard with carrier board and inbuilt microphone on the tiny Seeed XIAO nRF52840 Sense makes it a good fit for our purpose. We need an antenna for better indoor cellular connectivity and a protoboard to assemble the hardware.
The Notecarrier-B and Seeed XIAO nRF52840 Sense are connected over I2C.
The schematics are given below.
We will use Edge Impulse Studio to train and build a TensorFlow Lite model. We need to create an account and create a new project at https://studio.edgeimpulse.com. We are using a prebuilt dataset for detecting whether a faucet is running based on audio. It contains 15 minutes of data sampled from a microphone at 16KHz over the following two classes:
Faucet - faucet is running, with a variety of background activities.
Noise - just background activities.
We can import this dataset to the Edge Impulse Studio project using the Edge Impulse CLI Uploader. Please follow the instructions here to install Edge Impulse CLI: https://docs.edgeimpulse.com/docs/edge-impulse-cli/cli-installation. The datasets can be downloaded from here: https://cdn.edgeimpulse.com/datasets/faucet.zip.
You will be prompted for your username, password, and the project where you want to add the dataset.
After uploading is finished we can see the data on the Data Acquisition page.
In the Impulse Design > Create Impulse page, we can add a processing block and learning block. We have chosen MFE for the processing block which extracts a spectrogram from audio signals using Mel-filterbank energy features, great for non-voice audio, and for the learning block, we have chosen Neural Network (Keras) which learns patterns from data and can apply these to new data for recognizing audio.
Now we need to generate features in the Impulse Design > MFE page. We can go with the default parameters.
After clicking on the Save Parameters button the page will redirect to the Generate Features page where we can start generating features which would take a few minutes. After feature generation, we can see the output in the Feature Explorer.
Now we can go to the Impulse Design > NN Classifier page where we can define the Neural Network architecture. We are using a 1-D convolutional network which is suitable for audio classification.
After finalizing the architecture, we can start training which will take a couple of minutes to finish. We can see the accuracy and confusion matrix below.
For such a small dataset 99.2% accuracy is pretty good so we will use this model.
We can test the model on the test datasets by going to the Model testing page and clicking on the Classify all button. The model has 91.24% accuracy on the test datasets, so we are confident that the model should work in a real environment.
The Edge Impulse Studio and Blues Wireless Notecard both support Arduino libraries, so we will choose the Create Library > Arduino library option on the Deployment page. For the Select optimizations option, we will choose Enable EON Compiler, which reduces the memory usage of the model. Also, we will opt for the Quantized (Int8) model. Now click the Build button, and in a few seconds, the library bundle will be downloaded to your local computer.
Before starting to run the application we should set up the Notecard. Please see the easy-to-follow quick-start guide here to set up a Notecard with a Notecarrier-B to test that everything works as expected. The application code does the Notecard setup at boot-up to make sure it is always in the known state. We also need to set up Notehub, which is a cloud service that receives data from the Notecard and allows us to manage the device, and route that data to our cloud apps and services. We can create a free account at https://notehub.io/sign-up, and after successful login, we can create a new project.
We should copy the ProjectUID which is used by Notehub to associate the Notecard to the project created.
For SMS alerts we need to set up an account at Twilio and create a route by clicking the Create Route link at the top right on the Route page. Please follow the instructions given in the nicely written guide provided by Blues Wireless, for leveraging the General HTTP/HTTPS Request/Response Route type to invoke the Twilio API.
In the Filters section, we have to specify which Notecard outbound file data we want to route to Twilio. It would make sure that we always send out the intended data. In the application code, we would add notes to the twilio.qo file.
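The exact application code lives in the GitHub repository linked above; as a rough sketch (the body fields here are illustrative, and it assumes notecard.begin() was already called in setup()), queuing an alert note to twilio.qo with the note-arduino library looks something like this:

```cpp
// Rough sketch (not the repository's exact code): queue an alert in the
// twilio.qo Notefile so Notehub routes it to Twilio as an SMS.
#include <Notecard.h>

Notecard notecard;   // assume notecard.begin() was called in setup()

void sendFaucetAlert() {
    J *req = notecard.newRequest("note.add");
    if (req != NULL) {
        JAddStringToObject(req, "file", "twilio.qo"); // the file our route filters on
        JAddBoolToObject(req, "sync", true);          // push to Notehub immediately
        J *body = JCreateObject();
        if (body != NULL) {
            JAddStringToObject(body, "alert", "Running faucet detected"); // illustrative field
            JAddItemToObject(req, "body", body);
        }
        notecard.sendRequest(req);
    }
}
```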
To send SMS messages, the Twilio API expects form data with three key/value pairs (Body, From, and To). This can be achieved using a JSONata (a query and transformation language for JSON data) expression to format the data into the required form. We should choose JSONata Expression in the Data > Transform field and we can enter the JSONata expression in the text area as shown below.
The JSONata expression given below formats the JSON payload into a message format that the Twilio API can consume.
Please follow the instructions here to download and install Arduino IDE. After installation, open the Arduino IDE and install the board package for the Seeed XIAO nRF52840 Sense by going to Tools > Board > Boards Manager. Search the board package as shown below and install it.
After the board package installation is completed, choose the Seeed XIAO BLE Sense from the Tools > Board > Seeed nRF52 OS mbed-enabled Boards menu and select the serial port of the connected board from the Tools > Port menu. We need to install the Blues Wireless Notecard library using the Library Manager (Tools > Manage Libraries...) as shown below.
Below is the Arduino sketch for inferencing. For continuous audio event detection, the application uses two threads, one for inferencing and another for audio data sampling, so that no events are missed.
To run the inferencing sketch, clone the application GitHub repository using the command below.
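git clone https://github.com/metanav/running_faucet_detection_blues_wireless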
Import the library bundle running_faucet_blues_wireless_inferencing.zip using the menu Sketch > Include Library > Add .ZIP Library in the Arduino IDE. Open the inferencing sketch notecard_nano_ble_sense_running_faucet_detection.ino and compile/upload the firmware to the connected Seeed XIAO nRF52840 Sense board. We can monitor the inferencing output and Notecard debug logs using Tools > Serial Monitor with a baud rate of 115200 bps.
For protection, the device is placed inside a plastic box that can be mounted on a wall.
The flexible cellular antenna is stuck to the side of the box.
Although this proof-of-concept device is used in the house with a wall outlet, it can be powered using batteries. Being equipped with cellular connectivity, it can be installed in those areas where there is no WiFi network. This is an easy-to-use and convenient device that respects users' privacy by running the inferencing at the edge and sending alert notifications on time.
Use a Particle Photon 2 to turn a device on or off by listening for a keyword or audio event, and opening or closing a Relay accordingly.
Created By: Roni Bandini
Public Project Link: https://studio.edgeimpulse.com/public/288386/latest
GitHub Repository: https://github.com/ronibandini/Photon2VoiceCommand
In industrial settings, workers often find themselves unable to operate machinery manually due to concurrent tasks demanding the use of their hands. Machine Learning (ML) has emerged as a transformative solution, enabling compact devices to comprehend and respond to vocal instructions, thereby initiating or halting machines as needed.
In the context of this project, we will leverage the Edge Impulse platform to train a customized ML model. Subsequently, we will deploy this model onto a Particle Photon 2 microcontroller board. The Photon 2 board will be connected to a PDA microphone and a Relay module, creating an integrated system for practical demonstration.
The Photon 2 is an interesting, high quality board made by Particle. It has 5 GHz WiFi and Bluetooth (BLE) 5, an ARM Cortex-M33 CPU running at 200 MHz, 2 MB of storage for user applications, 3 MB of RAM available to user applications, and a 2 MB flash file system.
The board size is 1.5 x 0.78 inches (5 x 2cm) and it comes with pre-soldered, labeled male headers. It also has 2 buttons (Reset and Mode), one RGB led, a LIPO charger with a JST-PH port, and a 10-pin micro JTAG connector for SWD (Serial Wire Debug).
Besides the Photon 2 board, the Edge AI Kit is required for this project, which includes a W18340-A PDM MEMS microphone by Adafruit. The Edge AI Kit also includes jumper wires, a protoboard, PIR sensor, distance sensor, LED, switches, resistors, vibration sensor, accelerometer, and loudness sensor for many other AI projects. The Relay module is not included, but it is a cheap, common device that can be ordered online from many electronics stores.
The PDM mic comes without header pins soldered, so we need to solder 5 headers and discard the sixth one (if using a 6-pin header as I've done here). After that we will connect the PDM mic to the board using jumper cables.
The PDM microphone connections should be:
PDM GND to Photon 2 GND
PDM 3v to Photon 2 3v3
PDM CLK to Photon 2 A0
PDM DAT to Photon 2 A1
The PDM mic SEL pin is not used for this scenario.
The Relay module should be connected as follows:
Relay GND to Photon 2 GND
Relay VCC to Photon 2 VCC
Relay Signal to Photon 2 D3
Since the Photon 2 has one GND and one 3v3 pin on the board, the Microphone GND and Relay GND should be connected together, and also Microphone 3v and Relay 3v. You can do that using the protoboard and Male-Female jumper cables from the Edge AI kit, or you can also cut and solder the cables.
We will create an account at Edge Impulse, then log in and create a new project.
We will use the computer to record data samples for 2 labels: machine and background. Go to Collect data, Connect a Device, and Connect to your computer.
Enable your microphone permissions if necessary, and repeat the word "machine" several times, leaving a few seconds in between each. For background sound, just record any continuous background sound.
Wait a few seconds to allow the recording to be uploaded to Edge Impulse before going to the next step.
Now we are going to Split the samples. Click on the three vertical dots next to the recording and select "Split sample". Leave the length set to the default, 1000ms. Repeat this process for both labels.
Now we will design an Impulse. In the Edge Impulse Studio, go to Create impulse, set the Window size to 1000ms, the Window increase to 500ms, and add the 'Audio MFCC' Processing Block, which is perfect for voices. Then add 'Classification (Keras)' as the Learning Block. Now click Save impulse.
Next we will go to MFCC parameters. This page allows us to configure the MFCC block, and lets us preview how the data will be transformed. The MFCC block transforms a window of audio into a table of data where each row represents a range of frequencies, and each column represents a span of time. The value contained within each cell reflects the amplitude of its associated range of frequencies during that span of time.
We will leave the default values, which are pre-configured according to the data. Then we click on Generate Features.
Now we go to the Neural Network configuration. Click on Classifier; the settings can be left at their default values, then scroll down and click Start training. In this case we got 100% accuracy, which is uncommon, but the two classes of recordings are quite different, so the neural network separates them easily.
You can upload new samples and test them with the Model testing feature.
The final step is to deploy a library to the Particle Photon 2. We will click Deployment, begin to type the word Particle in the search box, then select the Particle Library and click Build.
The main difference when working with the Photon 2 is that the Arduino IDE is not used to upload the code and libraries. Instead, Microsoft Visual Studio Code is used, and there are several setup steps to follow carefully.
Install Microsoft Visual Studio Code, then add the Particle Workbench extension.
Unzip the Particle library exported from Edge Impulse Studio.
In VS Code, press Ctrl+Shift+P to bring up the Particle menu and select the "Particle: Import Project" feature to choose the library. The first time, VSCode will download dependencies.
In VS Code, a button will appear at the lower right to open the properties file. Navigate to the unzipped folder, select project.properties, and accept the trust confirmation ("Yes, I trust the authors").
The src/main.cpp code included in the zip file detects the trained keyword and prints the predictions over the serial console, so we need to add some code to control the Relay. We will open src/main.cpp and add a definition for the relay pin at the top of the file, plus the pin configuration inside setup().
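The exported file can vary between Edge Impulse SDK versions, so the snippet below is only a minimal sketch of these additions; RELAY_PIN is our own name for the pin, and D3 matches the wiring described above.

```cpp
// Near the top of src/main.cpp: give the relay signal pin a name.
// RELAY_PIN is our own identifier; D3 is the pin wired to the Relay signal.
#define RELAY_PIN D3
```

and, inside the existing setup() function of the exported sketch:

```cpp
pinMode(RELAY_PIN, OUTPUT);      // configure the relay signal line as an output
digitalWrite(RELAY_PIN, LOW);    // start with the relay switched off
```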
Then, inside the loop we will add the code that controls the Relay. In this case, the label contains the keyword "muted" instead of "machine", as I was exploring audio output; if you are going to use another label, just change the word "muted" to the keyword contained in your own label.
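The exported loop() also differs between SDK versions, so treat the following as a sketch rather than the exact code: it assumes the classifier output is available in an ei_impulse_result_t variable named result, as in Edge Impulse's exported examples, and the 0.8 confidence threshold and two-second pulse are arbitrary choices for this demo.

```cpp
// Inside loop(), after the classifier has run and filled in `result`:
// energize the relay when the keyword is detected with enough confidence.
for (size_t ix = 0; ix < EI_CLASSIFIER_LABEL_COUNT; ix++) {
    if (strcmp(result.classification[ix].label, "muted") == 0 &&
        result.classification[ix].value > 0.8f) {
        digitalWrite(RELAY_PIN, HIGH);   // switch the relay on
        delay(2000);                     // hold it for two seconds
        digitalWrite(RELAY_PIN, LOW);    // switch the relay off again
        break;                           // no need to check the remaining labels
    }
}
```

Note that delay() blocks the loop while the relay is held; for a real deployment a non-blocking timer would be a better choice.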
Note: you can also download the main.cpp file from https://github.com/ronibandini/Photon2VoiceCommand to make sure you have everything entered correctly.
Now press Ctrl+Shift+P to bring up the Particle menu again, and this time select "Particle: Configure Project for Device". Select deviceOS@5.5.0 and the P2 board, and press ESC to skip the device name.
Note: if you get a "Microphone library not found" error, press Ctrl+Shift+P and install the 'Microphone_PDM@0.0.2' library.
Finally, bring up the Particle menu again with Ctrl+Shift+P and select "Particle: Flash application and device, local" (For Windows, it will take around 5 minutes to flash).
If you get an "Argument list too long" error during flashing, Particle Support recommends building with Docker instead.
Another option is to use Mac or Linux. (For the demo below, the main.cpp modifications were made following Particle's example at https://docs.particle.io/getting-started/machine-learning/youre-muted.) On Linux, the steps are:
Install a lightweight Ubuntu, such as Lubuntu
Install VSCode
Open a Terminal window and execute sudo apt-get install libarchive-zip-perl (this step avoids an error where the crc32 tool is not found)
Press Ctrl+P, type "ext install particle.particle-vscode-pack", and press Enter to install the Particle extension pack
Login with Particle credentials
Create a Project
Import the unzipped folder and accept the trust confirmation
Now press Ctrl+Shift+P, choose "Particle: Configure Project for Device", and choose deviceOS@5.3.2 and the P2 board
Bring up the Particle menu again with Ctrl+Shift+P and choose "Particle: Flash Application and Device, local"
Machine learning makes it possible not only to recognize voice commands but also to identify distinctive machine-generated sounds, so equipment can be shut down automatically when specific malfunction indicators are detected. This is valuable for improving operational safety and efficiency in industrial environments.
Moreover, the compact form factor and low cost of boards like the Particle Photon 2, combined with their ability to control external devices, make them an attractive addition for many industries, offering a practical path to ML-powered automation across a wide range of manufacturing settings.
https://www.instagram.com/ronibandini
https://twitter.com/RoniBandini