Adding sight to your sensors - Sony Spresense

In this tutorial, you'll use machine learning to build a system that can recognize objects in your house through your Sony Spresense - a task known as image classification - connected to your device. Adding sight to your embedded devices can make them see the difference between poachers and elephants, do quality control on factory lines, or let your RC cars drive themselves. In this tutorial you'll learn how to collect images for a well-balanced dataset, how to apply transfer learning to train a neural network, and deploy the system to your Spresense.

At the end of this tutorial, you'll have a firm understanding of how to classify images using Edge Impulse.

You can view the finished project, including all data, signal processing and machine learning blocks here: Tutorial: adding sight to your sensors.

1. Prerequisites

For this tutorial you'll need a Sony's Spresense with camera add-on.

If you don't have a device yet, you can also upload an existing dataset through the Uploader. After this tutorial you can then deploy your trained machine learning model as a C++ library and run it on your device by following the Running your impulse locally tutorial.

You also may want to create a new Edge Impulse project for this tutorial, or use an existing one. You can create a new project on your project dashboard

2. Building a dataset

In this tutorial we'll build a model that can distinguish between two objects in your house - we've used a plant and a lamp, but feel free to pick two other objects. To make your machine learning model see it's important that you capture a lot of example images of these objects. When training the model these example images are used to let the model distinguish between them. Because there are (hopefully) a lot more objects in your house than just lamps or plants, you also need to capture images that are neither a lamp or a plant to make the model work well.

Capture the following amount of data - make sure you capture a wide variety of angles and zoom levels:

50 images of a lamp.
50 images of a plant.
50 images of neither a plant nor a lamp - make sure to capture a wide variation of random objects in the same room as your lamp or plant.

You can collect data from your Spresense using the Edge Impulse CLI. Make sure you followed the Getting Started guide for the Spresense, then run the edge impulse daemon.

edge-impulse-daemon --clean

Once connected follow the guide on Collecting Image Data from Studio to build your dataset. Alternatively, you can capture your images using another camera, and then upload them by going to Data acquisition and clicking the 'Upload' icon.

Afterwards you should have a well-balanced dataset listed under Data acquisition in your Edge Impulse project. You can switch between your training and testing data with the two buttons above the 'Data collected' widget.

3. Designing an impulse

With the training set in place you can design an impulse. An impulse takes the raw data, adjusts the image size, uses a preprocessing block to manipulate the image, and then uses a learning block to classify new data. Preprocessing blocks always return the same values for the same input (e.g. convert a color image into a grayscale one), while learning blocks learn from past experiences.

For this tutorial we'll use the 'Images' preprocessing block. This block takes in the color image, optionally makes the image grayscale, and then turns the data into a features array. If you want to do more interesting preprocessing steps - like finding faces in a photo before feeding the image into the network -, see the Building custom processing blocks tutorial. Then we'll use a 'Transfer Learning' learning block, which takes all the images in and learns to distinguish between the three ('plant', 'lamp', 'unknown') classes.

In the studio go to Create impulse, set the image width and image height to 96, and add the 'Images' and 'Transfer Learning (Images)' blocks. Then click Save impulse.

Configuring the processing block

To configure your processing block, click Images in the menu on the left. This will show you the raw data on top of the screen (you can select other files via the drop down menu), and the results of the processing step on the right. You can use the options to switch between 'RGB' and 'Grayscale' mode, but for now leave the color depth on 'RGB' and click Save parameters.

This will send you to the 'Feature generation' screen. In here you'll:

Resize all the data.
Apply the processing block on all this data.
Create a 3D visualization of your complete dataset.

Click Generate features to start the process.

Afterwards the 'Feature explorer' will load. This is a plot of all the data in your dataset. Because images have a lot of dimensions (here: 96x96x3=27,648 features) we run a process called 'dimensionality reduction' on the dataset before visualizing this. Here the 27,648 features are compressed down to just 3, and then clustered based on similarity. Even though we have little data you can already see some clusters forming (lamp images are all on the right), and can click on the dots to see which image belongs to which dot.

Configuring the transfer learning model

With all data processed it's time to start training a neural network. Neural networks are a set of algorithms, modeled loosely after the human brain, that are designed to recognize patterns. The network that we're training here will take the image data as an input, and try to map this to one of the three classes.

It's very hard to build a good working computer vision model from scratch, as you need a wide variety of input data to make the model generalize well, and training such models can take days on a GPU. To make this easier and faster we are using transfer learning. This lets you piggyback on a well-trained model, only retraining the upper layers of a neural network, leading to much more reliable models that train in a fraction of the time and work with substantially smaller datasets.

To configure the transfer learning model, click Transfer learning in the menu on the left. Here you can select the base model (the one selected by default will work, but you can change this based on your size requirements), optionally enable data augmentation (images are randomly manipulated to make the model perform better in the real world), and the rate at which the network learns.

Set:

Transfer learning model to MobileNetV1 96x96 0.25
Number of training cycles to 20.
Learning rate to 0.0005.
Data augmentation: enabled.
Minimum confidence rating: 0.7.

And click Start training. After the model is done you'll see accuracy numbers, a confusion matrix and some predicted on-device performance on the bottom. You have now trained your model!

Important: The full example firmware for the Sony's Spresense supports a variety of sensor interfaces, data ingestion, and other capabilities. As a result - this firmware supports image transfer learning networks up to the MobileNetV1 96x96 0.25. Custom firmware may allow larger models to run on the device. To learn more, see Running your impulse locally to deploy trained Edge Impulse models with your firmware, and check out the EON Tuner documentation to learn how to optimize your model's performance and memory usage

4. Validating your model

With the model trained let's try it out on some test data. When collecting the data we split the data up between a training and a testing dataset. The model was trained only on the training data, and thus we can use the data in the testing dataset to validate how well the model will work in the real world. This will help us ensure the model has not learned to overfit the training data, which is a common occurrence.

To validate your model, go to Model testing, select the checkbox next to 'Sample name' and click Classify selected. Here we hit 89% accuracy, which is great for a model with so little data.

To see a classification in detail, click the three dots next to an item, and select Show classification. This brings you to the Live classification screen with much more details on the file (if you collected data with your mobile phone you can also capture new testing data directly from here). This screen can help you determine why items were misclassified.

5. Running the model on your device

With the impulse designed, trained and verified you can deploy this model back to your device. This makes the model run without an internet connection, minimizes latency, and runs with minimum power consumption. Edge Impulse can package up the complete impulse - including the preprocessing steps, neural network weights, and classification code - in a single C++ library that you can include in your embedded software.

To run your impulse click on Deployment in the menu. Then under 'Build firmware' select the Spresense development board, and click Build. This will export the impulse, and build a binary that will run on your development board in a single step. After building is completed you'll get prompted to download a binary. Save this on your computer.

Flashing the device

When you click the Build button, you'll see a pop-up with text and video instructions on how to deploy the binary to your particular device. Follow these instructions. Once you are done, we are ready to test your impulse out.

If you run into issues flashing your device, follow the steps and troubleshooting information in the Getting Started guide.

Running the model on the device

We can connect to the board's newly flashed firmware over serial. Open a terminal and run:

$ edge-impulse-run-impulse

Congratulations! You've added sight to your sensors. Now that you've trained your model you can integrate your impulse in the firmware of your own embedded device, see Running your impulse locally. There are examples for Mbed OS, Arduino, STM32CubeIDE, Zephyr, and any other target that supports a C++ compiler.

Or if you're interested in more, see our tutorials on Continuous motion recognition or Adding sight to your sensors. If you have a great idea for a different project, that's fine too. Edge Impulse lets you capture data from any sensor, build custom processing blocks to extract features, and you have full flexibility in your Machine Learning pipeline with the learning blocks.

We can't wait to see what you'll build! 🚀

PreviousResponding to your voice - TI LaunchXL NextObject detection - SiLabs xG24 Dev Kit

Last updated 3 years ago

Was this helpful?