In this tutorial, you'll use machine learning to build a system that can recognize objects in your house through a camera connected to a microcontroller - a task known as image classification. Adding sight to your embedded devices lets them tell poachers from elephants, do quality control on factory lines, or let your RC cars drive themselves. You'll learn how to collect images for a well-balanced dataset, how to apply transfer learning to train a neural network, and how to deploy the system to an embedded device.
At the end of this tutorial, you'll have a firm understanding of how to classify images using Edge Impulse.
There is also a video version of this tutorial:
For this tutorial you'll need a supported device. Either:
- The OpenMV Cam H7 Plus.
- A mobile phone - follow these instructions to connect your device to Edge Impulse before continuing.
If you don't have either of these devices, you can also upload an existing dataset through the Uploader. After this tutorial you can then deploy your trained machine learning model as a C++ library and run it on your device.
In this tutorial we'll build a model that can distinguish between two objects in your house - we've used a plant and a lamp, but feel free to pick two other objects. To make your machine learning model see, it's important that you capture a lot of example images of these objects. During training, the model uses these example images to learn how to tell the objects apart. Because there are (hopefully) a lot more objects in your house than just lamps or plants, you also need to capture images that are neither a lamp nor a plant to make the model work well.
Capture the following amount of data - make sure you capture a wide variety of angles and zoom levels:
- 30 images of a lamp.
- 30 images of a plant.
- 50 images of neither a plant nor a lamp - make sure to capture a wide variation of random objects in the same room as your lamp or plant.
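To check whether a dataset with these capture targets is reasonably balanced, you can count the images per label. The following is a minimal sketch, not part of Edge Impulse: the function name `class_balance` is hypothetical, and the label names follow the three classes used later in this tutorial.

```python
from collections import Counter

def class_balance(labels):
    """Return {label: (count, share_of_dataset)} for a list of image labels."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: (n, n / total) for label, n in counts.items()}

# Labels matching the capture targets listed above.
labels = ["lamp"] * 30 + ["plant"] * 30 + ["unknown"] * 50
balance = class_balance(labels)
for label, (count, share) in sorted(balance.items()):
    print(f"{label}: {count} images ({share:.0%})")
```

The 'unknown' class is deliberately a bit larger here, because it has to cover a much wider variety of objects than the other two classes.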
You can collect data from the following devices:
Or you can capture your images using another camera, and then upload them by going to Data acquisition and clicking the 'Upload' icon.
Afterwards you should have a well-balanced dataset listed under Data acquisition in your Edge Impulse project. You can switch between your training and testing data with the two buttons above the 'Data collected' widget.
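If you prepare images outside the studio, the idea of a training/testing split can be sketched as follows. This is an illustration only: the 80/20 ratio, the file names, and the `split_dataset` helper are assumptions, and the studio manages its own split for data collected through it.

```python
import random

def split_dataset(items, test_fraction=0.2, seed=42):
    """Split (filename, label) pairs into train and test sets, label by label."""
    rng = random.Random(seed)
    by_label = {}
    for filename, label in items:
        by_label.setdefault(label, []).append(filename)
    train, test = [], []
    for label, filenames in sorted(by_label.items()):
        shuffled = filenames[:]
        rng.shuffle(shuffled)
        # Hold out the same fraction of every class so both sets stay balanced.
        n_test = round(len(shuffled) * test_fraction)
        test += [(f, label) for f in shuffled[:n_test]]
        train += [(f, label) for f in shuffled[n_test:]]
    return train, test

# Hypothetical file names for the 30/30/50 images captured earlier.
items = [(f"lamp_{i}.jpg", "lamp") for i in range(30)]
items += [(f"plant_{i}.jpg", "plant") for i in range(30)]
items += [(f"unknown_{i}.jpg", "unknown") for i in range(50)]
train, test = split_dataset(items)
print(len(train), len(test))
```

Splitting per label keeps each class represented in the testing set, which matters with a dataset this small.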
With the training set in place you can design an impulse. An impulse takes the raw data, adjusts the image size, uses a preprocessing block to manipulate the image, and then uses a learning block to classify new data. Preprocessing blocks always return the same values for the same input (e.g. convert a color image into a grayscale one), while learning blocks learn from past experiences.
For this tutorial we'll use the 'Images' preprocessing block. This block takes in the color image, optionally makes the image grayscale, and then turns the data into a features array. If you want to do more interesting preprocessing steps - like finding faces in a photo before feeding the image into the network -, see the Building custom processing blocks tutorial. Then we'll use a 'Transfer Learning' learning block, which takes all the images in and learns to distinguish between the three ('plant', 'lamp', 'unknown') classes.
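To make the idea of a preprocessing block concrete, here is a minimal sketch of turning pixels into a features array, with an optional grayscale step. It is not the studio's implementation: the `to_features` name, the scaling to the 0..1 range, and the grayscale weights (the common ITU-R BT.601 luma weights) are all assumptions for illustration.

```python
def to_features(pixels, grayscale=False):
    """Flatten a list of (r, g, b) pixels (0-255 each) into a flat feature list."""
    features = []
    for r, g, b in pixels:
        if grayscale:
            # ITU-R BT.601 luma weights: an assumption for this sketch.
            luma = 0.299 * r + 0.587 * g + 0.114 * b
            features.append(luma / 255.0)
        else:
            features.extend([r / 255.0, g / 255.0, b / 255.0])
    return features

# A tiny made-up "image" of four pixels: red, green, blue, white.
pixels = [(255, 0, 0), (0, 255, 0), (0, 0, 255), (255, 255, 255)]
rgb_features = to_features(pixels)
gray_features = to_features(pixels, grayscale=True)
print(len(rgb_features), len(gray_features))
```

Note that the function always returns the same output for the same input, which is the defining property of a preprocessing block described above.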
In the studio go to Create impulse, set the image width and image height to 96, and add the 'Images' and 'Transfer Learning (Images)' blocks. Then click Save impulse.
To configure your processing block, click Images in the menu on the left. This shows the raw data at the top of the screen (you can select other files via the drop-down menu), and the results of the processing step on the right. You can use the options to switch between 'RGB' and 'Grayscale' mode, but for now leave the color depth on 'RGB' and click Save parameters.
This will send you to the 'Feature generation' screen. In here you'll:
- Resize all the data.
- Apply the processing block on all this data.
- Create a 3D visualization of your complete dataset.
Click Generate features to start the process.
Afterwards the 'Feature explorer' will load. This is a plot of all the data in your dataset. Because images have a lot of dimensions (here: 96x96x3=27,648 features) we run a process called 'dimensionality reduction' on the dataset before visualizing this. Here the 27,648 features are compressed down to just 3, and then clustered based on similarity. Even though we have little data you can already see some clusters forming (lamp images are all on the right), and can click on the dots to see which image belongs to which dot.
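The feature count and the idea of compressing many features down to three can be sketched in a few lines. The tutorial does not say which dimensionality reduction algorithm the Feature explorer uses, so the example below swaps in a plain random projection as a stand-in; `feature_count` and `random_projection` are hypothetical helpers.

```python
import random

def feature_count(width, height, channels):
    """Number of raw features in an image of the given size."""
    return width * height * channels

def random_projection(features, out_dims=3, seed=0):
    """Project a feature list onto out_dims fixed random directions.

    This is a stand-in technique, not the method used by the studio.
    Re-seeding on every call means every image is projected with the same
    directions, so the resulting points are comparable.
    """
    rng = random.Random(seed)
    point = []
    for _ in range(out_dims):
        point.append(sum(rng.gauss(0.0, 1.0) * value for value in features))
    return point

n = feature_count(96, 96, 3)   # 27648 features for a 96x96 RGB image
image = [0.5] * n              # a made-up, uniformly gray image
point = random_projection(image)
print(n, len(point))
```

Each image ends up as a single 3D point, which is what makes it possible to plot the whole dataset at once.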
With all data processed it's time to start training a neural network. Neural networks are a set of algorithms, modeled loosely after the human brain, that are designed to recognize patterns. The network that we're training here will take the image data as an input, and try to map this to one of the three classes.
It's very hard to build a good working computer vision model from scratch, as you need a wide variety of input data to make the model generalize well, and training such models can take days on a GPU. To make this easier and faster we are using transfer learning. This lets you piggyback on a well-trained model, only retraining the upper layers of a neural network, leading to much more reliable models that train in a fraction of the time and work with substantially smaller datasets.
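The core idea, a frozen base with only the top of the network being trained, can be shown with a toy example. Everything here is an assumption made for illustration: the 'base' is a hand-written function rather than a real pretrained network, the head is a simple logistic regression, and the task is a two-class bright/dark problem instead of the three classes in this tutorial.

```python
import math

def frozen_base(pixels):
    """Stand-in for a pretrained base model. It is fixed and never updated."""
    mean = sum(pixels) / len(pixels)
    spread = max(pixels) - min(pixels)
    return [mean, spread]

def head_score(weights, bias, features):
    """Logistic-regression head: maps base features to a score in 0..1."""
    z = sum(w * x for w, x in zip(weights, features)) + bias
    return 1.0 / (1.0 + math.exp(-z))

def train_head(images, labels, epochs=200, learning_rate=0.5):
    """Fit only the head on features produced by the frozen base."""
    features = [frozen_base(image) for image in images]
    weights, bias = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for x, y in zip(features, labels):
            error = head_score(weights, bias, x) - y
            weights = [w - learning_rate * error * xi for w, xi in zip(weights, x)]
            bias -= learning_rate * error
    return weights, bias

def predict(weights, bias, image):
    return head_score(weights, bias, frozen_base(image))

# Made-up 4-pixel "images" with brightness in 0..1; label 1 = bright, 0 = dark.
images = [[0.1, 0.2, 0.1, 0.15], [0.05, 0.1, 0.2, 0.1],
          [0.9, 0.8, 0.95, 0.85], [0.8, 0.9, 0.9, 0.7]]
labels = [0, 0, 1, 1]
weights, bias = train_head(images, labels)
print(predict(weights, bias, [0.85] * 4) > 0.5)
```

Because only three numbers (two weights and a bias) are learned, four examples are enough here. Real transfer learning works on the same principle at a much larger scale.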
To configure the transfer learning model, click Transfer learning in the menu on the left. Here you can select the base model (the one selected by default will work, but you can change this based on your size requirements), optionally enable data augmentation (images are randomly manipulated to make the model perform better in the real world), and set the rate at which the network learns.
- Number of training cycles to
- Learning rate to
- Data augmentation: enabled.
- Minimum confidence rating: 0.7.
Then click Start training. When training finishes you'll see accuracy numbers, a confusion matrix, and some predicted on-device performance at the bottom. You have now trained your model!
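One way to read a minimum confidence rating of 0.7 is as a threshold on the highest class score: if no class reaches it, the result is treated as uncertain. The sketch below illustrates that reading only; the `classify_with_threshold` helper, the 'uncertain' label, and the example scores are assumptions.

```python
def classify_with_threshold(scores, min_confidence=0.7):
    """Return the top label, or 'uncertain' if its score is below the threshold."""
    label, score = max(scores.items(), key=lambda item: item[1])
    return label if score >= min_confidence else "uncertain"

# Made-up class scores for two images.
confident = {"lamp": 0.85, "plant": 0.10, "unknown": 0.05}
ambiguous = {"lamp": 0.50, "plant": 0.40, "unknown": 0.10}
print(classify_with_threshold(confident))
print(classify_with_threshold(ambiguous))
```

A higher threshold gives fewer wrong answers at the cost of more uncertain ones.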
With the model trained let's try it out on some test data. When collecting the data we split the data up between a training and a testing dataset. The model was trained only on the training data, and thus we can use the data in the testing dataset to validate how well the model will work in the real world. This will help us ensure the model has not learned to overfit the training data, which is a common occurrence.
To validate your model, go to Model testing, select the checkbox next to 'Sample name' and click Classify selected. Here we hit 89% accuracy, which is great for a model with so little data.
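An accuracy figure like this, and the confusion matrix behind it, can be computed from the actual and predicted labels of the testing set. The following is a minimal sketch with made-up results (nine test images, one misclassified); the helper names are hypothetical.

```python
from collections import Counter

def accuracy(actual, predicted):
    """Fraction of samples where the predicted label matches the actual one."""
    correct = sum(a == p for a, p in zip(actual, predicted))
    return correct / len(actual)

def confusion_matrix(actual, predicted):
    """Count (actual, predicted) pairs; pairs with differing labels are mistakes."""
    return Counter(zip(actual, predicted))

# Made-up test results: one plant image is misclassified as 'unknown'.
actual = ["lamp", "lamp", "lamp", "plant", "plant", "plant",
          "unknown", "unknown", "unknown"]
predicted = ["lamp", "lamp", "lamp", "plant", "plant", "unknown",
             "unknown", "unknown", "unknown"]
matrix = confusion_matrix(actual, predicted)
print(f"accuracy: {accuracy(actual, predicted):.0%}")
for (a, p), count in sorted(matrix.items()):
    print(f"actual={a} predicted={p}: {count}")
```

Comparing this number against the accuracy reported during training is a quick overfitting check: a large gap suggests the model memorized the training images.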
To see a classification in detail, click the three dots next to an item, and select Show classification. This brings you to the Live classification screen with much more detail on the file (if you collected data with your mobile phone you can also capture new testing data directly from here). This screen can help you determine why items were misclassified.
With the impulse designed, trained and verified you can deploy this model back to your device. This makes the model run without an internet connection, minimizes latency, and runs with minimum power consumption. Edge Impulse can package up the complete impulse - including the preprocessing steps, neural network weights, and classification code - in a single C++ library that you can include in your embedded software.
To run your impulse on either the OpenMV camera or your phone, follow these steps:
- OpenMV Cam H7 Plus: Running your impulse on your OpenMV camera
- Mobile phone: just click Switch to classification mode at the bottom of your phone screen.
Congratulations! You've added sight to your sensors. Now that you've trained your model you can integrate your impulse in the firmware of your own embedded device; see Running your impulse locally. There are examples for Mbed OS, Arduino, STM32CubeIDE, Eta Compute, and any other target that supports a C++ compiler. Note that the model we trained in this tutorial is relatively big, but you can choose a smaller transfer learning model.
Or if you're interested in more, see our tutorials on Continuous motion recognition or Recognize sounds from audio. If you have a great idea for a different project, that's fine too. Edge Impulse lets you capture data from any sensor, build custom processing blocks to extract features, and you have full flexibility in your Machine Learning pipeline with the learning blocks.
We can't wait to see what you'll build! 🚀