Learning blocks

After extracting meaningful features from the raw signal using signal processing, you can now train your model using a learning block. We provide a number of pre-defined learning blocks:

Classification (Keras).
Regression (Keras).
Anomaly Detection (K-means).
Image Classification (using Transfer Learning).
Keyword Spotting (using Transfer Learning).
Object Detection (using MobileNetV2 SSD FPN).
Object Detection (using FOMO).

Missing an architecture? You can create a custom learning block with PyTorch, Keras or scikit-learn.

For most of the learning blocks (except K-means Anomaly Detection), you can use the Switch to expert mode button to access the full Keras API for custom architectures, rebalancing your weights, and more.

Classification (Keras)

If you have selected the Classification learning block in the Create impulse page, a NN Classifier page will show up in the menu on the left. This page becomes available after you've extracted your features from your DSP block.

Tutorials

Want to see the Classification block in action? Check out our tutorials:

The basic idea is that a neural network classifier will take some input data, and output a probability score that indicates how likely it is that the input data belongs to a particular class.

So how does a neural network know what to predict? The neural network consists of a number of layers, each of which is made up of a number of neurons. The neurons in the first layer are connected to the neurons in the second layer, and so on. The weight of a connection between two neurons in a layer is randomly determined at the beginning of the training process. The neural network is then given a set of training data, which is a set of examples that it is supposed to predict. The network's output is compared to the correct answer and, based on the results, the weights of the connections between the neurons in the layer are adjusted. This process is repeated a number of times, until the network has learned to predict the correct answer for the training data.

A particular arrangement of layers is referred to as an architecture, and different architectures are useful for different tasks. After many iterations of the training process described above, the neural network learns and eventually becomes much better at predicting new data.
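To make the training loop described above concrete, here is a minimal sketch in plain Python. It is not how the Studio trains a model: the "network" is a single weight, the data is made up, and the names train, training_data and initial_weight are illustrative assumptions. It only shows the compare-and-adjust cycle that is repeated over a number of passes.

```python
# Toy "network": a single weight w, predicting y = w * x.
# The made-up training data follows y = 2 * x.
training_data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]

def train(data, epochs, learning_rate, initial_weight=0.0):
    """Adjust one weight by gradient descent on the squared error."""
    w = initial_weight  # a real network would start from random weights
    for _ in range(epochs):              # one epoch = one pass over the data
        for x, target in data:
            prediction = w * x
            error = prediction - target  # compare the output with the correct answer
            gradient = 2 * error * x     # derivative of error**2 with respect to w
            w -= learning_rate * gradient
    return w

weight = train(training_data, epochs=50, learning_rate=0.05)
print(f"learned weight: {weight:.3f}")
```

With more passes over the data the weight moves closer to the value that explains the training examples, which is the same idea a full neural network applies to all of its connections at once.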

On this page, you can configure the model and the training process, and get an overview of your model's performance.

Neural Network settings

Number of training cycles: Each complete pass of the training algorithm through all of the training data, updating the model's parameters with back-propagation as it goes, is known as an epoch or training cycle.
Learning rate: The learning rate controls how much the model's internal parameters are updated during each step of the training process. You can also think of it as how fast the neural network learns. If the network overfits quickly, you can reduce the learning rate.
Validation set size: The percentage of your training set held apart for validation. A good default is 20%.
Auto-balance dataset: Mixes in more copies of data from classes that are uncommon. This might help make the model more robust against overfitting if you have little data for some classes.
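The validation split and the auto-balance option can be pictured with a short sketch. This is only an illustration under stated assumptions, not the Studio's implementation: split_validation and auto_balance are hypothetical helper names, the samples are made-up (id, label) pairs, and balancing is done by simply repeating samples of the rarer classes.

```python
import random
from collections import Counter

def split_validation(samples, validation_fraction=0.2, seed=0):
    """Hold back a fraction of the training samples for validation."""
    shuffled = samples[:]
    random.Random(seed).shuffle(shuffled)
    n_validation = int(len(shuffled) * validation_fraction)
    return shuffled[n_validation:], shuffled[:n_validation]

def auto_balance(samples):
    """Repeat samples of rare classes until every class has as many
    samples as the most common class."""
    counts = Counter(label for _, label in samples)
    target = max(counts.values())
    balanced = list(samples)
    for label, count in counts.items():
        members = [s for s in samples if s[1] == label]
        # Cycle through the existing samples to fill the gap.
        for i in range(target - count):
            balanced.append(members[i % len(members)])
    return balanced

# Made-up dataset: 8 "idle" samples and only 2 "wave" samples.
samples = [(i, "idle") for i in range(8)] + [(i, "wave") for i in range(8, 10)]
train_set, validation_set = split_validation(samples)
balanced_train = auto_balance(train_set)
print(Counter(label for _, label in balanced_train))
```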

Neural Network architecture

Depending on your project type, we may offer a choice between different architecture presets to help you get started.

The neural network architecture takes your extracted features as inputs and passes them through each layer of your architecture. In the classification case, the last layer is a softmax layer. It is this last layer that gives the probability of belonging to each of the classes.
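As a small illustration of what the softmax layer does, the sketch below converts raw output values into probabilities that sum to 1. The class names and raw values are made up for the example.

```python
import math

def softmax(logits):
    """Turn raw layer outputs into probabilities that sum to 1."""
    highest = max(logits)  # subtracting the maximum keeps exp() numerically stable
    exps = [math.exp(value - highest) for value in logits]
    total = sum(exps)
    return [e / total for e in exps]

class_names = ["idle", "snake", "updown", "wave"]  # hypothetical labels
raw_outputs = [0.5, 2.0, -1.0, 0.1]                # made-up values
probabilities = softmax(raw_outputs)

# The predicted class is the one with the highest probability.
best = max(range(len(class_names)), key=probabilities.__getitem__)
print(class_names[best], round(probabilities[best], 3))
```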

From the visual (simple) mode, you can add the following layers:

Expert mode

If you have advanced knowledge in machine learning and Keras, you can switch to the Expert Mode and access the full Keras API to use custom architectures:

Training output

This panel displays the output logs during the training. The previous training logs can also be retrieved from the Jobs tab in the Dashboard page (enterprise feature).

Model performances

This section gives an overview of your model's performance and helps you evaluate your model. It can help you determine whether the model is capable of meeting your needs or whether you need to test other hyperparameters and architectures.

From the Last training performances you can retrieve your validation accuracy and loss.

The Confusion matrix is one of the most useful tools to evaluate a model. It tabulates all of the correct and incorrect responses a model produces given a set of data. The labels on the side correspond to the actual labels in each sample, and the labels on the top correspond to the predicted labels from the model.
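A confusion matrix with this layout (actual labels on the side, predicted labels on the top) can be built in a few lines. The labels and predictions below are made up, and confusion_matrix is an illustrative helper rather than a Studio function.

```python
def confusion_matrix(actual, predicted, labels):
    """Rows are actual labels, columns are predicted labels."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for a, p in zip(actual, predicted):
        matrix[index[a]][index[p]] += 1
    return matrix

labels = ["idle", "wave"]
actual = ["idle", "idle", "idle", "wave", "wave"]
predicted = ["idle", "idle", "wave", "wave", "idle"]

matrix = confusion_matrix(actual, predicted, labels)
# Correct predictions sit on the diagonal.
accuracy = sum(matrix[i][i] for i in range(len(labels))) / len(actual)
for label, row in zip(labels, matrix):
    print(label, row)
print("accuracy:", accuracy)
```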

The feature explorer, like in the processing block views, shows the spatial distribution of your input features. On this page, you can see which samples have been classified correctly and which have not.

On-device performance: Based on the target you chose in the Dashboard page, we will output estimations for the inferencing time, peak RAM usage and flash usage. This will help you validate that your model will be able to run on your device based on its constraints.

Anomaly detection (K-means)

Neural networks are great, but they have one big flaw: they're terrible at dealing with data they have never seen before (like a new gesture). Neural networks cannot judge this, as they are only aware of the training data. If you give a network something unlike anything it has seen before, it will still classify it as one of the classes it was trained on.

Tutorial

Want to see the Anomaly Detection in action? Check out our Continuous Motion Recognition tutorial.

K-means clustering

This method looks at the data points in a dataset and groups those that are similar into a predefined number K of clusters. A threshold value can be added to detect anomalies: if the distance between a data point and its nearest centroid is greater than the threshold value, then it is an anomaly.

The main difficulty resides in choosing K, since data in a time series is always changing and different values of K might be ideal at different times. Besides, in more complex scenarios where there are both local and global outliers, many outliers might pass under the radar and be assigned to a cluster.
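The thresholding idea can be sketched in plain Python. This is a minimal illustration, assuming made-up 2D points, K = 2, fixed starting centroids and an arbitrary threshold of 2.0. The helper names are hypothetical and the clustering is a bare-bones Lloyd's algorithm, not the Studio's implementation.

```python
import math

def nearest_centroid(point, centroids):
    """Return (index, distance) of the centroid closest to point."""
    distances = [math.dist(point, c) for c in centroids]
    index = min(range(len(centroids)), key=distances.__getitem__)
    return index, distances[index]

def k_means(points, centroids, iterations=10):
    """Plain Lloyd's algorithm starting from the given centroids."""
    centroids = [tuple(c) for c in centroids]
    for _ in range(iterations):
        groups = [[] for _ in centroids]
        for p in points:
            groups[nearest_centroid(p, centroids)[0]].append(p)
        # Move each centroid to the mean of its group (keep it if the group is empty).
        centroids = [
            tuple(sum(axis) / len(g) for axis in zip(*g)) if g else c
            for g, c in zip(groups, centroids)
        ]
    return centroids

def is_anomaly(point, centroids, threshold):
    """A point is an anomaly if it is farther than threshold from every centroid."""
    return nearest_centroid(point, centroids)[1] > threshold

points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
centroids = k_means(points, centroids=[points[0], points[3]])  # K = 2

print(is_anomaly((0.5, 0.5), centroids, threshold=2.0))
print(is_anomaly((5.0, 5.0), centroids, threshold=2.0))
```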

Features importance (optional)

In most of your DSP blocks, you have an option to calculate the feature importance. Edge Impulse Studio will then output a Feature Importance graphic that will help you determine which axes and values generated from your DSP block are most significant to analyze when you want to do anomaly detection.

This process of generating features and determining the most important features of your data will further reduce the amount of signal analysis needed on the device with new and unseen data.

Setting up the anomaly detection block

In your anomaly detection block, you can click on the Select suggested axes button to harness the value of the feature importance output.

Here is the process in the background:

Create X number of clusters and group all the data.
For each of these clusters we store the center and the size of the cluster.
During inference we calculate the closest cluster for a new data point, and show the distance from the edge of the cluster. If it's within a cluster (no anomaly), you thus get a value below 0.

In the above picture, known clusters are in blue and newly classified data is in orange. The new data is clearly outside of any known cluster and can thus be tagged as an anomaly.
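The background process listed above (store a center and a size per cluster, then report the distance from the edge of the closest cluster) can be sketched as follows. Two things are assumptions made for this example: the points are already grouped into clusters, and the "size" of a cluster is taken to be the distance from its center to its farthest member. summarize_cluster and anomaly_score are hypothetical names.

```python
import math

def summarize_cluster(members):
    """Store a cluster as (center, radius). The radius is assumed here to be
    the distance from the center to the farthest member."""
    center = tuple(sum(axis) / len(members) for axis in zip(*members))
    radius = max(math.dist(center, m) for m in members)
    return center, radius

def anomaly_score(point, clusters):
    """Distance from the edge of the closest cluster; below 0 means inside it."""
    center, radius = min(clusters, key=lambda c: math.dist(point, c[0]))
    return math.dist(point, center) - radius

# Made-up training data, already grouped into two clusters.
group_a = [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0), (2.0, 2.0)]
group_b = [(10.0, 10.0), (12.0, 10.0), (10.0, 12.0), (12.0, 12.0)]
clusters = [summarize_cluster(group_a), summarize_cluster(group_b)]

inside = anomaly_score((1.0, 1.5), clusters)   # within the first cluster
outside = anomaly_score((5.0, 5.0), clusters)  # between the clusters
print(inside < 0, outside > 0)
```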

Additional resources

Tutorial: Continuous Motion Recognition
Blog post: Advanced Anomaly Detection with Feature Importance

Regression (Keras)

Solving regression problems is one of the most common applications for machine learning models, especially in supervised machine learning. Models are trained to understand the relationship between independent variables and an outcome or dependent variable. The model can then be leveraged to predict the outcome of new and unseen input data, or to fill a gap in missing data.

Prerequisites

Labelling

To build a regression model you collect data as usual, but rather than setting the label to a text value, you set it to a numeric value.

Processing blocks

You can use any of the built-in signal processing blocks to pre-process your vibration, audio or image data, or use custom processing blocks to extract novel features from other types of sensor data.

Train your regression block

You have full freedom in modifying your neural network architecture - whether visually or through writing Keras code.

Number of training cycles: Each complete pass of the training algorithm through all of the training data, updating the model's parameters with back-propagation as it goes, is known as an epoch or training cycle.
Learning rate: The learning rate controls how much the model's internal parameters are updated during each step of the training process. You can also think of it as how fast the neural network learns. If the network overfits quickly, you can reduce the learning rate.
Validation set size: The percentage of your training set held apart for validation. A good default is 20%.
Auto-balance dataset: Mixes in more copies of data from classes that are uncommon. This might help make the model more robust against overfitting if you have little data for some classes.

Test your regression model

If you want to see the accuracy of your model across your test dataset, go to Model testing. You can adjust the Maximum error percentage by clicking on the "⋮" button.

Transfer learning (Images)

When creating an impulse to solve an image classification problem, you will most likely want to use transfer learning. This is particularly true when working with a relatively small dataset.

To choose transfer learning as your learning block, go to Create impulse, click on Add a Learning Block, and select Transfer Learning.

To choose your preferred pre-trained network, go to Transfer learning on the left side of your screen and click choose a different model. A pop up will appear on your screen with a list of models to choose from as shown in the image below.

Edge Impulse uses state-of-the-art MobileNetV1 & V2 architectures trained on the ImageNet dataset as its pre-trained networks for you to fine-tune for your specific application. The pre-trained networks come with varying input sizes ranging from 96x96 to 320x320 and support both RGB & grayscale images, for you to choose from depending on your application & target deployment hardware.

Neural Network Settings

Before you start training your model, you need to set the following neural network configurations:

Number of training cycles: Each complete pass of the training algorithm through all of the training data, updating the model's parameters with back-propagation as it goes, is known as an epoch or training cycle.
Learning rate: The learning rate controls how much the model's internal parameters are updated during each step of the training process. You can also think of it as how fast the neural network learns. If the network overfits quickly, you can reduce the learning rate.
Validation set size: The percentage of your training set held apart for validation. A good default is 20%.

You might also need to enable auto-balance to prevent model bias, or enable data augmentation to increase the size of your dataset and make it more diverse, which helps prevent overfitting.

The preset configurations just don't work for your model? No worries, Expert Mode is for you! Expert Mode gives you full control of your model so that you can configure it however you want. To enable the expert mode, just click on the "⋮" button and toggle the expert mode.

You can use the expert mode to change your loss function, optimizer, print your model architecture and even set an early stopping callback to prevent overfitting your model.

Transfer learning (Keyword Spotting)

Transfer learning is the process of taking features learned from one problem and leveraging them on a new but related problem. Most of the time these features are learned from large-scale datasets with common objects, which makes it faster and more accurate to tune and adapt to new tasks. With Edge Impulse's transfer learning block for audio keyword spotting, we take the same transfer learning technique classically used for image classification and apply it to audio data. This allows you to fine-tune a pre-trained keyword spotting model on your data and achieve even better performance than using a classification block, even with a relatively small keyword dataset.

Excited? Train your first keyword spotting model in under 5 minutes with the getting started wizard!

To choose transfer learning as your learning block, go to Create impulse, click on Add a Learning Block, and select Transfer Learning (Keyword Spotting).

To choose your preferred pre-trained network, select the Transfer learning tab on the left side of your screen and click choose a different model. A pop up will appear on your screen with a list of models to choose from as shown in the image below.

Edge Impulse uses state-of-the-art MobileNetV1 & V2 architectures trained on the ImageNet dataset as its pre-trained networks for you to fine-tune for your specific application.

Neural Network Settings

Before you start training your model, you need to set the following neural network configurations:

Number of training cycles: Each complete pass of the training algorithm through all of the training data, updating the model's parameters with back-propagation as it goes, is known as an epoch or training cycle.
Learning rate: The learning rate controls how much the model's internal parameters are updated during each step of the training process. You can also think of it as how fast the neural network learns. If the network overfits quickly, you can reduce the learning rate.
Validation set size: The percentage of your training set held apart for validation. A good default is 20%.

You might also need to enable auto-balance to prevent model bias, or enable data augmentation to increase the size of your dataset and make it more diverse, which helps prevent overfitting.

You can use the expert mode to change your loss function, optimizer, print your model architecture and even set an early stopping callback to prevent overfitting your model.

Object detection (Images)

The two most common image processing problems are image classification and object detection.

Image classification takes an image as an input and outputs what type of object is in the image. This technique works great, even on microcontrollers, as long as we only need to detect a single object in the image.

On the other hand, object detection takes an image and outputs information about the class, number, and position (and, optionally, the size) of the objects in the image.

Edge Impulse provides two different methods to perform object detection:

Using MobileNetV2 SSD FPN
Using FOMO

Specifications

MobileNetV2 SSD FPN

FOMO

MobileNetV2 SSD FPN

It's very hard to build a computer vision model from scratch, as you need a wide variety of input data to make the model generalize well, and training such models can take days on a GPU. To make building your model easier and faster we are using transfer learning. This lets you piggyback on a well-trained model, only re-training the upper layers of a neural network, leading to much more reliable models that train in a fraction of the time and work with substantially smaller datasets.

Tutorial

Want to see MobileNetV2 SSD FPN-Lite models in action? Check out our tutorial.

How to get started?

To build your first object detection models using MobileNetV2 SSD FPN-Lite:

Create a new project in Edge Impulse.
Make sure to set your labelling method to 'Bounding boxes (object detection)'.
Collect and prepare your dataset as in
Resize your image to fit 320x320px
Add an 'Object Detection (Images)' block to your impulse.
Under Images, choose RGB.
Under Object detection, select 'Choose a different model' and select 'MobileNetV2 SSD FPN-Lite 320x320'
You can start your training with a learning rate of '0.15'

Click on 'Start training'

How does this 🪄 work?

Here, we are using the MobileNetV2 SSD FPN-Lite 320x320 pre-trained model. The model has been trained on the COCO 2017 dataset with images scaled to 320x320 resolution.

In the MobileNetV2 SSD FPN-Lite, we have a base network (MobileNetV2), a detection network (Single Shot Detector or SSD) and a feature extractor (FPN-Lite).

Base network:

MobileNet, like VGG-Net, LeNet, AlexNet, and all others, is based on neural networks. The base network provides high-level features for classification or detection. If you use a fully connected layer and a softmax layer at the end of these networks, you have a classifier.

But you can remove the fully connected and softmax layers, and replace them with detection networks such as SSD or Faster R-CNN to perform object detection.

Detection network:

The most common detection networks are SSD (Single Shot Detector) and RPN (Region Proposal Network).

When using SSD, we only need to take one single shot to detect multiple objects within the image. On the other hand, region proposal network (RPN) based approaches, such as the R-CNN series, need two shots: one for generating region proposals, and one for detecting the object of each proposal.

As a consequence, SSD is much faster than RPN-based approaches, but often trades accuracy for real-time processing speed. SSD models also tend to have issues detecting objects that are too close together or too small.

Feature Pyramid Network:

Detecting objects at different scales is challenging, in particular for small objects. Feature Pyramid Network (FPN) is a feature extractor designed around the feature pyramid concept to improve accuracy and speed.

FOMO: Object detection for constrained devices

Edge Impulse FOMO (Faster Objects, More Objects) is a novel machine learning algorithm that brings object detection to highly constrained devices. It lets you count objects, find the location of objects in an image, and track multiple objects in real-time using up to 30x less processing power and memory than MobileNet SSD or YOLOv5.

Tutorials

Want to see FOMO in action? Check out our tutorial.

For example, FOMO lets you do 60 fps object detection on a Raspberry Pi 4:

And here's FOMO doing 30 fps object detection on an Arduino Nicla Vision (Cortex-M7 MCU), using 245K RAM.

How does this 🪄 work?

So how does that work? First, a small primer. Let's say you want to detect whether you see a face in front of your sensor. You can approach this in two ways. You can train a simple binary classifier, which says either "face" or "no face", or you can train a complex object detection model which tells you "I see a face at this x, y point and of this size". Object detection is thus great when you need to know the exact location of something, or if you want to count multiple things (the simple classifier cannot do that) - but it's computationally much more intensive, and you typically need much more data for it.

The design goal for FOMO was to get the best of both worlds: the computational power required for simple image classification, but with the additional information on location and object count that object detection gives us.

Heat maps

The first thing to realize is that while the output of the image classifier is "face" / "no face" (and thus no locality is preserved in the outcome) the underlying neural network architecture consists of a number of convolutional layers. A way to think about these layers is that every layer creates a diffused lower-resolution image of the previous layer. E.g. if you have a 16x16 image the width/height of the layers may be:

16x16
4x4
1x1

Each 'pixel' in the second layer maps roughly to a 4x4 block of pixels in the input layer, and the interesting part is that locality is somewhat preserved. The 'pixel' in layer 2 at (0, 0) will roughly map back to the top left corner of the input image. The deeper you go in a normal image classification network, the less of this locality (or "receptive field") is preserved until you finally have just 1 outcome.
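The rough mapping from a cell in the lower-resolution layer back to a block of input pixels can be written down directly. The function name cell_to_input_block is illustrative, and the mapping is the approximate one described above, not an exact receptive-field calculation.

```python
def cell_to_input_block(row, col, input_size=16, map_size=4):
    """Return (top, left, bottom, right) of the input pixels that a cell
    roughly corresponds to. Bottom and right are exclusive."""
    scale = input_size // map_size  # 16 // 4 = 4 input pixels per cell
    return row * scale, col * scale, (row + 1) * scale, (col + 1) * scale

# The cell at (0, 0) maps back to the top left corner of the input image.
print(cell_to_input_block(0, 0))  # (0, 0, 4, 4)
print(cell_to_input_block(3, 3))  # (12, 12, 16, 16)
```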

FOMO uses the same architecture, but cuts off the last layers of a standard image classification model and replaces them with a per-region class probability map (e.g. a 4x4 map in the example above). It then has a custom loss function which forces the network to fully preserve the locality in the final layer. This essentially gives you a heatmap of where the objects are.

The resolution of the heat map is determined by where you cut off the layers of the network. For the FOMO model trained above (on the beer bottles) we do this when the size of the heat map is 8x smaller than the input image (input image of 160x160 will yield a 20x20 heat map), but this is configurable. When you set this to 1:1 this actually gives you pixel-level segmentation and the ability to count a lot of small objects.

Training on centroids

A difference between FOMO and other object detection algorithms is that it does not output bounding boxes, but it's easy to go from heat map to bounding boxes. Just draw a box around a highlighted area.
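One possible way to "draw a box around a highlighted area" is to group neighbouring heat map cells above a threshold and take the extent of each group. The sketch below does this with a simple flood fill. The threshold of 0.5, the cell size of 8 pixels, the 4-connected grouping and the function name are all assumptions for the example, not FOMO's own post-processing.

```python
def heat_map_to_boxes(heat_map, threshold=0.5, cell_size=8):
    """Group neighbouring cells above the threshold and return one
    (x, y, width, height) box per group, in input-image pixels."""
    rows, cols = len(heat_map), len(heat_map[0])
    seen = set()
    boxes = []
    for r in range(rows):
        for c in range(cols):
            if heat_map[r][c] < threshold or (r, c) in seen:
                continue
            # Flood fill over 4-connected neighbours to collect one highlighted area.
            stack, area = [(r, c)], []
            seen.add((r, c))
            while stack:
                cr, cc = stack.pop()
                area.append((cr, cc))
                for nr, nc in ((cr + 1, cc), (cr - 1, cc), (cr, cc + 1), (cr, cc - 1)):
                    if (0 <= nr < rows and 0 <= nc < cols
                            and (nr, nc) not in seen
                            and heat_map[nr][nc] >= threshold):
                        seen.add((nr, nc))
                        stack.append((nr, nc))
            top = min(a[0] for a in area)
            bottom = max(a[0] for a in area)
            left = min(a[1] for a in area)
            right = max(a[1] for a in area)
            boxes.append((left * cell_size, top * cell_size,
                          (right - left + 1) * cell_size,
                          (bottom - top + 1) * cell_size))
    return boxes

# Made-up 4x4 heat map with two highlighted areas.
heat_map = [
    [0.0, 0.9, 0.8, 0.0],
    [0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.7],
    [0.0, 0.0, 0.0, 0.0],
]
boxes = heat_map_to_boxes(heat_map)
print(boxes)  # [(8, 0, 16, 8), (24, 16, 8, 8)]
```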

However, when working with early customers we realized that bounding boxes are merely an implementation detail of other object detection networks, and are not a typical requirement. Very often the size of objects is not important as cameras are in fixed locations (and objects thus fixed size), but rather you just want the location and the count of objects.

Thus, we now train on the centroids of objects. This makes it much easier to count objects that are close (every activation in the heat map is an object), and the convolutional nature of the neural network ensures we look around the centroid for the object anyway.

A downside of the heat map is that each cell acts as its own classifier. E.g. if your classes are "lamp", "plant" and "background" each cell will be either lamp, plant, or background. It's thus not possible to detect objects with overlapping centroids. You can see this in the Raspberry Pi 4 video above at 00:18 where the beer bottles are too close together. This can be solved by using a higher resolution heat map.

Flexible and very, very fast

A really cool benefit of FOMO is that it's fully convolutional. If you set an image:heat map factor of 8 you can throw in a 96x96 image (outputs 12x12 heat map), a 320x320 image (outputs 40x40 heat map), or even a 1024x1024 image (outputs 128x128 heat map). This makes FOMO incredibly flexible, and useful even if you have very large images that need to be analyzed (e.g. in fault detection where the faults might be very, very small). You can even train on smaller patches, and then scale up during inference.
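The relation between input size and heat map size is a plain integer division by the image:heat map factor. A tiny sketch (the helper name heat_map_size is illustrative):

```python
def heat_map_size(input_width, input_height, factor=8):
    """Heat map dimensions for a given image:heat map factor."""
    return input_width // factor, input_height // factor

for size in (96, 320, 1024):
    print(f"{size}x{size} ->", heat_map_size(size, size))
# 96x96 -> (12, 12)
# 320x320 -> (40, 40)
# 1024x1024 -> (128, 128)
```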

Additionally FOMO is compatible with any MobileNetV2 model. Depending on where the model needs to run you can pick a model with a higher or lower alpha, and transfer learning also works (although you need to train your base models specifically with FOMO in mind). This makes it easy for end customers to use their existing models and fine-tune them with FOMO to also add locality (e.g. we have customers with large transfer learning models for wildlife detection).

Together this gives FOMO the capabilities to scale from the smallest microcontrollers all the way to full gateways or GPUs. Just some numbers:

The video on the top classifies 60 times / second on a stock Raspberry Pi 4 (160x160 grayscale input, MobileNetV2 0.1 alpha). This is 20x faster than MobileNet SSD which does ~3 frames/second.
The second video on the top classifies 30 times / second on an Arduino Nicla Vision board (with a Cortex-M7 MCU running at 480MHz) in ~240K of RAM (96x96 grayscale input, MobileNetV2 0.35 alpha).
The smallest version of FOMO (96x96 grayscale input, MobileNetV2 0.05 alpha) runs in <100KB RAM at ~10 fps on a Cortex-M4F at 80MHz. [1]

How to get started?

To build your first FOMO models:

Create a new project in Edge Impulse.
Make sure to set your labeling method to 'Bounding boxes (object detection)'.
Add an 'Object Detection (Images)' block to your impulse.
Under Images, select 'Grayscale'
Under Object detection, select 'Choose a different model' and select one of the FOMO models.
Make sure to lower the learning rate to 0.001 to start.

Expert mode tips

Additional configuration for FOMO can be accessed via expert mode.

Object weighting

FOMO is sensitive to the ratio of objects to background cells in the labelled data. By default the configuration is to weight object output cells x100 in the loss function, object_weight=100, as a way of balancing what is usually a majority of background. This value was chosen as a sweet spot for a number of example use cases. In scenarios where the objects to detect are relatively rare this value can be increased, e.g. to 1000, to have the model focus even more on object detection (at the expense of potentially more false detections).
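To see why weighting matters, the sketch below builds a per-cell weight map from a label grid and compares a weighted average loss with an unweighted one. This is only an illustration of the idea, not FOMO's actual loss function: the helper names, the made-up per-cell loss values, and the choice to normalise by the sum of the weights are all assumptions.

```python
def cell_weights(label_grid, object_weight=100, background=0):
    """Weight 1 for background cells, object_weight for object cells."""
    return [[1 if label == background else object_weight for label in row]
            for row in label_grid]

def weighted_mean_loss(per_cell_loss, weights):
    """Average the per-cell losses, counting each cell by its weight."""
    total = sum(loss * w
                for loss_row, weight_row in zip(per_cell_loss, weights)
                for loss, w in zip(loss_row, weight_row))
    return total / sum(w for row in weights for w in row)

# Made-up 2x2 example: one object cell (label 1) among background cells,
# where the model does badly on the object cell only.
labels = [[0, 0], [0, 1]]
per_cell_loss = [[0.1, 0.1], [0.1, 2.0]]

weights = cell_weights(labels)
weighted = weighted_mean_loss(per_cell_loss, weights)
unweighted = weighted_mean_loss(per_cell_loss, cell_weights(labels, object_weight=1))
print(weights, round(weighted, 3), round(unweighted, 3))
```

With the weighting, a mistake on the single object cell dominates the loss instead of being averaged away by the background cells.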

MobileNet cut point

Choosing a different cut_point results in a different spatial reduction; e.g. if we cut higher at block_3_expand_relu FOMO will instead only do a spatial reduction of 1/4 (i.e. a 96x96 input results in a 24x24 output).

Note, though, that this means taking much less of the MobileNet backbone and results in a model with only half the parameters. Switching to a higher alpha may counteract this parameter reduction. Later FOMO releases will counter this parameter reduction with a UNet-style architecture.

FOMO classifier capacity

FOMO can be thought of logically as the first section of MobileNetV2 followed by a standard classifier where the classifier is applied in a fully convolutional fashion.

In the default configuration this FOMO classifier is equivalent to a single dense layer with 32 nodes followed by a classifier with num_classes outputs.

For a three way classifier, using the default cut point, the result is a classifier head with ~3200 parameters.

 LAYER                          SHAPE                NUMBER OF PARAMETERS
 block_6_expand_relu (ReLU)     (None, 20, 20, 96)   0                                         
 head (Conv2D)                  (None, 20, 20, 32)   3104                                            
 logits (Conv2D)                (None, 20, 20, 3)    99

We have the option of increasing the capacity of this classifier head by either 1) increasing the number of filters in the Conv2D layer, 2) adding additional layers or 3) doing both.

For example, we might change the number of filters from 32 to 16 and add another convolutional layer, as follows.

 LAYER                          SHAPE                NUMBER OF PARAMETERS
 block_6_expand_relu (ReLU)     (None, 20, 20, 96)   0
 head_1 (Conv2D)                (None, 20, 20, 16)   1552                                         
 head_2 (Conv2D)                (None, 20, 20, 16)   272                                          
 logits (Conv2D)                (None, 20, 20, 3)    51

For some problems an additional layer can improve performance, and in this case it actually uses fewer parameters. It can, however, take longer to train and require more data. In future releases the tuning of this aspect of FOMO can be handled by the EON Tuner.
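The parameter counts in the two layer listings above can be reproduced with simple arithmetic, assuming each Conv2D layer in the head uses a 1x1 kernel with a bias term (an assumption that is consistent with the listed numbers). conv2d_params is an illustrative helper.

```python
def conv2d_params(in_channels, filters, kernel=1):
    """Weights plus one bias per filter for a kernel x kernel Conv2D layer."""
    return kernel * kernel * in_channels * filters + filters

num_classes = 3
backbone_channels = 96  # output depth of block_6_expand_relu

default_head = [
    conv2d_params(backbone_channels, 32),  # head
    conv2d_params(32, num_classes),        # logits
]
two_layer_head = [
    conv2d_params(backbone_channels, 16),  # head_1
    conv2d_params(16, 16),                 # head_2
    conv2d_params(16, num_classes),        # logits
]

print(default_head, sum(default_head))      # [3104, 99] 3203
print(two_layer_head, sum(two_layer_head))  # [1552, 272, 51] 1875
```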

Performance and Minimum Requirements

Just like the rest of our Neural Network-based learning blocks, FOMO is delivered as a set of basic math routines free of runtime dependencies. This means that there are virtually no limitations to running FOMO, other than:

Making sure the model itself can fit into the target's memory (flash/RAM), and
making sure the target also has enough memory to hold the image buffer (flash/RAM) in addition to your application logic.

In all, we have seen buffer, model and app logic (including wireless stack) fit in as little as 200KB for 64x64 pixel images. But we would definitely recommend a target with at least 512KB so that you can take advantage of larger image sizes and a wider range of model optimizations.

With regards to latency, the speed of the target will determine the maximum number of frames that can be processed in a given interval (fps). This will of course be influenced by any other tasks the CPU may need to complete, but we have consistently seen MCUs running @ 80MHz complete a full pass on a 64x64 pixel image in under one second, which should translate to just under 1fps once you add the rest of your app logic. Keep in mind that frame throughput can increase dramatically at higher speeds or when tensor acceleration is available. We have measured 40-60 fps consistently on a Raspberry Pi 4 and ~15 fps on unaccelerated 480MHz targets. The table below summarizes this trade-off:

Custom learning blocks

Want to use a novel ML architecture, or load your own transfer learning models into Edge Impulse? Create a custom learning block! It's easy to bring in any training pipeline into the Studio, as long as you can output TFLite or ONNX files. We have end-to-end examples of doing this in Keras, PyTorch and scikit-learn.

If you just want to modify the neural network architecture or loss function, you can also use expert mode directly in the Studio, without having to bring your own model. Go to any ML block, click the three dots, and select Switch to Keras (expert) mode.

This page describes the input and output formats if you want to bring your own model, but a good way to start building a custom learning block is by modifying one of the following example repositories:

YOLOv5 - wraps the Ultralytics YOLOv5 repository (trained with PyTorch) to train a custom transfer learning model.
EfficientNet - a Keras implementation of transfer learning with EfficientNet B0.
Keras - a basic multi-layer perceptron in Keras and TensorFlow.
PyTorch - a basic multi-layer perceptron in PyTorch.
Scikit-learn - trains a logistic regression model using scikit-learn, then outputs a TFLite file for inferencing using jax.

Editing built-in blocks

Any built-in block in the Edge Impulse Studio (e.g. classifiers, regression models or FOMO blocks) can be edited locally, and then pushed back as a custom block. This is great if you want to make heavy modifications to these training pipelines, for example to do custom data augmentation. To download a block, go to any ML block in your project, click the three dots, select Edit block locally, and follow the instructions in the README.

Dockerfiles

Training pipelines in Edge Impulse are built on top of Docker containers, a virtualization technique which lets developers package up an application with all dependencies in a single package. To train your own model you'll need to wrap all the required packages, your scripts, and (if you use transfer learning) your pre-trained weights into this container. When running in Edge Impulse the container does not have network access, so make sure you don't download dependencies while running (fine when building the container).

A typical Dockerfile might look like (see the example repositories for more information):

# syntax = docker/dockerfile:experimental
FROM ubuntu:20.04
WORKDIR /app

ARG DEBIAN_FRONTEND=noninteractive

# Install base packages (like Python and pip)
RUN apt update && apt install -y curl zip git lsb-release software-properties-common apt-transport-https vim wget python3 python3-pip
RUN python3 -m pip install --upgrade pip==20.3.4

# Copy Python requirements in and install them
COPY requirements.txt ./
RUN pip3 install -r requirements.txt

# Copy the rest of your training scripts in
COPY . ./

# And tell us where to run the pipeline
ENTRYPOINT ["python3", "-u", "train.py"]

Important: ENTRYPOINT

It's important to create an ENTRYPOINT at the end of the Dockerfile to specify which file to run.

GPU Support

If you want to have GPU support (only for enterprise customers), you'll need CUDA packages installed. If you export a learn block from the Studio, it will already have the right base packages, so use that Dockerfile as a starting point.

Arguments to your script

The entrypoint (see above in the Dockerfile) will be called with these four parameters:

--data-directory - where you can find the data (see below for the input/output formats).
--epochs - number of epochs to train for (set by the user in the UI).
--learning-rate - learning rate to train with (set by the user in the UI).
--out-directory - where to write the TFLite or ONNX files (see below for the input/output formats).

We realise that not every ML model requires setting epochs and learning rate, and we also realise that you might want to add extra options to the UI. Longer term we'll implement a parameter system similar to what custom processing blocks use.
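As a minimal sketch of how a training script could read these four arguments, here is one way to do it with Python's standard argparse module. The function name parse_args and the argument types (int for epochs, float for learning rate) are assumptions for illustration, not something the block format prescribes.

```python
import argparse

def parse_args(argv=None):
    """Parse the four arguments passed to the entrypoint (illustrative sketch)."""
    parser = argparse.ArgumentParser(description="Custom learning block")
    parser.add_argument("--data-directory", type=str, required=True,
                        help="directory containing the X_*/Y_* files")
    parser.add_argument("--epochs", type=int, required=True,
                        help="number of epochs to train for")
    parser.add_argument("--learning-rate", type=float, required=True,
                        help="learning rate to train with")
    parser.add_argument("--out-directory", type=str, required=True,
                        help="directory to write the TFLite or ONNX files to")
    # parse_known_args tolerates extra flags instead of exiting with an error
    args, _unknown = parser.parse_known_args(argv)
    return args

# Example invocation with an explicit argument list
args = parse_args(["--data-directory", "data", "--epochs", "30",
                   "--learning-rate", "0.001", "--out-directory", "out"])
print(args.data_directory, args.epochs, args.learning_rate, args.out_directory)
```

Using parse_known_args rather than parse_args is a design choice here: it keeps the script from failing if additional flags are passed that it does not know about.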

Input format

The data directory contains your dataset, after running any DSP blocks, and already split in a train/validation set:

X_split_train.npy
Y_split_train.npy
X_split_test.npy
Y_split_test.npy

The X_*.npy files are float32 NumPy arrays, already in the right shape (e.g. if you're training on 96x96 RGB images this will be of shape (n, 96, 96, 3)). You can typically load these without any modification into your training pipeline (see the notes after this section for caveats).

The Y_*.npy files are either:

1) int32 NumPy arrays, with four columns (label_index, sample_id, sample_slice_start_ms, sample_slice_end_ms).
2) A JSON array in the form of: [{ "sampleId": 234731, "boundingBoxes": [{ "label": 1, "x": 260, "y": 313, "w": 234, "h": 261 }] } ]

Format 2) is sent if your dataset has bounding boxes; in all other cases format 1) is sent.
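To illustrate the bounding-box label format, the sketch below parses the JSON structure shown above using only the standard library. The helper name boxes_by_sample and the tuple layout it returns are hypothetical choices for this example. For the other format (the int32 array), the label index is simply the first column.

```python
import json

def boxes_by_sample(y_json_text):
    """Map each sampleId to a list of (label, x, y, w, h) tuples."""
    samples = json.loads(y_json_text)
    result = {}
    for sample in samples:
        result[sample["sampleId"]] = [
            (box["label"], box["x"], box["y"], box["w"], box["h"])
            for box in sample["boundingBoxes"]
        ]
    return result

# The example entry from the text above
example = ('[{ "sampleId": 234731, "boundingBoxes": '
           '[{ "label": 1, "x": 260, "y": 313, "w": 234, "h": 261 }] }]')
print(boxes_by_sample(example))
```

In a real pipeline you would read the text from the Y file in the data directory instead of using an inline string.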

To get new data for your project, just run (requires Edge Impulse CLI v1.16 or higher):

edge-impulse-blocks runner --download-data data/

This regenerates features (if necessary) and then downloads the updated dataset.

Notes on input format for vision models

The input features for vision models are a 3D vector of shape (WIDTH, HEIGHT, CHANNELS), where the channel data is in RGB format and each pixel is scaled 0..1.

If the input to your model is different (e.g. BGR, or scaled 0..255) you'll need to transform the input. This needs to happen as part of your neural network, as the input will always be as stated above. Here's how you can do that:

If you have a model that requires the input to be scaled 0..255 (e.g. EfficientNet) you can inject a Mul layer that multiplies the input by 255 before passing it to the first hidden layer of your network.
- In Keras you do this by adding a Rescaling layer after training your model. Here's a Keras example using EfficientNet.
- For PyTorch you do this by first converting the trained model to ONNX, then injecting a Mul operator to the trained ONNX file. Example.
If you have a model that requires BGR input, rather than RGB input (e.g. Resnet50) you'll need to transpose the first and last channels.
- In Keras you do this by adding a lambda layer. Example using Resnet50.
- For PyTorch you do this by first converting the trained model to ONNX, then transposing using scc4onnx.
If you have a model that requires input to be scaled differently (e.g. Resnet50) you can typically do a matrix subtract or matrix multiplication layer. Here's an example in Keras for Resnet50.

An end-to-end example showing how to move and verify normalization code from a Python function to a neural network graph (using Resnet50 in Keras) can be found in this gist.
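When you move a transform into the network graph, it helps to have a plain reference function to compare against. The sketch below shows, for a single pixel, the arithmetic behind two of the transforms described above (rescaling 0..1 to 0..255, and swapping RGB to BGR). The function names are hypothetical, and this is only a reference for verification, not a replacement for the in-graph layers.

```python
def scale_to_255(pixel):
    """Rescale one pixel from the 0..1 range to the 0..255 range."""
    return tuple(value * 255.0 for value in pixel)

def rgb_to_bgr(pixel):
    """Swap the first and last channels of one (R, G, B) pixel."""
    r, g, b = pixel
    return (b, g, r)

pixel = (1.0, 0.5, 0.0)          # RGB, scaled 0..1 (the format your model receives)
print(scale_to_255(pixel))       # (255.0, 127.5, 0.0)
print(rgb_to_bgr(pixel))         # (0.0, 0.5, 1.0)
```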

Note on required shape for image models (NCHW vs. NHWC)

Internally in Edge Impulse, vision models require the input shape to be (n, Height, Width, Channels) (NHWC). PyTorch uses (n, Channels, Height, Width) (NCHW) internally, so the layout needs to be converted when you train a model. We do this automatically when you output an ONNX file in NCHW format, but this is done by injecting a large number of Transpose layers (which lowers performance). If your training pipeline natively supports outputting TFLite / SavedModel files in NHWC format then please do that (e.g. Ultralytics YOLOv5 does this in their tf.py file).
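To make the difference between the two layouts concrete, here is a small pure-Python sketch that reorders a flat buffer from NHWC order to NCHW order. The function name is hypothetical; in practice a framework transpose does this for you, and the sketch only shows which element ends up where.

```python
def nhwc_to_nchw(flat, n, h, w, c):
    """Reorder a flat buffer from NHWC element order to NCHW element order."""
    out = [0.0] * len(flat)
    for ni in range(n):
        for hi in range(h):
            for wi in range(w):
                for ci in range(c):
                    src = ((ni * h + hi) * w + wi) * c + ci   # NHWC offset
                    dst = ((ni * c + ci) * h + hi) * w + wi   # NCHW offset
                    out[dst] = flat[src]
    return out

# One image, 1 row, 2 pixels, 3 channels: [R0, G0, B0, R1, G1, B1]
print(nhwc_to_nchw([1, 2, 3, 4, 5, 6], n=1, h=1, w=2, c=3))  # [1, 4, 2, 5, 3, 6]
```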

Output format

The training pipeline can output either TFLite or ONNX files:

If you output TFLite files

model.tflite - a TFLite file with float32 inputs and outputs.
model_quantized_int8_io.tflite - a quantized TFLite file with int8 inputs and outputs.
saved_model.zip - a TensorFlow saved model (optional).

At least one of the TFLite files is required.

If you output ONNX files

model.onnx - An ONNX file with float16 or float32 inputs and outputs.

We automatically convert this file to both unquantized and quantized TFLite files after training.
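As a sanity check at the end of a training script, you could verify that the output directory contains at least one of the files listed above. This is a minimal sketch; the function name check_out_directory is hypothetical, and the check only looks at file names, not at whether the files are valid models.

```python
from pathlib import Path

# File names taken from the output format described above
TFLITE_FILES = ("model.tflite", "model_quantized_int8_io.tflite")
ONNX_FILE = "model.onnx"

def check_out_directory(out_dir):
    """Return the recognized model files in out_dir; raise if there are none."""
    out = Path(out_dir)
    candidates = TFLITE_FILES + (ONNX_FILE,)
    found = [name for name in candidates if (out / name).is_file()]
    if not found:
        raise FileNotFoundError(
            f"No model file found in {out}; expected one of {', '.join(candidates)}"
        )
    return found
```

You would call check_out_directory with the value of --out-directory just before the script exits, so a missing export fails loudly instead of silently.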

I'm using scikit-learn, I don't have TFLite or ONNX files...

If you have a training pipeline that cannot output TFLite files by default (e.g. scikit-learn), you can use jax to implement the inference function, and compile that to TFLite. See our example repository. If there are any TFLite ops in your final model that are not supported by the EON Compiler (so you cannot run on device), then please let us know on the forums.

Hosting your custom block

Host your block directly within Edge Impulse with the Edge Impulse CLI:

$ edge-impulse-blocks init
$ edge-impulse-blocks push

To edit the block, go to:

Enterprise: go to your organization, Custom blocks > Machine learning.
Developers: click on your photo on the top right corner, select Custom blocks > Machine learning.

The block is now available from inside any of your Edge Impulse projects. Depending on the data your block operates on, you can add it via:

Object Detection: Create impulse > Add learning block > Object Detection (Images), then select the block via 'Choose a different model' on the 'Object detection' page.
Image classification: Create impulse > Add learning block > Transfer learning (Images), then select the block via 'Choose a different model' on the 'Transfer learning' page.
Audio classification: Create impulse > Add learning block > Transfer Learning (Keyword Spotting), then select the block via 'Choose a different model' on the 'Transfer learning' page.
Other (classification): Create impulse > Add learning block > Custom classification, then select the block via 'Choose a different model' on the 'Machine learning' page.
Other (regression): Create impulse > Add learning block > Custom regression, then select the block via 'Choose a different model' on the 'Regression' page.

Object detection output layers

Unfortunately object detection models typically don't have a standard way to go from neural network output layer to bounding boxes. Currently we support the following types of output layers:

MobileNet SSD
Edge Impulse FOMO
YOLOv5 (compatible with Ultralytics YOLOv5 v6)
YOLOv5 for Renesas DRP-AI
YOLOX

If you have an object detection model with a different output layer then please contact your user success engineer (enterprise) or let us know on the forums (free users) with an example on how to interpret the output, and we can add it.

Getting latency/memory information

When training locally you can use the profiling API to get latency, RAM and ROM estimates. This is very useful as you can immediately see whether your model will fit on device. Additionally, you can use this API as part of your experiment tracking (e.g. in Weights & Biases or MLflow) to weed out models that won't fit your latency or memory constraints.

The profiling API expects:

A TFLite file.
A reference device (for latency calculation) - you can get a list of all devices via getProjectInfo in the latencyDevices object.
A reference model (which model is closest to your architecture) - you can choose between gestures-large-f32, gestures-large-i8, image-32-32-mobilenet-f32, image-32-32-mobilenet-i8, image-96-96-mobilenet-f32, image-96-96-mobilenet-i8, image-320-320-mobilenet-ssd-f32, keywords-2d-f32, keywords-2d-i8. Make sure to use i8 models if you have quantized your model.

Here's how you invoke the API from Python:

import requests, json, time, base64

PROJECT_ID = 1 # YOUR PROJECT ID
API_KEY = "ei_..." # YOUR API KEY
DEVICE = 'infineon-cy8ckit-062s2' # reference device
REFERENCE_MODEL = 'keywords-2d-i8' # reference model

def profile_tflite_model(tflite_file_path):
    url = f"https://studio.edgeimpulse.com/v1/api/{PROJECT_ID}/jobs/profile-tflite"

    base64_encoded_file = ''
    with open(tflite_file_path, "rb") as f:
        base64_encoded_file = base64.b64encode(f.read()).decode('utf-8')

    payload = {
        'tfliteFileBase64': base64_encoded_file,
        'device': DEVICE,
        'referenceModel': REFERENCE_MODEL,
    }
    headers = {
        "x-api-key": API_KEY,
        "Accept": "application/json",
        "Content-Type": "application/json",
    }
    response = requests.request("POST", url, json=payload, headers=headers)
    body = json.loads(response.text)
    if (not body['success']):
        raise Exception(body['error'])
    return body['id']

def get_stdout(job_id, skip_line_no):
    url = f"https://studio.edgeimpulse.com/v1/api/{PROJECT_ID}/jobs/{job_id}/stdout"
    headers = {
        "x-api-key": API_KEY,
        "Accept": "application/json",
        "Content-Type": "application/json",
    }
    response = requests.request("GET", url, headers=headers)
    body = json.loads(response.text)
    if (not body['success']):
        raise Exception(body['error'])
    stdout = body['stdout'][::-1] # reverse array so it's old -> new
    return [ x['data'] for x in stdout[skip_line_no:] ]

def wait_for_job_completion(job_id):
    skip_line_no = 0

    url = f"https://studio.edgeimpulse.com/v1/api/{PROJECT_ID}/jobs/{job_id}/status"
    headers = {
        "x-api-key": API_KEY,
        "Accept": "application/json",
        "Content-Type": "application/json",
    }
    while True:
        response = requests.request("GET", url, headers=headers)
        body = json.loads(response.text)
        if (not body['success']):
            raise Exception(body['error'])

        stdout = get_stdout(job_id, skip_line_no)
        for l in stdout:
            print(l, end='')
        skip_line_no = skip_line_no + len(stdout)

        if (not 'finished' in body['job']):
            # print('Job', job_id, 'is not finished yet...', body['job'])
            time.sleep(1)
            continue
        if (not body['job']['finishedSuccessful']):
            raise Exception('Job failed')
        else:
            break

def get_perf_results(job_id):
    url = f"https://studio.edgeimpulse.com/v1/api/{PROJECT_ID}/jobs/profile-tflite/{job_id}/result"
    headers = {
        "x-api-key": API_KEY,
        "Accept": "application/json",
        "Content-Type": "application/json",
    }
    response = requests.request("POST", url, headers=headers)
    body = json.loads(response.text)
    if (not body['success']):
        raise Exception(body['error'])
    return body

if __name__ == "__main__":
    job_id = profile_tflite_model('model-tensorflow-lite-int8-quantized-model.lite')
    print('Job ID is', job_id)
    wait_for_job_completion(job_id)
    print('Job', job_id, 'is finished')
    perf_data = get_perf_results(job_id)
    print('Memory usage', perf_data['memory'])
    print('Time per inference (' + DEVICE + ')', perf_data['timePerInferenceMs'])

FOMO: Object detection for constrained devices

Tutorials

Want to see FOMO in action? Check out our tutorial.

For example, FOMO lets you do 60 fps object detection on a Raspberry Pi 4:

And here's FOMO doing 30 fps object detection on an Arduino Nicla Vision (Cortex-M7 MCU), using 245K RAM.

You can find the complete Edge Impulse project with the beers vs. cans model, including all data and configuration here: .

How does this 🪄 work?

Heat maps

16x16
4x4
1x1

Training on centroids

Flexible and very, very fast

Together this gives FOMO the capabilities to scale from the smallest microcontrollers all the way to full gateways or GPUs. Just some numbers:

The video on the top classifies 60 times / second on a stock Raspberry Pi 4 (160x160 grayscale input, MobileNetV2 0.1 alpha). This is 20x faster than MobileNet SSD which does ~3 frames/second.
The second video on the top classifies 30 times / second on an Arduino Nicla Vision board (with a Cortex-M7 MCU running at 480MHz) in ~240K of RAM (96x96 grayscale input, MobileNetV2 0.35 alpha).
During Edge Impulse Imagine we demonstrated a FOMO model running on a doing 14 frames per second on a DSP (). This model ran in under 150KB of RAM (96x96 grayscale input, MobileNetV2 0.1 alpha). [1]
The smallest version of FOMO (96x96 grayscale input, MobileNetV2 0.05 alpha) runs in <100KB RAM at ~10 fps on a Cortex-M4F at 80MHz. [1]

[1] Models compiled using .

How to get started?

To build your first FOMO models:

Create a new project in Edge Impulse.
Make sure to set your labeling method to 'Bounding boxes (object detection)'.
Collect and prepare your dataset as in
Add an 'Object Detection (Images)' block to your impulse.
Under Images, select 'Grayscale'
Under Object detection, select 'Choose a different model' and select one of the FOMO models.
Make sure to lower the learning rate to 0.001 to start.

FOMO is currently compatible with all that have a camera, and with Edge Impulse for Linux (any client). Of course, you can export your model as a C++ Library and integrate it as usual on any device or development board, the output format of models is compatible with normal object detection models; and our SDK runs on almost anything under the sun (see for an overview) from RTOS's to bare-metal to special accelerators and GPUs.

Expert mode tips

Additional configuration for FOMO can be accessed via expert mode.

Object weighting

MobileNet cut point

FOMO uses MobileNetV2 as a base model for its trunk and by default does a spatial reduction of 1/8th from input to output (e.g. a 96x96 input results in a 12x12 output). This is implemented by cutting MobileNet off at the intermediate layer block_6_expand_relu.
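The arithmetic behind the default cut point can be sketched as follows. The function name is hypothetical, and the use of integer division for inputs that are not a multiple of 8 is an assumption of this sketch rather than documented behavior.

```python
def fomo_output_grid(width, height, reduction=8):
    """Output grid size for a given input size and spatial reduction factor."""
    # Integer division is an assumption for sizes that are not multiples of `reduction`
    return (width // reduction, height // reduction)

print(fomo_output_grid(96, 96))    # (12, 12)
print(fomo_output_grid(160, 160))  # (20, 20)
```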

FOMO classifier capacity

FOMO can be thought of logically as the first section of MobileNetV2 followed by a standard classifier where the classifier is applied in a fully convolutional fashion.

In the default configuration this FOMO classifier is equivalent to a single dense layer with 32 nodes followed by a classifier with num_classes outputs.

For a three way classifier, using the default cut point, the result is a classifier head with ~3200 parameters.

 LAYER                          SHAPE                NUMBER OF PARAMETERS
 block_6_expand_relu (ReLU)     (None, 20, 20, 96)   0                                         
 head (Conv2D)                  (None, 20, 20, 32)   3104                                            
 logits (Conv2D)                (None, 20, 20, 3)    99

We have the option of increasing the capacity of this classifier head by either 1) increasing the number of filters in the Conv2D layer, 2) adding additional layers or 3) doing both.

For example, we might change the number of filters from 32 to 16 and add another convolutional layer, as follows.

 LAYER                          SHAPE                NUMBER OF PARAMETERS
 block_6_expand_relu (ReLU)     (None, 20, 20, 96)   0
 head_1 (Conv2D)                (None, 20, 20, 16)   1552                                         
 head_2 (Conv2D)                (None, 20, 20, 16)   272                                          
 logits (Conv2D)                (None, 20, 20, 3)    51
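The parameter counts in both tables can be reproduced with a few lines of arithmetic. The sketch below assumes each Conv2D layer uses a 1x1 kernel with a bias term, which is an inference from the numbers in the tables (96 x 32 + 32 = 3104), not something stated explicitly. The function names are hypothetical.

```python
def conv1x1_params(in_channels, filters):
    """Parameters of a 1x1 convolution with bias: weights plus one bias per filter."""
    return in_channels * filters + filters

def head_params(trunk_channels, head_filters, num_classes):
    """Total parameters of the classifier head, including the final logits layer."""
    total = 0
    channels = trunk_channels
    for filters in head_filters:
        total += conv1x1_params(channels, filters)
        channels = filters
    total += conv1x1_params(channels, num_classes)
    return total

print(head_params(96, [32], 3))      # 3104 + 99 = 3203 (default head)
print(head_params(96, [16, 16], 3))  # 1552 + 272 + 51 = 1875 (two 16-filter layers)
```

Note that the two-layer variant with 16 filters ends up with fewer parameters in total than the default single layer with 32 filters, so adding depth does not necessarily mean adding parameters.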

Performance and Minimum Requirements

Making sure the model itself can fit into the target's memory (flash/RAM), and
making sure the target also has enough memory to hold the image buffer (flash/RAM) in addition to your application logic.
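For a rough feel of the image buffer requirement, you can estimate its size from the input dimensions. This sketch assumes one byte per channel value by default (an 8-bit image); the actual buffer size on your target depends on the camera driver and SDK, so treat the result as a lower-bound estimate. The function name is hypothetical.

```python
def image_buffer_bytes(width, height, channels, bytes_per_value=1):
    """Estimated size in bytes of a raw image buffer."""
    return width * height * channels * bytes_per_value

print(image_buffer_bytes(96, 96, 1))      # 9216 bytes for 96x96 grayscale
print(image_buffer_bytes(96, 96, 3))      # 27648 bytes for 96x96 RGB
print(image_buffer_bytes(96, 96, 1, 4))   # 36864 bytes if stored as float32
```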

Custom learning blocks

YOLOv5 - wraps the Ultralytics YOLOv5 repository (trained with PyTorch) to train a custom transfer learning model.
EfficientNet - a Keras implementation of transfer learning with EfficientNet B0.
Keras - a basic multi-layer perceptron in Keras and TensorFlow.
PyTorch - a basic multi-layer perceptron in PyTorch.
Scikit-learn - trains a logistic regression model using scikit-learn, then outputs a TFLite file for inferencing using jax.
