Training visual anomaly detection models involves developing algorithms to identify unusual patterns or anomalies in image data that do not conform to the expected behavior. These models are crucial in various applications, including industrial inspection, medical imaging, and logistics.
For visual anomaly detection use cases, such as defect identification in computer vision applications, Edge Impulse provides the "FOMO-AD" learning block ("Faster Objects, More Objects - Anomaly Detection"). It is based on the GMM anomaly detection algorithm and on FOMO, which enables fast object detection on resource-constrained devices like microcontrollers.
Neural networks are powerful but have a major drawback: handling unseen data, like defects in a product during manufacturing, is a challenge due to their reliance on existing training data. Even entirely novel inputs often get misclassified into existing categories. Gaussian Mixture Models (GMMs) are clustering techniques that we can use for anomaly detection.
Only available with Edge Impulse Enterprise Plan
Try our FREE Enterprise Trial today.
Gaussian Mixture Model (GMM)
A Gaussian Mixture Model represents a probability distribution as a mixture of multiple Gaussian (normal) distributions. Each Gaussian component in the mixture represents a cluster of data points with similar characteristics. Thus, GMMs work using the assumption that the samples within a dataset can be modeled using different Gaussian distributions.
Anomaly detection using GMM involves identifying data points with low probabilities. If a data point has a significantly lower probability of being generated by the mixture model compared to most other data points, it is considered an anomaly; this will output a high anomaly score.
GMM has some overlap with K-means. However, K-means clusters are always circular, spherical or hyperspherical, whereas GMM can model elliptical clusters.
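To make the scoring idea concrete, here is a minimal sketch in plain Python of how a mixture density can be turned into an anomaly score. The two components, their weights, and the helper names are hypothetical values chosen for illustration rather than parameters learned from data. The score is taken here as the negative log-likelihood so that unlikely points score high; the exact scaling used by the learning block may differ.

```python
import math

# Hypothetical 1-D mixture: (weight, mean, standard deviation) per component.
COMPONENTS = [(0.6, 0.0, 1.0), (0.4, 5.0, 0.5)]

def gaussian_pdf(x, mean, std):
    """Density of a normal distribution at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))

def log_likelihood(x, components=COMPONENTS):
    """Log of the weighted sum of the component densities at x."""
    density = sum(w * gaussian_pdf(x, m, s) for w, m, s in components)
    return math.log(density)

def anomaly_score(x, components=COMPONENTS):
    """Negative log-likelihood: the less likely a point, the higher the score."""
    return -log_likelihood(x, components)

typical = anomaly_score(0.2)   # close to the first component's mean
outlier = anomaly_score(12.0)  # far from both components
```

In this sketch, `outlier` is much larger than `typical`, which is the property a confidence threshold relies on.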
Looking for another anomaly detection technique? Or are you using time-based frequency sensor data? See Anomaly detection (GMM) or Anomaly detection (K-Means)
The FOMO-AD learning block has one adjustable parameter: capacity. The neural network architecture is also adjustable.
Regardless of the resolution we intend to use for the raw image input, we empirically get the best anomaly detection results from using 96x96 ImageNet weights. We use the 96x96 weights because only the start of MobileNet is used, to reduce the input to 1/8th of its size.
The higher the capacity, the higher the number of (Gaussian) components, and the more adapted the model becomes to the original distribution.
Click on Start training to trigger the learning process. Once trained you will obtain a trained model view that looks like the following:
Continue to the Model testing tab to see the performance results of your model.
Note: By definition, there should not be any anomalies in the training dataset, and thus accuracy is not calculated during training. Run Model testing to learn more about the model performance and to view per region anomalous scoring results.
Navigate to the Model testing page and click on Classify all:
Limitation
Make sure to label your samples exactly as anomaly or no anomaly in your test dataset so they can be used in the F1 score calculation. We are working on making this more flexible.
In the example above, you will see that some samples have regions that are considered as no anomaly while the expected output is an anomaly. To adjust this prediction, you can set the Confidence thresholds, where you can also see the default or suggested value: "Suggested value is 16.6, based on the top anomaly scores in the training dataset.":
In this project, we have set the confidence threshold to 6. This gives results closer to our expectations:
Cells with white borders are the ones that passed as anomalous, given the confidence threshold of the learning block.
All cells are assigned a cell background color based on the anomaly score, going from blue to red, with an increasing opaqueness.
Hover over the cells to see the scores.
The grid size is calculated as (inputWidth / 8) / 2 - 1
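As a quick worked example of the formula above, the sketch below computes the number of cells per side for a few input widths. The function name is hypothetical, and the use of integer division is an assumption, since the formula does not say how non-integer results are rounded.

```python
def fomo_ad_grid_size(input_width):
    """Cells per side, following (inputWidth / 8) / 2 - 1.

    Integer division is assumed here; the formula in the text does not
    specify a rounding rule.
    """
    return (input_width // 8) // 2 - 1

grid_96 = fomo_ad_grid_size(96)    # (96 / 8) / 2 - 1 = 12 / 2 - 1
grid_160 = fomo_ad_grid_size(160)  # (160 / 8) / 2 - 1 = 20 / 2 - 1
```

Under these assumptions, a 96x96 input yields a 5x5 grid of scored cells and a 160x160 input yields a 9x9 grid.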
Keep in mind that every project is different and will thus use a different suggested confidence threshold depending on the input training data. Please make sure to also validate your results in real conditions. The suggested threshold is np.max(scores), where scores are the scores of the training dataset.
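The sketch below illustrates how such a threshold could be derived and applied to a grid of per-cell scores. All score values and function names are hypothetical, the built-in `max` stands in for `np.max`, and flagging a cell when its score is at or above the threshold is an assumption about the comparison used.

```python
def suggested_threshold(training_scores):
    """Highest anomaly score seen on the (anomaly-free) training data."""
    return max(training_scores)

def flag_cells(score_grid, threshold):
    """Return (row, col) for every cell whose score reaches the threshold."""
    return [
        (r, c)
        for r, row in enumerate(score_grid)
        for c, score in enumerate(row)
        if score >= threshold
    ]

training_scores = [2.1, 3.4, 5.9, 4.2]  # hypothetical training scores
threshold = suggested_threshold(training_scores)

test_grid = [  # hypothetical per-cell scores for one test image
    [1.0, 2.5, 3.0],
    [2.2, 9.7, 6.1],
    [1.8, 3.3, 2.0],
]
anomalous = flag_cells(test_grid, threshold)
```

Lowering the threshold flags more cells, which is why a suggested value is only a starting point and should be checked against real test data.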
During training, X Gaussian probability distributions are learned from the data, where X is the number of components (or clusters) defined in the learning block page. Samples are assigned to one of the distributions based on the probability that they belong to each. We use scikit-learn under the hood, and the anomaly score corresponds to the log-likelihood.
For the inference, we calculate the probability (which can be interpreted as a distance on a graph) for a new data point belonging to one of the populations in the training data. If the data point belongs to a cluster, the anomaly score will be low.
Interesting readings:
Python Data Science Handbook - Gaussian Mixtures
scikit-learn.org - Gaussian Mixture models
Public projects:
If you have selected the Classification learning block in the Create impulse page, a NN Classifier page will show up in the menu on the left. This page becomes available after you've extracted your features from your DSP block.
Tutorials
Want to see the Classification block in action? Check out our tutorials:
The basic idea is that a neural network classifier will take some input data, and output a probability score that indicates how likely it is that the input data belongs to a particular class.
So how does a neural network know what to predict? The neural network consists of several layers, each of which is made up of a number of neurons. The neurons in the first layer are connected to the neurons in the second layer, and so on. The weight of a connection between two neurons in a layer is randomly determined at the beginning of the training process. The neural network is then given a set of training data, which is a set of examples that it is supposed to predict. The network's output is compared to the correct answer and, based on the results, the weights of the connections between the neurons in the layer are adjusted. This process is repeated a number of times, until the network has learned to predict the correct answer for the training data.
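The loop described above can be sketched with a single neuron in plain Python. This is a toy illustration under stated assumptions, not how the learning block is implemented: the dataset (logical AND), the function names, the learning rate, and the number of passes are all hypothetical choices.

```python
import math
import random

# Toy training data: the label is 1 only when both inputs are 1 (logical AND).
SAMPLES = [((0.0, 0.0), 0.0), ((0.0, 1.0), 0.0), ((1.0, 0.0), 0.0), ((1.0, 1.0), 1.0)]

def predict(weights, bias, inputs):
    """Single neuron: weighted sum of the inputs followed by a sigmoid."""
    total = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1.0 / (1.0 + math.exp(-total))

def mean_loss(weights, bias, samples):
    """Average cross-entropy between predictions and the correct answers."""
    losses = []
    for inputs, target in samples:
        p = predict(weights, bias, inputs)
        losses.append(-(target * math.log(p) + (1.0 - target) * math.log(1.0 - p)))
    return sum(losses) / len(losses)

def train(samples, epochs=1000, learning_rate=0.5, seed=0):
    """Start from random weights and adjust them after every example."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    weights = [rng.uniform(-1.0, 1.0) for _ in range(2)]
    bias = rng.uniform(-1.0, 1.0)
    initial_loss = mean_loss(weights, bias, samples)
    for _ in range(epochs):
        for inputs, target in samples:
            # Compare the output to the correct answer ...
            error = predict(weights, bias, inputs) - target
            # ... and nudge each weight in the direction that reduces the error.
            for i, x in enumerate(inputs):
                weights[i] -= learning_rate * error * x
            bias -= learning_rate * error
    return weights, bias, initial_loss

weights, bias, initial_loss = train(SAMPLES)
final_loss = mean_loss(weights, bias, SAMPLES)
```

After training, `final_loss` is lower than `initial_loss`, and rounding `predict(weights, bias, inputs)` reproduces the labels of the toy dataset.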
A particular arrangement of layers is referred to as an architecture, and different architectures are useful for different tasks. This way, after many iterations, the neural network learns and eventually becomes much better at predicting new data.
On this page, you can configure the model and the training process, and get an overview of your model's performance.
This panel displays the output logs during the training. The previous training logs can also be retrieved from the Jobs tab in the Dashboard page (enterprise feature).
This section gives an overview of your model performances and helps you evaluate your model. It can help you determine if the model is capable of meeting your needs or if you need to test other hyper parameters and architectures.
From the Last training performances you can retrieve your validation accuracy and loss.
The Confusion matrix is one of the most useful tools to evaluate a model. It tabulates all of the correct and incorrect responses a model produces given a set of data. The labels on the side correspond to the actual labels in each sample, and the labels on the top correspond to the predicted labels from the model.
The feature explorer, like in the processing block views, indicates the spatial distribution of your input features. In this page, you can visualize which ones have been correctly classified and which ones have not.
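For readers who want to see how such a table is built, here is a minimal sketch in plain Python. The label names and predictions are hypothetical; rows correspond to actual labels and columns to predicted labels, matching the layout described above.

```python
def confusion_matrix(actual, predicted, labels):
    """Count (actual, predicted) pairs; rows are actual, columns are predicted."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for a, p in zip(actual, predicted):
        matrix[index[a]][index[p]] += 1
    return matrix

labels = ["idle", "wave", "snake"]  # hypothetical classes
actual = ["idle", "idle", "wave", "wave", "snake", "snake"]
predicted = ["idle", "wave", "wave", "wave", "snake", "idle"]

matrix = confusion_matrix(actual, predicted, labels)
# Correct predictions sit on the diagonal of the matrix.
accuracy = sum(matrix[i][i] for i in range(len(labels))) / len(actual)
```

Off-diagonal cells show which classes get confused with each other, which is often more informative than the overall accuracy alone.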
On-device performance: Based on the target you chose in the Dashboard page, we will output estimations for the inferencing time, peak RAM usage and flash usage. This will help you validate that your model will be able to run on your device based on its constraints.
See on the Learning Block page.
See on the Learning Block page.
See on the Learning Block page.
Neural networks are great, but they have one big flaw. They're terrible at dealing with data they have never seen before (like a new gesture). Neural networks cannot judge this, as they are only aware of the training data. If you give a network something unlike anything it has seen before, it will still classify it as one of the known classes.
Tutorial
Want to see the Anomaly Detection (K-means) block in action? Check out our Continuous Motion Recognition tutorial.
K-means clustering
This method looks at the data points in a dataset and groups those that are similar into a predefined number K of clusters. A threshold value can be added to detect anomalies: if the distance between a data point and its nearest centroid is greater than the threshold value, then it is an anomaly.
The main difficulty resides in choosing K, since data in a time series is always changing and different values of K might be ideal at different times. Besides, in more complex scenarios where there are both local and global outliers, many outliers might pass under the radar and be assigned to a cluster.
Looking for another anomaly detection technique? See Anomaly detection (GMM)
K-Means has some overlap with GMM. However, GMMs work using the assumption that the samples within a dataset can be modeled using different Gaussian distributions. If this is not the case for your dataset, K-Means will likely be a better option for you.
In most of our DSP blocks, you have the option to calculate the feature importance. Edge Impulse Studio will then output a Feature Importance list that will help you determine which axes generated from your DSP block are most significant to analyze when you want to do anomaly detection.
See Processing blocks > Feature importance
The K-Means anomaly detection learning block has two adjustable parameters: the Cluster count and the axes.
Cluster count: the number K of clusters.
Axes: The different axes correspond to the generated features from the pre-processing block. The chosen axes will use the features as the input data for the training.
Click on the Select suggested axes button to harness the results of the feature importance output.
Click on Start training to trigger the learning process. Once trained you will obtain a view that looks like the following:
Note: By definition, there should not be any anomalies in the training dataset, and thus accuracy is not calculated during training. Run Model testing to learn more about the model performance. Additionally, you can also select a test data sample in the Anomaly Explorer directly on this page.
In the above picture, known clusters are in blue and the newly classified data point is in orange. It is clearly outside of any known cluster and can thus be tagged as an anomaly.
Here is the process in the background:
Create X number of clusters and group all the data.
For each of these clusters, we store the center and the size of the cluster.
During the inference, we calculate the closest cluster for a new data point and show the distance from the edge of the cluster. If it’s within a cluster (no anomaly) you thus get a value below 0.
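The inference step above can be sketched as follows. The cluster centres and sizes are hypothetical values, the "size" of a cluster is assumed to be a radius, and "closest cluster" is assumed to mean the cluster with the nearest centre.

```python
import math

# Hypothetical clusters stored after training: (centre, radius).
CLUSTERS = [((0.0, 0.0), 1.0), ((5.0, 5.0), 2.0)]

def anomaly_score(point, clusters=CLUSTERS):
    """Distance from the edge of the closest cluster (negative when inside)."""
    # Pick the cluster whose centre is nearest to the new data point.
    centre, radius = min(clusters, key=lambda cluster: math.dist(point, cluster[0]))
    # Subtracting the radius measures the distance from the cluster's edge.
    return math.dist(point, centre) - radius

inside = anomaly_score((0.5, 0.0))     # within the first cluster
outside = anomaly_score((10.0, 10.0))  # far from every cluster
```

A point inside a cluster gets a score below 0, while a point far from every cluster gets a positive score that grows with its distance.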
Tutorial: Continuous Motion Recognition
Neural networks are powerful but have a major drawback: handling unseen data, like new gestures, is a challenge due to their reliance on existing training data. Even entirely novel inputs often get misclassified into existing categories. Gaussian Mixture Models (GMMs) are clustering techniques that we can use for anomaly detection. GMMs can perform well with datasets that would otherwise perform poorly with other anomaly detection algorithms (like K-means).
Only available with Edge Impulse Professional and Enterprise Plans
Try our Professional Plan or FREE Enterprise Trial today.
Gaussian Mixture Model (GMM)
A Gaussian Mixture Model represents a probability distribution as a mixture of multiple Gaussian (normal) distributions. Each Gaussian component in the mixture represents a cluster of data points with similar characteristics. Thus, GMMs work using the assumption that the samples within a dataset can be modeled using different Gaussian distributions.
Anomaly detection using GMM involves identifying data points with low probabilities. If a data point has a significantly lower probability of being generated by the mixture model compared to most other data points, it is considered an anomaly (this will output a high anomaly score).
Looking for another anomaly detection technique? See Anomaly detection (K-Means)
GMM has some overlap with K-means. However, K-means clusters are always circular, spherical or hyperspherical, whereas GMM can model elliptical clusters.
In most of our DSP blocks, you have the option to calculate the feature importance. Edge Impulse Studio will then output a Feature Importance list that will help you determine which axes generated from your DSP block are most significant to analyze when you want to do anomaly detection.
See Processing blocks > Feature importance
The GMM anomaly detection learning block has two adjustable parameters: the Number of components and the axes.
The number of (gaussian) components can be interpreted as the number of clusters in Gaussian Mixture Models.
How to choose the number of components?
When increasing the number of (Gaussian) components, the model will fit the original distribution more closely. If the value is too high, there is a risk of overfitting.
If you have prior knowledge about the problem or the data, it can provide valuable insights into the appropriate number of components. For example, if you know that there are three distinct groups in your data, you may start by trying a GMM with three components. Visualizing the data can also provide hints about the number of clusters. If you can distinguish several visible clusters in your training dataset, try setting the number of components to the number of visible clusters.
The different axes correspond to the generated features from the pre-processing block. The chosen axes will use the features as the input data for the training.
Click on the Select suggested axes button to harness the results of the feature importance output.
Click on Start training to trigger the learning process. Once trained you will obtain a view that looks like the following:
Note: By definition, there should not be any anomalies in the training dataset, and thus accuracy is not calculated during training. Run Model testing to learn more about the model performance. Additionally, you can also select a test data sample in the Anomaly Explorer directly on this page.
Navigate to the Model testing page and click on Classify all:
Limitation
Make sure to label your samples exactly as anomaly or no anomaly in your test dataset so they can be used in the F1 score calculation. We are working on making this more flexible.
In the example above, you will see that some samples are considered as no anomaly while the expected output is an anomaly. If you take a closer look at the anomaly scores of the no anomaly samples, the values are below 1.00:
To fix this, you can set the Confidence thresholds
In this project, we have set the confidence threshold to 1.00. This gives results closer to our expectations:
Keep in mind that every project is different, please make sure to also validate your results in real conditions.
During training, X Gaussian probability distributions are learned from the data, where X is the number of components (or clusters) defined in the learning block page. Samples are assigned to one of the distributions based on the probability that they belong to each. We use scikit-learn under the hood, and the anomaly score corresponds to the log-likelihood.
For the inference, we calculate the probability (which can be interpreted as a distance on a graph) for a new data point belonging to one of the populations in the training data. If the data point belongs to a cluster, the anomaly score will be low.
Interesting readings:
Python Data Science Handbook - Gaussian Mixtures
scikit-learn.org - Gaussian Mixture models
Public Projects:
Solving regression problems is one of the most common applications for machine learning models, especially in supervised machine learning. Models are trained to understand the relationship between independent variables and an outcome or dependent variable. The model can then be leveraged to predict the outcome of new and unseen input data, or to fill a gap in missing data.
To build a regression model you collect data as usual, but rather than setting the label to a text value, you set it to a numeric value.
You can use any of the built-in signal processing blocks to pre-process your vibration, audio or image data, or use custom processing blocks to extract novel features from other types of sensor data.
You have full freedom in modifying your neural network architecture - whether visually or through writing Keras code.
See Neural Network Settings on the Learning Block page.
See Neural Network Architecture on the Learning Block page.
See Expert mode on the Learning Block page.
If you want to see the accuracy of your model across your test dataset, go to Model testing. You can adjust the Maximum error percentage by clicking on the "⋮" button.
After extracting meaningful features from the raw signal using signal processing, you can now train your model using a learning block. We provide several pre-defined learning blocks:
Missing an architecture? You can create a custom learning block with PyTorch, Keras or scikit-learn to bring your custom architecture and train it with Edge Impulse. If you already have a trained model, you can also Bring Your Own Model (BYOM) to directly profile it and deploy it on your edge devices.
If you are familiar with TensorFlow and Keras, in most blocks, you can use the Switch to Expert mode button to access the full Keras API for custom architectures, rebalance your weights, change the optimizer, and more.
For most of the learning blocks provided by Edge Impulse (Keras and Transfer Learning-based blocks), a view similar to the one below is available. See the dedicated learning block page for specific details when it differs.
Number of training cycles: Each time the training algorithm makes one complete pass through all of the training data with back-propagation and updates the model's parameters as it goes, it is known as an epoch or training cycle.
Use Learned Optimizer (VeLO): Use a neural network as an optimizer to calculate gradients and learning rates. For optimal results with VeLO, it is recommended to use as large a batch size as possible, potentially equal to the dataset's size. See Learned Optimizer (VeLO)
Learning rate: The learning rate controls how much the model's internal parameters are updated during each step of the training process. Or, you can also see it as how fast the neural network will learn. If the network overfits quickly, you can reduce the learning rate.
Validation set size: The percentage of your training set held apart for validation, a good default is 20%.
Split train/validation set on metadata key: Prevent group data leakage between train and validation datasets using sample metadata. Given a metadata key, samples with the same value for that key will always be on the same side of the validation split. Leave empty to disable. Also, see metadata.
Batch size: The batch size used during training. If not set, we'll use the default value. Training may fail if the batch size is too high.
Auto-weight classes: While training, pay more attention to samples from under-represented classes. Might help make the model more robust against overfitting if you have little data for some classes.
Profile int8 model: Profiling the quantized model might take a long time on large datasets. Disable this option to skip profiling.
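To illustrate the "Split train/validation set on metadata key" setting from the list above, here is a minimal sketch of a group-aware split in plain Python. The sample structure, the `subject` key, and the function name are hypothetical, and for simplicity the validation fraction is applied to the number of groups rather than the number of samples, which is an assumption.

```python
import random

def split_by_metadata(samples, key, validation_fraction=0.2, seed=0):
    """Split samples so that all samples sharing metadata[key] stay together."""
    groups = sorted({sample["metadata"][key] for sample in samples})
    random.Random(seed).shuffle(groups)  # seeded so the split is repeatable
    n_val = max(1, round(len(groups) * validation_fraction))
    val_groups = set(groups[:n_val])
    train = [s for s in samples if s["metadata"][key] not in val_groups]
    val = [s for s in samples if s["metadata"][key] in val_groups]
    return train, val

# Hypothetical dataset: 10 samples recorded by 5 different people.
samples = [
    {"name": f"sample{i}", "metadata": {"subject": f"person{i % 5}"}}
    for i in range(10)
]
train_set, val_set = split_by_metadata(samples, "subject")
train_subjects = {s["metadata"]["subject"] for s in train_set}
val_subjects = {s["metadata"]["subject"] for s in val_set}
```

Because no subject appears on both sides, the validation score is not inflated by the model recognizing a person it already saw during training.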
For classification and regression tasks, you can edit the layers directly from the web interface. Depending on your project type, we may offer to choose between different architecture presets to help you get started.
The neural network architecture takes your extracted features as input and passes them through each layer of your architecture. In the classification case, the last layer is a softmax layer. It is this last layer that gives the probability of belonging to each of the classes.
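As a small illustration of what a softmax layer computes, the sketch below turns the raw outputs of a final layer into class probabilities. The input values are hypothetical.

```python
import math

def softmax(logits):
    """Turn raw layer outputs into probabilities that sum to 1."""
    largest = max(logits)
    # Subtracting the largest value keeps exp() numerically stable.
    exps = [math.exp(value - largest) for value in logits]
    total = sum(exps)
    return [e / total for e in exps]

probabilities = softmax([2.0, 1.0, 0.1])  # hypothetical outputs for three classes
predicted_class = probabilities.index(max(probabilities))
```

The class with the largest raw output also gets the largest probability, and the probabilities always sum to 1.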
From the visual (simple) mode, you can add the following layers:
For Transfer Learning tasks like Audio or Image Transfer learning and Object Detection, you can select which pre-trained model is more suited for your use case and edit the last layers parameters to be trained:
If you have advanced knowledge in machine learning and Keras, you can switch to the Expert Mode and access the full Keras API to use custom architectures:
You can use the expert mode to change your loss function, optimizer, print your model architecture and even set an early stopping callback to prevent overfitting your model.
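The logic behind an early stopping callback can be sketched in plain Python as shown below. This is an illustration of the idea rather than the Keras implementation; in expert mode you would typically rely on the `tf.keras.callbacks.EarlyStopping` callback instead. The function name and the loss values are hypothetical.

```python
def early_stopping_epoch(val_losses, patience=3, min_delta=0.0):
    """Return the index of the epoch at which training would stop.

    Training stops once the validation loss has failed to improve by more
    than min_delta for `patience` consecutive epochs.
    """
    best = float("inf")
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best - min_delta:
            best = loss
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch
    return len(val_losses) - 1  # never triggered: train to the last epoch

# Hypothetical validation losses: improvement stalls after the third epoch.
val_losses = [0.9, 0.7, 0.6, 0.61, 0.62, 0.63, 0.5]
stop_at = early_stopping_epoch(val_losses, patience=3)
```

With a patience of 3, training stops before the late improvement in the final epoch is reached, which shows the trade-off in choosing the patience value.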
When creating an impulse to solve an image classification problem, you will most likely want to use transfer learning. This is particularly true when working with a relatively small dataset.
Transfer learning is the process of taking features learned from one problem and leveraging them on a new but related problem. Most of the time these features are learned from large-scale datasets with common objects, which makes it faster and more accurate to tune and adapt the model to new tasks.
To choose transfer learning as your learning block, go to create impulse and click on Add a Learning Block, and select Transfer Learning.
To choose your preferred pre-trained network, go to Transfer learning on the left side of your screen and click choose a different model. A pop up will appear on your screen with a list of models to choose from as shown in the image below.
Edge Impulse uses state-of-the-art MobileNetV1 and V2 architectures trained on an ImageNet dataset as its pre-trained networks for you to fine-tune for your specific application. The pre-trained networks come with varying input sizes ranging from 96x96 to 320x320, in both RGB and grayscale, for you to choose from depending on your application and target deployment hardware.
The preset configurations just don't work for your model? No worries, Expert Mode is for you! Expert Mode gives you full control of your model so that you can configure it however you want. To enable the expert mode, just click on the "⋮" button and toggle the expert mode.
You can use the expert mode to change your loss function, optimizer, print your model architecture and even set an early stopping callback to prevent overfitting your model.
Note: For Enterprise projects, Edge Impulse integrates with to utilize transfer learning with state-of-the-art pre-trained models from NVIDIA.
See on the Learning Block page.
See on the Learning Block page.
Transfer learning is the process of taking features learned from one problem and leveraging them on a new but related problem. Most of the time these features are learned from large-scale datasets with common objects, which makes it faster and more accurate to tune and adapt the model to new tasks. With Edge Impulse's transfer learning block for audio keyword spotting, we take the same transfer learning technique classically used for image classification and apply it to audio data. This allows you to fine-tune a pre-trained keyword spotting model on your data and achieve even better performance than using a classification block, even with a relatively small keyword dataset.
Excited? Train your first keyword spotting model in under 5 minutes with the getting started wizard!
To choose transfer learning as your learning block, go to create impulse and click on Add a Learning Block, and select Transfer Learning (Keyword Spotting).
To choose your preferred pre-trained network, select the Transfer learning tab on the left side of your screen and click choose a different model. A pop up will appear on your screen with a list of models to choose from as shown in the image below.
Edge Impulse uses state-of-the-art MobileNetV1 and V2 architectures trained on an ImageNet dataset as its pre-trained networks for you to fine-tune for your specific application.
Before you start training your model, you need to set the following neural network configurations:
Number of training cycles: Each time the training algorithm makes one complete pass through all of the training data with back-propagation and updates the model's parameters as it goes, it is known as an epoch or training cycle.
Learning rate: The learning rate controls how much the model's internal parameters are updated during each step of the training process. You can also see it as how fast the neural network will learn. If the network overfits quickly, you can reduce the learning rate.
Validation set size: The percentage of your training set held apart for validation, a good default is 20%.
You might also need to enable auto balance to prevent model bias, or enable data augmentation to increase the size and diversity of your dataset and prevent overfitting.
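One common way to balance classes is to weight each class inversely to its frequency. The sketch below uses the "balanced" heuristic, `n_samples / (n_classes * class_count)`. Whether Edge Impulse's auto balance option uses this exact formula is an assumption, and the label counts are hypothetical.

```python
from collections import Counter

def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {label: n_samples / (n_classes * count) for label, count in counts.items()}

# Hypothetical keyword dataset with far more "yes" samples than "noise" samples.
labels = ["yes"] * 80 + ["no"] * 15 + ["noise"] * 5
class_weights = balanced_class_weights(labels)
```

Rare classes receive a larger weight, so mistakes on them contribute more to the loss during training.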
The preset configurations just don't work for your model? No worries, Expert Mode is for you! Expert Mode gives you full control of your model so that you can configure it however you want. To enable the expert mode, just click on the "⋮" button and toggle the expert mode.
You can use the expert mode to change your loss function, optimizer, print your model architecture and even set an early stopping callback to prevent overfitting your model.
The NVIDIA TAO Toolkit, built on TensorFlow and PyTorch, uses the power of transfer learning while simultaneously simplifying the model training process and optimizing the model for inference throughput on the target platform. The result is an ultra-streamlined workflow. Take your own models or pre-trained models, adapt them to your own real or synthetic data, then optimize for inference throughput. All without needing AI expertise or large training datasets.
Edge Impulse offers the following learning blocks for NVIDIA TAO object detection and image classification tasks: RetinaNet, YOLOv3, YOLOv4, SSD, and image classification.
Only available with Edge Impulse Professional and Enterprise Plans
Try our Professional Plan or FREE Enterprise Trial today.
To build your first object detection or image classification model using NVIDIA TAO:
Create a new project in Edge Impulse.
Make sure to set your labelling method to 'Bounding boxes (object detection)' or 'One label per data item (image classification)'.
Collect and prepare your dataset as in object detection or image classification.
Resize your images.
Add an 'NVIDIA TAO ...' block to your impulse.
Under NVIDIA TAO..., select between the various parameters. In total, there are 88 object detection architectures and 15 image classification architectures.
There are pre-trained 3x224x224 backbones from the NVIDIA TAO catalog, and others trained by Edge Impulse on ImageNet.
For image classification, pre-trained weights only support 224x224 image resolution. Image width and height must be greater than 32.
Click on 'Start training'
With everything set up, you can now build your machine learning model with these tutorials:
It's very hard to build a computer vision model from scratch, as you need a wide variety of input data to make the model generalize well, and training such models can take days on a GPU. To make building your model easier and faster we are using transfer learning. This lets you piggyback on a well-trained model, only re-training the upper layers of a neural network, leading to much more reliable models that train in a fraction of the time and work with substantially smaller datasets.
Tutorial
Want to see MobileNetV2 SSD FPN-Lite models in action? Check out our Detect objects with bounding boxes tutorial.
To build your first object detection models using MobileNetV2 SSD FPN-Lite:
Create a new project in Edge Impulse.
Make sure to set your labelling method to 'Bounding boxes (object detection)'.
Collect and prepare your dataset as in object detection.
Resize your image to fit 320x320px
Add an 'Object Detection (Images)' block to your impulse.
Under Images, choose RGB.
Under Object detection, select 'Choose a different model' and select 'MobileNetV2 SSD FPN-Lite 320x320'
You can start your training with a learning rate of '0.15'
Click on 'Start training'
MobileNetV2 SSD FPN-Lite 320x320 is available with Edge Impulse for Linux
Here, we are using the MobileNetV2 SSD FPN-Lite 320x320 pre-trained model. The model has been trained on the COCO 2017 dataset with images scaled to 320x320 resolution.
In the MobileNetV2 SSD FPN-Lite, we have a base network (MobileNetV2), a detection network (Single Shot Detector or SSD) and a feature extractor (FPN-Lite).
Base network:
MobileNet, like VGG-Net, LeNet, AlexNet, and others, is a neural network architecture. The base network provides high-level features for classification or detection. If you add a fully connected layer and a softmax layer at the end of such a network, you have a classifier.
But you can remove the fully connected and softmax layers and replace them with a detection network, such as SSD or Faster R-CNN, to perform object detection.
Detection network:
The most common detection networks are SSD (Single Shot Detector) and RPN (Region Proposal Network).
When using SSD, we only need a single shot to detect multiple objects within the image. On the other hand, approaches based on region proposal networks (RPN), such as the R-CNN series, need two shots: one for generating region proposals and one for detecting the object in each proposal.
As a consequence, SSD is much faster than RPN-based approaches but often trades accuracy for real-time processing speed. SSD models also tend to have issues detecting objects that are too close together or too small.
Feature Pyramid Network:
Detecting objects at different scales is challenging, particularly for small objects. Feature Pyramid Network (FPN) is a feature extractor built on the feature pyramid concept to improve accuracy and speed.
The two most common image-processing problems are image classification and object detection.
Image classification takes an image as an input and outputs what type of object is in the image. This technique works great, even on microcontrollers, as long as we only need to detect a single object in the image.
On the other hand, object detection takes an image and outputs information about the class, number, and position (and, possibly, size) of the objects in the image.
Edge Impulse provides four different methods to perform object detection:
Using MobileNetV2 SSD FPN
Using FOMO
Using NVIDIA TAO
Specifications | MobileNetV2 SSD FPN | FOMO | NVIDIA TAO | YOLOv5 |
---|---|---|---|---|
Edge Impulse FOMO (Faster Objects, More Objects) is a novel machine learning algorithm that brings object detection to highly constrained devices. It lets you count multiple objects and find their location in an image in real time using up to 30x less processing power and memory than MobileNet SSD or YOLOv5.
Tutorials
Want to see FOMO in action? Check out our Detect objects with centroids (FOMO) tutorial.
For example, FOMO lets you do 60 fps object detection on a Raspberry Pi 4:
And here's FOMO doing 30 fps object detection on an Arduino Nicla Vision (Cortex-M7 MCU), using 245K RAM.
You can find the complete Edge Impulse project with the beers vs. cans model, including all data and configuration here: https://studio.edgeimpulse.com/public/89078/latest.
So how does that work? First, a small primer. Let's say you want to detect whether you see a face in front of your sensor. You can approach this in two ways. You can train a simple binary classifier, which says either "face" or "no face", or you can train a complex object detection model which tells you "I see a face at this x, y point and of this size". Object detection is thus great when you need to know the exact location of something, or if you want to count multiple things (the simple classifier cannot do that) - but it's computationally much more intensive, and you typically need much more data for it.
The design goal for FOMO was to get the best of both worlds: the computational power required for simple image classification, but with the additional information on location and object count that object detection gives us.
The first thing to realize is that while the output of the image classifier is "face" / "no face" (and thus no locality is preserved in the outcome) the underlying neural network architecture consists of a number of convolutional layers. A way to think about these layers is that every layer creates a diffused lower-resolution image of the previous layer. E.g. if you have a 16x16 image the width/height of the layers may be:
16x16
4x4
1x1
Each 'pixel' in the second layer maps roughly to a 4x4 block of pixels in the input layer, and the interesting part is that locality is somewhat preserved. The 'pixel' in layer 2 at (0, 0) will roughly map back to the top left corner of the input image. The deeper you go in a normal image classification network, the less of this locality (or "receptive field") is preserved until you finally have just 1 outcome.
FOMO uses the same architecture, but cuts off the last layers of a standard image classification model and replaces them with a per-region class probability map (e.g. a 4x4 map in the example above). It then uses a custom loss function which forces the network to fully preserve the locality in the final layer. This essentially gives you a heat map of where the objects are.
The resolution of the heat map is determined by where you cut off the layers of the network. For the FOMO model trained above (on the beer bottles) we do this when the size of the heat map is 8x smaller than the input image (input image of 160x160 will yield a 20x20 heat map), but this is configurable. When you set this to 1:1 this actually gives you pixel-level segmentation and the ability to count a lot of small objects.
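To make the relationship between input size, reduction factor and heat map cells concrete, here is a minimal sketch. The function names are illustrative only (they are not part of any Edge Impulse API), and the sketch assumes a square input whose size is a multiple of the reduction factor.

```python
def heat_map_size(input_size: int, factor: int = 8) -> int:
    """Side length of the heat map for a square input image."""
    if input_size % factor != 0:
        raise ValueError("input size should be a multiple of the reduction factor")
    return input_size // factor


def cell_to_input_region(row: int, col: int, factor: int = 8) -> tuple:
    """Approximate pixel block (x0, y0, x1, y1), end-exclusive, that a cell maps back to."""
    x0, y0 = col * factor, row * factor
    return (x0, y0, x0 + factor, y0 + factor)


print(heat_map_size(160))            # 20x20 heat map for a 160x160 input
print(heat_map_size(160, factor=1))  # 1:1 mapping, one cell per pixel
print(cell_to_input_region(0, 0))    # the top-left corner of the input
```

The mapping is only approximate: because the network is convolutional, each cell also sees some context around its block.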
A difference between FOMO and other object detection algorithms is that it does not output bounding boxes, but it's easy to go from heat map to bounding boxes. Just draw a box around a highlighted area.
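As an illustration of "draw a box around a highlighted area", the sketch below thresholds a heat map and wraps each connected group of highlighted cells in a box. It is not the code used by Edge Impulse; the threshold, the 4-connectivity rule and the function name are assumptions made for this example.

```python
def heat_map_to_boxes(heat_map, threshold=0.5, factor=8):
    """Return one (x, y, w, h) box in input pixels per connected highlighted area."""
    rows, cols = len(heat_map), len(heat_map[0])
    seen = [[False] * cols for _ in range(rows)]
    boxes = []
    for r in range(rows):
        for c in range(cols):
            if seen[r][c] or heat_map[r][c] < threshold:
                continue
            # Flood fill (4-connectivity) to collect one highlighted area.
            stack = [(r, c)]
            seen[r][c] = True
            r0 = r1 = r
            c0 = c1 = c
            while stack:
                cr, cc = stack.pop()
                r0, r1 = min(r0, cr), max(r1, cr)
                c0, c1 = min(c0, cc), max(c1, cc)
                for nr, nc in ((cr + 1, cc), (cr - 1, cc), (cr, cc + 1), (cr, cc - 1)):
                    inside = 0 <= nr < rows and 0 <= nc < cols
                    if inside and not seen[nr][nc] and heat_map[nr][nc] >= threshold:
                        seen[nr][nc] = True
                        stack.append((nr, nc))
            # Scale the cell range back to input pixels.
            boxes.append((c0 * factor, r0 * factor,
                          (c1 - c0 + 1) * factor, (r1 - r0 + 1) * factor))
    return boxes


heat_map = [
    [0.0, 0.9, 0.8, 0.0],
    [0.0, 0.7, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.95],
]
print(heat_map_to_boxes(heat_map))
```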
However, when working with early customers we realized that bounding boxes are merely an implementation detail of other object detection networks, and are not a typical requirement. Very often the size of objects is not important as cameras are in fixed locations (and objects thus fixed size), but rather you just want the location and the count of objects.
Thus, we now train on the centroids of objects. This makes it much easier to count objects that are close (every activation in the heat map is an object), and the convolutional nature of the neural network ensures we look around the centroid for the object anyway.
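A rough sketch of how a bounding box label could be reduced to the heat map cell holding its centroid is shown below. It assumes that x and y are the top-left corner of the box in pixels and that the reduction factor is 8; the helper name is hypothetical and this is not the actual FOMO training code.

```python
def box_to_centroid_cell(x, y, w, h, factor=8):
    """Map a bounding box to the (row, col) of the heat map cell containing its centroid."""
    cx = x + w / 2
    cy = y + h / 2
    return int(cy // factor), int(cx // factor)


# Two boxes of different sizes each collapse to a single cell.
print(box_to_centroid_cell(16, 40, 32, 24))
print(box_to_centroid_cell(0, 0, 8, 8))
```

Because every object becomes a single cell, counting objects amounts to counting activated cells, as long as two centroids do not land in the same cell.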
A downside of the heat map is that each cell acts as its own classifier. E.g. if your classes are "lamp", "plant" and "background" each cell will be either lamp, plant, or background. It's thus not possible to detect objects with overlapping centroids. You can see this in the Raspberry Pi 4 video above at 00:18 where the beer bottles are too close together. This can be solved by using a higher resolution heat map.
A really cool benefit of FOMO is that it's fully convolutional. If you set an image:heat map factor of 8 you can throw in a 96x96 image (outputs 12x12 heat map), a 320x320 image (outputs 40x40 heat map), or even a 1024x1024 image (outputs 128x128 heat map). This makes FOMO incredibly flexible, and useful even if you have very large images that need to be analyzed (e.g. in fault detection where the faults might be very, very small). You can even train on smaller patches, and then scale up during inference.
Additionally FOMO is compatible with any MobileNetV2 model. Depending on where the model needs to run you can pick a model with a higher or lower alpha, and transfer learning also works (although you need to train your base models specifically with FOMO in mind). This makes it easy for end customers to use their existing models and fine-tune them with FOMO to also add locality (e.g. we have customers with large transfer learning models for wildlife detection).
Together this gives FOMO the capabilities to scale from the smallest microcontrollers all the way to full gateways or GPUs. Just some numbers:
The video on the top classifies 60 times / second on a stock Raspberry Pi 4 (160x160 grayscale input, MobileNetV2 0.1 alpha). This is 20x faster than MobileNet SSD which does ~3 frames/second.
The second video on the top classifies 30 times / second on an Arduino Nicla Vision board (Cortex-M7 MCU running at 480MHz) in ~240K of RAM (96x96 grayscale input, MobileNetV2 0.35 alpha).
During Edge Impulse Imagine we demonstrated a FOMO model running on a Himax WE-I Plus doing 14 frames per second on a DSP (video). This model ran in under 150KB of RAM (96x96 grayscale input, MobileNetV2 0.1 alpha). [1]
The smallest version of FOMO (96x96 grayscale input, MobileNetV2 0.1 alpha) runs in <100KB RAM at ~10 fps on a Cortex-M4F at 80MHz. [1]
[1] Models compiled using EON Compiler.
To build your first FOMO models:
Create a new project in Edge Impulse.
Make sure to set your labeling method to 'Bounding boxes (object detection)'.
Collect and prepare your dataset as in object detection.
Add an 'Object Detection (Images)' block to your impulse.
Under Images, select 'Grayscale'
Under Object detection, select 'Choose a different model' and select one of the FOMO models.
Make sure to lower the learning rate to 0.001 to start.
FOMO is currently compatible with all fully-supported development boards that have a camera, and with Edge Impulse for Linux (any client). Of course, you can export your model as a C++ Library and integrate it as usual on any device or development board; the output format of the models is compatible with normal object detection models, and our SDK runs on almost anything under the sun (see Running your impulse locally for an overview), from RTOSes to bare metal to special accelerators and GPUs.
Additional configuration for FOMO can be accessed via expert mode.
FOMO is sensitive to the ratio of objects to background cells in the labelled data. By default, object output cells are weighted x100 in the loss function (object_weight=100) as a way of balancing what is usually a majority of background. This value was chosen as a sweet spot for a number of example use cases. In scenarios where the objects to detect are relatively rare this value can be increased, e.g. to 1000, to have the model focus even more on object detection (at the expense of potentially more false detections).
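The effect of such a weight can be illustrated with a plain weighted cross-entropy over heat map cells. This is only a sketch of class weighting in general, not FOMO's actual loss function; the function name, the convention that class index 0 is background, and the normalization by the sum of weights are all assumptions made for this example.

```python
import math


def weighted_cell_loss(probs, labels, object_weight=100.0):
    """Weighted mean cross-entropy over heat map cells.

    probs:  one list of class probabilities per cell (index 0 = background).
    labels: the true class index of each cell.
    """
    total, weight_sum = 0.0, 0.0
    for cell_probs, label in zip(probs, labels):
        weight = 1.0 if label == 0 else object_weight
        # Clamp to avoid log(0) for a completely wrong prediction.
        total += -weight * math.log(max(cell_probs[label], 1e-12))
        weight_sum += weight
    return total / weight_sum


# One background cell predicted well, one object cell predicted badly.
probs = [[0.9, 0.1], [0.9, 0.1]]
labels = [0, 1]
print(weighted_cell_loss(probs, labels, object_weight=1.0))
print(weighted_cell_loss(probs, labels, object_weight=100.0))
```

With the higher weight, the missed object cell dominates the loss, which is the behavior the object_weight setting is meant to encourage.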
FOMO uses MobileNetV2 as a base model for its trunk and by default does a spatial reduction of 1/8th from input to output (e.g. a 96x96 input results in a 12x12 output). This is implemented by cutting MobileNet off at the intermediate layer block_6_expand_relu. Choosing a different cut_point results in a different spatial reduction; e.g. if we cut higher at block_3_expand_relu, FOMO will instead only do a spatial reduction of 1/4 (i.e. a 96x96 input results in a 24x24 output).
Note, though, that this means taking much less of the MobileNet backbone and results in a model with only 1/2 the params. Switching to a higher alpha may counteract this parameter reduction. Later FOMO releases will counter this parameter reduction with a UNet style architecture.
FOMO can be thought of logically as the first section of MobileNetV2 followed by a standard classifier where the classifier is applied in a fully convolutional fashion.
In the default configuration this FOMO classifier is equivalent to a single dense layer with 32 nodes followed by a classifier with num_classes outputs.
For a three way classifier, using the default cut point, the result is a classifier head with ~3200 parameters.
We have the option of increasing the capacity of this classifier head by either 1) increasing the number of filters in the Conv2D layer, 2) adding additional layers or 3) doing both.
For example, we might change the number of filters from 32 to 16 and add another convolutional layer.
For some problems an additional layer can improve performance, and in this case it actually uses fewer parameters. It can, though, potentially take longer to train and require more data. In future releases the tuning of this aspect of FOMO can be handled by the EON Tuner.
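The parameter counts mentioned above can be checked with simple arithmetic, treating the head as a stack of 1x1 convolutions. The sketch below assumes the cut point outputs 96 channels; that value is an assumption chosen because it reproduces the ~3200 figure for a three way classifier, and the real number depends on the alpha and cut point you use. The helper names are hypothetical.

```python
def conv1x1_params(in_channels: int, filters: int) -> int:
    """Weights plus biases of a 1x1 convolution."""
    return in_channels * filters + filters


def head_params(in_channels: int, hidden_filters: list, num_classes: int) -> int:
    """Total parameters of a head made of 1x1 convolutions ending in num_classes outputs."""
    total, channels = 0, in_channels
    for filters in hidden_filters:
        total += conv1x1_params(channels, filters)
        channels = filters
    return total + conv1x1_params(channels, num_classes)


CUT_POINT_CHANNELS = 96  # assumption, see the note above

print(head_params(CUT_POINT_CHANNELS, [32], 3))      # default head: one layer of 32 filters
print(head_params(CUT_POINT_CHANNELS, [16, 16], 3))  # two layers of 16 filters
```

Under this assumption the default head has 3203 parameters and the two-layer head has 1875, which matches the observation that the extra layer with fewer filters ends up smaller.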
Just like the rest of our Neural Network-based learning blocks, FOMO is delivered as a set of basic math routines free of runtime dependencies. This means that there are virtually no limitations to running FOMO, other than:
Making sure the model itself can fit into the target's memory (flash/RAM), and
Making sure the target also has enough memory to hold the image buffer (flash/RAM) in addition to your application logic.
In all, we have seen buffer, model and app logic (including wireless stack) fit in as little as 200KB for 64x64 pixel images. But we would definitely recommend a target with at least 512KB so that you can take advantage of larger image sizes and a wider range of model optimizations.
With regards to latency, the speed of the target will determine the maximum number of frames that can be processed in a given interval (fps). This will of course be influenced by any other tasks the CPU may need to complete, but we have consistently seen MCUs running @ 80MHz complete a full pass on a 64x64 pixel image in under one second, which should translate to just under 1fps once you add the rest of your app logic. Keep in mind that frame throughput can increase dramatically at higher speeds or when tensor acceleration is available. We have measured 40-60 fps consistently on a Raspberry Pi 4 and ~15 fps on unaccelerated 480MHz targets. The table below summarizes this trade-off:
Classical machine learning (ML) refers to traditional algorithms in machine learning that predate the current wave of deep learning. Deep learning usually involves large, complex neural networks. Classical ML techniques include various algorithms, such as logistic regression, support vector machines (SVMs), and decision trees. However, these techniques rely heavily on feature engineering to work well.
Deep neural networks can discover or create features from the raw data automatically, but classical ML models often require human domain expertise to generate these features. This is where Edge Impulse can help! We offer a number of processing blocks to help generate features based on various use cases. You can also perform AutoML with the EON Tuner to see which combinations of processing and machine learning (including classical ML) blocks work best for your dataset.
Only available with Edge Impulse Professional and Enterprise Plans
Try our Professional Plan or FREE Enterprise Trial today.
Traditional ML models are often easier to understand and interpret than their deep learning cousins. The simpler algorithms and structures used in traditional models make it easier to understand the relationship between input features and output predictions.
We implement these modules using scikit-learn, which is an extremely popular ML package used in the creation of models for real-world applications. Once trained, models are converted to Jax, a linear algebra library. That model is then converted to a TensorFlow Lite (float 32) model, which will run on a variety of platforms.
The ability to convert Jax to TensorFlow Lite models opens up a wide variety of possibilities when it comes to deploying different machine learning models to edge devices. If you are interested in developing a custom learning block, see the custom learning blocks guide. You can also use this as a starting point.
You can select one of several algorithms depending on your project type: classification or regression. Here is a quick reminder about the difference between the two types:
Classification is used when you want to identify a sample as belonging to one particular grouping. It requires the number of possible outputs to be a discrete number. For example, classification is used if you want to determine if a picture is of a dog or a cat (2 possible outputs).
Regression is used to predict a continuous value based on the input data. For example, predicting the price of a house based on location, average neighborhood sell price, etc.
To start, select the Classification learning block when building your impulse.
Classical ML models are also available for Regression.
After generating features, head to the Classifier learn block page. Click Add an extra layer. Under Complete architectures, you can select one of the many available classical ML models.
When training your classical ML model, you should configure the required hyperparameters. Note that some may require far more training cycles (epochs) than what you are used to with deep learning (e.g. 1000 epochs). However, note that these algorithms train much faster than most neural networks!
Note that Expert mode is not available for classical ML models.
Once you have your trained model, you can deploy the impulse to a variety of devices, including microcontrollers.
The gesture dataset is relatively simple. As a result, feature engineering and a classical ML model work very well. On more complex data, you might need to use deep learning to achieve your desired accuracy.
Logistic regression is simple, fast, and efficient. However, it requires a linear relationship between the input and predicted class probabilities, which means it will not work well on complex data (e.g. non-linear relationships or many input dimensions).
SVMs make for robust classification systems that work well with high-dimensional data (i.e. a single sample containing many values, such as different sensor values). However, they can struggle if classes in the dataset overlap significantly. If this is the case for your dataset, you may want to turn to neural networks.
XGBoost is fast and efficient. It also has built-in methods for handling missing data, and it generally performs better than LightGBM on smaller datasets. However, it does not work as well as neural networks on complex data, and it is prone to overfitting.
LightGBM is also fast and efficient, and it typically trains faster than XGBoost on large datasets, making it a better choice for larger datasets. Like XGBoost, it may not work well with complex data and is prone to overfitting.
See the section below to learn about the different options.
Edge Impulse supports a number of classical algorithms to get you started. If you are unsure of which algorithm to use, we recommend using the EON Tuner to guide you.
Logistic regression (despite its name) is a classifier; it is used to classify input data into one of several discrete categories. It works by first fitting a line (or surface) to the data, just like in linear regression. From there, the predicted output (of linear regression) is fed into the logistic function to classify the input as belonging to one of several classes.
The source code for the logistic regression block can be found in this repository: .
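The two steps described above (a linear fit followed by the logistic function) can be sketched for the binary case as follows. This is a minimal illustration with hand-picked weights, not the code of the Edge Impulse block; training the weights is out of scope here.

```python
import math


def predict_probability(features, weights, bias):
    """Linear score followed by the logistic (sigmoid) function."""
    score = sum(w * x for w, x in zip(weights, features)) + bias
    return 1.0 / (1.0 + math.exp(-score))


def classify(features, weights, bias, threshold=0.5):
    """Turn the probability into a discrete class (0 or 1)."""
    return 1 if predict_probability(features, weights, bias) >= threshold else 0


weights, bias = [1.5, -0.5], -1.0  # illustrative values, not learned from data
print(predict_probability([2.0, 1.0], weights, bias))
print(classify([2.0, 1.0], weights, bias))
print(classify([0.0, 1.0], weights, bias))
```

For more than two classes, the sigmoid is typically replaced by a softmax over one linear score per class.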
Support vector machines (SVMs) rely on a technique called the “kernel trick” for mapping points in low-dimensional to high-dimensional space. By doing this, groupings of data can often be separated into clearly defined categories.
An example Edge Impulse project using an SVM can be found here: .
The source code for the SVM block can be found in this repository: .
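To give a feel for the kernel trick, the sketch below compares an explicit mapping from 2D to 3D with a degree-2 polynomial kernel that yields the same inner product while staying in 2D. The choice of kernel and the function names are assumptions for illustration; the SVM block may use a different kernel.

```python
import math


def feature_map(p):
    """Explicit mapping of a 2D point into 3D space."""
    x1, x2 = p
    return (x1 * x1, math.sqrt(2) * x1 * x2, x2 * x2)


def dot(a, b):
    return sum(u * v for u, v in zip(a, b))


def poly_kernel(p, q):
    """Degree-2 polynomial kernel, computed entirely in the original 2D space."""
    return dot(p, q) ** 2


p, q = (1.0, 2.0), (3.0, -1.0)
print(dot(feature_map(p), feature_map(q)))  # inner product in the 3D space
print(poly_kernel(p, q))                    # same value (up to rounding), no 3D points built
```

The SVM only ever needs these inner products, so it can work in the higher-dimensional space without constructing it explicitly.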
A random forest is a type of machine learning model that employs multiple decision trees. Random forests are simple to train and offer relatively high accuracy for classical ML approaches.
An example Edge Impulse project using a random forest classifier can be found here: .
The source code for the random forest block can be found in this repository: .
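The idea of combining several trees can be sketched with a majority vote. The three "trees" below are hand-written single-split stumps over two made-up features, purely for illustration; a real random forest learns many deeper trees from random subsets of the data and features.

```python
from collections import Counter


def tree_a(sample):
    return "wave" if sample[0] > 0.5 else "idle"


def tree_b(sample):
    return "wave" if sample[1] > 0.3 else "idle"


def tree_c(sample):
    return "wave" if sample[0] + sample[1] > 0.7 else "idle"


def forest_predict(sample, trees):
    """Return the class that most trees vote for."""
    votes = Counter(tree(sample) for tree in trees)
    return votes.most_common(1)[0][0]


trees = [tree_a, tree_b, tree_c]
print(forest_predict((0.8, 0.1), trees))
print(forest_predict((0.2, 0.1), trees))
```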
XGBoost is an open-source implementation of gradient boosting, which is a type of ensemble learning that uses a combination of simpler models, such as decision trees. It works well for classification and regression tasks. Tree-based methods, like XGBoost, compare values only between samples and not between values in a sample. As a result, they work well with features that have different magnitudes and scales.
An example Edge Impulse project using XGBoost for regression can be found here: .
The source code for the XGBoost block can be found in this repository: .
Similar to XGBoost, LightGBM is another type of gradient-boosted ensemble model often constructed with decision trees, and it works well for both classification and regression tasks. Because it is a tree-based method, LightGBM compares values between samples rather than between features in a sample, thus making it robust when dealing with features that have different magnitudes.
An example Edge Impulse project using LightGBM for classification can be found here: .
The source code for the LightGBM block can be found in this repository: .
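The robustness of tree-based methods to feature scale can be illustrated with a single split. A tree node compares one feature against a threshold across samples, so rescaling that feature (and the threshold with it) leaves the partition unchanged. The data and helper name below are made up for this example.

```python
def split(samples, feature, threshold):
    """Partition sample indices by comparing one feature against a threshold."""
    left = [i for i, s in enumerate(samples) if s[feature] <= threshold]
    right = [i for i, s in enumerate(samples) if s[feature] > threshold]
    return left, right


samples = [(0.2, 1500.0), (0.9, 300.0), (0.4, 2200.0), (0.7, 800.0)]
# Rescale the second feature (e.g. a change of unit); the threshold scales with it.
scaled = [(a, b / 1000.0) for a, b in samples]

print(split(samples, feature=1, threshold=1000.0))
print(split(scaled, feature=1, threshold=1.0))
```

Both calls produce the same grouping of samples, even though the second feature differs by three orders of magnitude between the two datasets.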
If you want to implement your own learning block for Edge Impulse, see the custom learning blocks guide.
Specifications | MobileNetV2 SSD FPN | FOMO | NVIDIA TAO | YOLOv5 |
---|---|---|---|---|
Labelling method | Bounding boxes | Bounding boxes | Bounding boxes | Bounding boxes |
Input size | 320x320 | Square (any size) | Flexible | Flexible |
Image format | RGB | Greyscale & RGB | RGB | RGB |
Output | Bounding boxes | Centroids | Bounding boxes | Bounding boxes |
MCU | ❌ | ✅ | ✅ | ✅ |
CPU/GPU | ✅ | ✅ | ✅ | ✅ |
Limitations | Works best with big objects; models use high compute resources (in the edge computing world); image size is fixed | Works best when objects have similar sizes & shapes; the size of the objects is not available; objects should not be too close to each other | Works best on high end MCU | More compute intensive; not suitable for all edge devices |
Requirement | Minimum | Recommended |
---|---|---|
Memory Footprint (RAM) | 256 KB, 64x64 pixels (B&W, buffer included) | ≥ 512 KB, 96x96 pixels (B&W, buffer included) |
Latency (100% load) | 80 MHz: < 1 fps | > 80 MHz + acceleration: ~15 fps @ 480MHz, 40-60 fps on RPi4 |
Building a robust machine learning model, especially in the realm of computer vision, is challenging due to the need for extensive datasets and significant computational resources. Transfer learning has emerged as a powerful solution, allowing developers to leverage pre-trained models and adapt them to their specific needs. This guide provides an overview of various community created custom learn blocks, and their applications.
Prerequisites
An object detection project: See object detection for details on how to create one.
Tutorials Want to create your own Custom Learn Block? Check out our tutorial:
To select a community created learning block, click Object detection in the menu on the left. Here you can select Choose a different model, then pick YOLOv5, which was created by our community. The details for the block are shown there too, for example: YOLOv5 is a transfer learning model based on Ultralytics YOLOv5 using yolov5n.pt weights; it supports RGB input at any resolution (square images only).
Below is a detailed table of custom learn blocks created by the community, showcasing their capabilities and potential applications:
Notes:
Ultra Low-end MCU: Devices with very limited memory and processing power, typically used for sensor-driven tasks.
Low-end MCU: More capable than ultra low-end MCUs, but still limited in processing power and memory.
NPU: Specialized for neural network processing; efficient for machine learning tasks.
CPU (MPU): General-purpose processors, capable of handling complex computations and larger models.
GPU: High-performance processing units, ideal for large-scale and compute-intensive machine learning models.
Sensor Applications: Indicates the types of applications each model is typically used for, based on sensor data processing capabilities.
Ensure that your chosen learn block is compatible with your hardware. Some blocks, like YOLOv5, have specific hardware requirements.
Each learn block comes with its own set of limitations. Understanding these is crucial for effective model development.
Align your project requirements with the capabilities of the learn block. For instance, use YOLOv5 for complex object detection tasks and Keras for simpler tasks.
The community blocks are not always integrated by Edge Impulse. This means they won't be tested on our CI/CD workflows.
Thus, we will provide limited support on the forum. If you are interested in using them for an enterprise project, please check our pricing page and contact us directly; our solution engineers can work with you on the integration.
YOLOv5 (Community Block)
Log items.
Metrics output issues.
Jetson Nano compatibility issues.
Lack of model size feedback pre-training completion.
Fixed batch size, no modification option.
EfficientNet (Community Block)
Potential compatibility issues with low-resource devices.
Scikit-learn (Community Block)
Potential compatibility issues with low-resource devices.
Custom learn blocks offer a flexible approach to machine learning, enabling you to tailor models to your specific needs. By understanding the capabilities and limitations of each block, you can harness the power of machine learning more effectively in your projects.
Want to use a novel ML architecture, or load your own transfer learning models into Edge Impulse? Create a custom learning block! It's easy to bring in any training pipeline into the Studio, as long as you can output TFLite or ONNX files. We have end-to-end examples of doing this in Keras, PyTorch and scikit-learn.
If you just want to modify the neural network architecture or loss function, you can also use expert mode directly in the Studio, without having to bring your own model. Go to any ML block, select three dots, and select Switch to Keras (expert) mode.
This page describes the input and output formats if you want to bring your own architecture, but a good way to start building a custom learning block is by modifying one of the following example repositories:
YOLOv5 - wraps the Ultralytics YOLOv5 repository (trained with PyTorch) to train a custom transfer learning model.
EfficientNet - a Keras implementation of transfer learning with EfficientNet B0.
Keras - a basic multi-layer perceptron in Keras and TensorFlow.
PyTorch - a basic multi-layer perceptron in PyTorch.
Scikit-learn - trains a logistic regression model using scikit-learn, then outputs a TFLite file for inferencing using jax.
In this tutorial, we will explain how to set up a learning block, push it to an Edge Impulse organization, and use it in a project.
A learning block consists of a Docker image that contains one or more scripts. The Docker image is encapsulated in a learning block with additional parameters.
Here is a diagram of how a minimal configuration for a learning block works:
We will walk through creating a custom learning block, pushing it to our organization (enterprise accounts only), and running it in a project. To perform this, we will use the example learning block found in this repository.
To start, create a directory somewhere on your computer. I'll name mine my-custom-learning-block/. You should also create a directory named data/ in that project directory to store a dataset that we will use for testing the block locally. Finally, create a directory for storing the output model.
We will be working in this directory to create our custom learning block. It will also hold data for testing locally. After working through this tutorial, you should have a directory structure like the following:
We will explain what each of these files does in the rest of this getting started section.
In your project, you will likely have access to data that you have collected. Note that you will need to convert your raw data into features stored in NumPy format (*.npy). For demonstrating our custom learning block, we will download pre-generated features from a public project: Tutorial: Continuous motion recognition.
From the project's dashboard, download all four NPY files. These contain the features as generated by the processing block (Spectral features), and they are what the learning block expects as inputs.
Move the files to your data/ directory, and rename them to the following:
X_split_train.npy
Y_split_train.npy
X_split_test.npy
Y_split_test.npy
You can also run the following commands to download the files directly into your data/ directory:
To initialize your block, the easiest method is to use the Edge Impulse CLI blocks command from within the my-custom-learning-block/ directory: edge-impulse-blocks init. Follow the on-screen prompts to log in to your account, select your organization, and configure your block:
This will create a file named .ei-block-config in your current directory. Feel free to look at the contents of that file to see how the block was configured:
You can also create your learning block within Edge Impulse Studio. Open your organization page. On the side, click Machine learning under Custom blocks. From there, click Add new Machine Learning block and fill out the required information.
Download the following Python scripts and requirements file:
You can also easily download these files with the following commands:
Feel free to look through these scripts to see how Keras is used to construct and train a simple dense neural network. Also, note that you are not required to use Python! You are welcome to use any language or system you wish, so long as it will run in a Docker container.
Important! Pay attention to the inputs (features) and outputs (trained model file) of your script. They must match the expected inputs and outputs of the block. See the Input format and Output format sections for more information.
Next, we need to wrap our training script in a Docker image. To do that, we write a Dockerfile. If you are not familiar with Docker, we recommend working through Docker's getting started guide. See here to learn more about the required Dockerfile components in learning blocks.
Create a new file named Dockerfile (no extension) and copy in the following code:
Note: we are not installing CUDA for this simple example. If you wish to install CUDA in your image to enable GPU-accelerated training (which includes training inside your Edge Impulse project), please refer to the full example here.
Make sure you have Docker installed and running on your computer. Execute the following commands to build and run your image:
You should see your model train for 30 epochs and then be converted to a .tflite file for inference. Your out/ directory should have the following files/folders:
model.tflite
model_quantized_int8_io.tflite
saved_model/
saved_model.zip
The saved_model.zip file is an archive of the saved_model/ directory, which contains your model stored in the TensorFlow SavedModel format. The model.tflite file is the float32 version of the model and converted to the TensorFlow Lite format. The model_quantized_int8_io.tflite file is the same TFLite model, but with weights quantized to 8 bits.
You can expose your block's parameters to the Studio GUI by defining JSON settings in the parameters.json file. Create a file with that exact name and copy in the following:
This will expose the epochs and learning-rate parameters to the Studio interface so that users can make changes in the project. You can learn more about arguments in this section.
Once you have verified operation of your block and configured the parameters, you will want to push it to your Edge Impulse organization. From your project directory, run the following command:
Once that command completes, head to your Organization in the Edge Impulse Studio. Click on Machine learning under Custom blocks. You should find your custom learning block listed there.
You can click on the three dots and select Edit block to view the configuration settings for your block.
Create a project in Studio under your organization (the project must be under the organization for the learning block to show up!). We will demonstrate the custom learning block using the continuous gestures dataset found here. Follow the directions in that guide to upload data to your project.
Add a Spectral Analysis block to your impulse. Click on Add a learning block. Assuming your project is in your organization, you should see your custom learning block as one of the available blocks. Click add to use your custom learning block in your project.
Go to the Spectral features page, click Save parameters, and click Generate features to generate the features required for learning.
Next, go to the My learning block page, where you should see the custom parameters you set (number of training cycles and learning rate). Feel free to change those, and select "Start training." When that is finished, you should have a trained model in your project created by your custom learning block!
You can now continue to model testing and deployment, as you would with any project.
Any built-in block in the Edge Impulse Studio (e.g. classifiers, regression models or FOMO blocks) can be edited locally, and then pushed back as a custom block. This is great if you want to make heavy modifications to these training pipelines, for example to do custom data augmentation. To download a block, go to any ML block in your project, click the three dots, select Edit block locally, and follow the instructions in the README.
Training pipelines in Edge Impulse are built on top of Docker containers, a virtualization technique which lets developers package up an application with all dependencies in a single package. To train your own model you'll need to wrap all the required packages, your scripts, and (if you use transfer learning) your pre-trained weights into this container. When running in Edge Impulse the container does not have network access, so make sure you don't download dependencies while running (fine when building the container).
Important: ENTRYPOINT
It's important to create an ENTRYPOINT at the end of the Dockerfile to specify which file to run.
GPU Support
If you want to have GPU support (only for enterprise customers), you'll need the CUDA packages installed. If you export a learn block from the Studio it will already have the right base packages, so use that Dockerfile as a starting point.
The entrypoint (see above in the Dockerfile) will be called with these parameters:
--data-directory - where you can find the data (see below for the input/output formats).
--out-directory - where to write the TFLite or ONNX files (see below for the input/output formats).
Additionally, you can specify custom arguments (like the learning rate, or whether to use data augmentation) by adding a parameters.json file to your block. This file describes all arguments for your training pipeline, and is used to render custom UI elements for each parameter. For example, this parameters file:
Will be displayed as:
And passes in --learning-rate-1 0.01 --learning-rate-2 0.001 to your script. For more information, and all options see Adding parameters to custom blocks.
If you do not specify a parameters.json file, there will be 2 default elements rendered ("Learning rate" and "Number of training cycles"), which will be passed in as:
--learning-rate - learning rate to train with (set by the user in the UI).
--epochs - number of epochs to train for (set by the user in the UI).
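A training script might read these arguments roughly as follows. This is a minimal sketch using Python's standard argparse module, covering only the two directory arguments and the two default parameters described above; the function name and the demo values are illustrative, and the real example scripts may be structured differently.

```python
import argparse
import os


def parse_args(argv=None):
    """Parse the arguments that the learning block entrypoint is called with."""
    parser = argparse.ArgumentParser(description="Custom learning block training script")
    parser.add_argument("--data-directory", type=str, required=True)
    parser.add_argument("--out-directory", type=str, required=True)
    parser.add_argument("--epochs", type=int, required=True)
    parser.add_argument("--learning-rate", type=float, required=True)
    return parser.parse_args(argv)


# Demo call with explicit values; a real script would call parse_args() with no arguments.
args = parse_args([
    "--data-directory", "data",
    "--out-directory", "out",
    "--epochs", "30",
    "--learning-rate", "0.001",
])
train_features_path = os.path.join(args.data_directory, "X_split_train.npy")
print(args.epochs, args.learning_rate, train_features_path)
```

Note that argparse exposes dashed names with underscores, so --learning-rate becomes args.learning_rate.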
The data directory contains your dataset, after running any DSP blocks, and already split in a train/validation set:
X_split_train.npy
Y_split_train.npy
X_split_test.npy
Y_split_test.npy
The X_*.npy files are float32 NumPy arrays, already in the right shape (e.g. if you're training on 96x96 RGB images this will be of shape (n, 96, 96, 3)). You can typically load these without any modification into your training pipeline (see the notes after this section for caveats).
The Y_*.npy files are either:
1) int32 NumPy arrays, with four columns (label_index, sample_id, sample_slice_start_ms, sample_slice_end_ms).
2) A JSON array in the form of:
[{ "sampleId": 234731, "boundingBoxes": [{ "label": 1, "x": 260, "y": 313, "w": 234, "h": 261 }] } ]
Format 2) is sent if your dataset has bounding boxes; in all other cases format 1) is sent.
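Reading the JSON label format could look like the sketch below, which uses only the standard json module and the field names from the example above. The function name is hypothetical, and how you read the text from disk is left out.

```python
import json


def parse_bounding_box_labels(text):
    """Return {sample_id: [(label, x, y, w, h), ...]} from the JSON label format."""
    samples = json.loads(text)
    return {
        s["sampleId"]: [
            (b["label"], b["x"], b["y"], b["w"], b["h"]) for b in s["boundingBoxes"]
        ]
        for s in samples
    }


example = (
    '[{ "sampleId": 234731, "boundingBoxes": '
    '[{ "label": 1, "x": 260, "y": 313, "w": 234, "h": 261 }] } ]'
)
print(parse_bounding_box_labels(example))
```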
Data format for image projects
For image projects, we automatically normalize data before passing the data to the ML block. The X_*.npy values may then be rescaled based on the selected input scaling when building the custom ML block (details in the next section).
To get new data for your project, just run (requires Edge Impulse CLI v1.16 or higher):
This regenerates features (if necessary) and then downloads the updated dataset.
The input features for vision models are a 4D vector of shape (n, WIDTH, HEIGHT, CHANNELS), where the channel data is in RGB format. We support three ways of scaling the input:
Pixels ranging 0..1 - just the raw pixels, without any normalization. Data coming from the Image DSP block is unchanged.
Pixels ranging 0..255 - just the raw pixels, without any normalization. Data coming from the Image DSP block is multiplied by 255.
PyTorch - the default way that inputs are scaled in most torchvision models: it first takes the raw pixels in 0..1, then normalizes per channel using the ImageNet mean (0.485, 0.456, 0.406) and standard deviation (0.229, 0.224, 0.225) for R, G and B.
The input scaling is applied:
In the input features vector, so the inputs are already scaled correctly and you do not need to re-scale them yourself. If your training pipeline requires converting the input features vector into images before training, make sure to un-normalize first.
When running inference, both in the Studio and on-device, so there is no need to re-scale yourself there either.
You can control the image input scaling when you create the block in the CLI (1.19.1 or higher), or by editing the block in the UI.
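To make the three scaling modes concrete, here is a minimal NumPy sketch of each one and its inverse, which is what "un-normalize" means if you need images back. The mode strings and function names are hypothetical. The constants are the standard ImageNet statistics used by torchvision, assumed here to be the ones the PyTorch mode refers to.

```python
import numpy as np

# Standard ImageNet per-channel statistics in R, G, B order (assumed values).
IMAGENET_MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32)
IMAGENET_STD = np.array([0.229, 0.224, 0.225], dtype=np.float32)


def apply_scaling(pixels_0_1, mode):
    """Scale channels-last RGB features in 0..1 to the given mode."""
    x = np.asarray(pixels_0_1, dtype=np.float32)
    if mode == "0..1":
        return x
    if mode == "0..255":
        return x * 255.0
    if mode == "torch":
        # Broadcasting applies the statistics along the last (channel) axis.
        return (x - IMAGENET_MEAN) / IMAGENET_STD
    raise ValueError(f"unknown scaling mode: {mode}")


def undo_scaling(features, mode):
    """Invert apply_scaling, returning pixels in 0..1."""
    x = np.asarray(features, dtype=np.float32)
    if mode == "0..1":
        return x
    if mode == "0..255":
        return x / 255.0
    if mode == "torch":
        return x * IMAGENET_STD + IMAGENET_MEAN
    raise ValueError(f"unknown scaling mode: {mode}")
```

Because the features you receive are already scaled, a training pipeline would normally only need undo_scaling, for instance to write preview images or to feed an augmentation library that expects 0..1 pixels.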
If you need data in channels-first (NCHW) mode, then you'll need to transpose the input feature vector yourself before training. You can still just write out an NCHW model; Edge Impulse supports both NHWC and NCHW models.
Edge Impulse only supports RGB models. If you have a model that requires BGR input rather than RGB input (e.g. ResNet50), you'll need to swap the first and last channels.
In Keras you do this by adding a lambda layer. Example using Resnet50.
For PyTorch you do this by first converting the trained model to ONNX, then transposing using scc4onnx.
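The two array operations mentioned in this section (moving channels first, and swapping the first and last channels) look like this in NumPy. This is a sketch with hypothetical function names, and it only operates on arrays. For a BGR model, the swap itself belongs inside the model as described above, because the inputs supplied at inference time are RGB.

```python
import numpy as np


def to_channels_first(x_channels_last):
    """Move the channel axis from last to second: (n, A, B, C) -> (n, C, A, B)."""
    return np.transpose(x_channels_last, (0, 3, 1, 2))


def swap_first_and_last_channels(x_channels_last):
    """Reverse the channel axis, turning RGB into BGR (and back again)."""
    return x_channels_last[..., ::-1]


features = np.zeros((8, 96, 96, 3), dtype=np.float32)
print(to_channels_first(features).shape)  # (8, 3, 96, 96)
```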
The training pipeline can output either TFLite or ONNX files:
If you output TFLite files
model.tflite
- a TFLite file with float32 inputs and outputs.
model_quantized_int8_io.tflite
- a quantized TFLite file with int8 inputs and outputs.
saved_model.zip
- a TensorFlow saved model (optional).
At least one of the TFLite files is required.
If you output ONNX files
model.onnx
- An ONNX file with float16 or float32 inputs and outputs.
We automatically convert this file to both unquantized and quantized TFLite files after training.
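A small check at the end of a training script can catch a missing output file before the job finishes. The sketch below uses the file names listed above. The function name and the idea of running such a check are illustrative assumptions, not part of the platform.

```python
import os

TFLITE_FILES = ("model.tflite", "model_quantized_int8_io.tflite")
ONNX_FILE = "model.onnx"


def find_model_outputs(out_directory):
    """Return the recognised model files present in out_directory.

    Raises FileNotFoundError if neither a TFLite file nor an ONNX file exists.
    """
    candidates = TFLITE_FILES + (ONNX_FILE,)
    present = [
        name for name in candidates
        if os.path.isfile(os.path.join(out_directory, name))
    ]
    if not present:
        raise FileNotFoundError(
            f"no model file found in {out_directory}; expected one of {candidates}"
        )
    return present
```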
I'm using scikit-learn, I don't have TFLite or ONNX files...
If you have a training pipeline that cannot output TFLite files by default (e.g. scikit-learn), you can use JAX to implement the inference function and compile that to TFLite. See our example repository. If there are any TFLite ops in your final model that are not supported by the EON Compiler (so you cannot run on device), then please let us know on the forums.
Host your block directly within Edge Impulse with the Edge Impulse CLI:
To edit the block, go to:
Enterprise: go to your organization, Custom blocks > Machine learning.
Developers: click on your photo on the top right corner, select Custom blocks > Machine learning.
The block is now available from inside any of your Edge Impulse projects. Add it via Create impulse > Add a learning block.
Unfortunately, object detection models typically don't have a standard way to go from the neural network output layer to bounding boxes. Currently we support the following types of output layers:
MobileNet SSD
Edge Impulse FOMO
YOLOv5 (compatible with Ultralytics YOLOv5 v6)
YOLOv5 for Renesas DRP-AI
YOLOv7
YOLOX
If you have an object detection model with a different output layer, then please contact your user success engineer (enterprise) or let us know on the forums (free users) with an example of how to interpret the output, and we can add it.
When training locally you can use the profiling API to get latency, RAM and ROM estimates. This is very useful as you can immediately see whether your model will fit on device. Additionally, you can use this API as part of your experiment tracking (e.g. in Weights & Biases or MLflow) to weed out models that won't fit your latency or memory constraints.
The profiling API expects:
A TFLite file.
A reference device (for latency calculation) - you can get a list of all devices via getProjectInfo in the latencyDevices object.
A reference model (which model is closest to your architecture) - you can choose between gestures-large-f32, gestures-large-i8, image-32-32-mobilenet-f32, image-32-32-mobilenet-i8, image-96-96-mobilenet-f32, image-96-96-mobilenet-i8, image-320-320-mobilenet-ssd-f32, keywords-2d-f32, keywords-2d-i8. Make sure to use i8 models if you have quantized your model.
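If you select the reference model programmatically, a small helper can enforce the rule about using i8 names for quantized models. This is a sketch with a hypothetical function name. It only builds and validates the name from the list above and does not call the profiling API.

```python
# Reference model names listed above.
REFERENCE_MODELS = {
    "gestures-large-f32",
    "gestures-large-i8",
    "image-32-32-mobilenet-f32",
    "image-32-32-mobilenet-i8",
    "image-96-96-mobilenet-f32",
    "image-96-96-mobilenet-i8",
    "image-320-320-mobilenet-ssd-f32",
    "keywords-2d-f32",
    "keywords-2d-i8",
}


def pick_reference_model(base_name, quantized):
    """Return the reference model name for a base architecture.

    Uses the i8 variant for quantized models and the f32 variant otherwise,
    and raises ValueError if that variant is not in the list.
    """
    name = f"{base_name}-{'i8' if quantized else 'f32'}"
    if name not in REFERENCE_MODELS:
        raise ValueError(f"no reference model named {name}")
    return name


print(pick_reference_model("image-96-96-mobilenet", quantized=True))
# image-96-96-mobilenet-i8
```

Note that the list above contains no i8 variant of image-320-320-mobilenet-ssd, so the helper raises an error for that combination.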
You can also use the Python SDK to profile your model easily. See here for an example on how to profile a model created in Keras.
| Architecture | Description | Compatibility | Applications |
|---|---|---|---|
| YOLOv5 (Community) | A high-speed, accurate object detection model. | NPU, CPU (MPU), GPU | Advanced object detection, image analysis |
| EfficientNet (Community) | A scalable image classification model. | Low-end MCU, NPU, CPU (MPU), GPU | Image classification, facial recognition, scene detection |
| Keras (Community) | A versatile tool for classification and regression tasks. | Ultra Low-end MCU, Low-end MCU, NPU, CPU (MPU), GPU | Diverse classification tasks, data analysis |
| PyTorch (Community) | Suitable for foundational machine learning tasks. | Ultra Low-end MCU, Low-end MCU, NPU, CPU (MPU), GPU | Pattern recognition, foundational machine learning tasks |
| Scikit-learn (Community) | A logistic regression model for classification. | Ultra Low-end MCU, Low-end MCU, NPU, CPU (MPU), GPU | Prototyping, data analysis |
| | Object detection tailored for Renesas platforms. | NPU, CPU (MPU), GPU | Industrial automation, advanced image processing |
| YOLOv5 (Community) | Flexible object detection model. | NPU, CPU (MPU), GPU | General object detection, traffic monitoring, retail analytics |
| | Advanced object detection for Texas Instruments hardware. | NPU, CPU (MPU), GPU | Automotive systems, smart city applications |