After training and validating your model, you can deploy it to any device. Running on-device means the model works without an internet connection, with minimal latency and minimal power consumption.
The Deployment page offers a variety of deployment options to choose from depending on your target device. Whether or not your development board is fully supported, you can deploy your impulse to any device. The C++ library and the Edge Impulse SDK let the model run on-device without an internet connection, with minimal latency and power consumption.
You can also select different model optimization options on the deployment page:
Model version: Quantized (int8) vs unoptimized (float32) versions.
Compiler options: TFLite vs EON Compiler. The EON Compiler also comes with an extra option: EON Compiler (RAM optimized) to reduce the RAM even further when possible.
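To make the int8 vs float32 trade-off concrete: int8 quantization stores values as 8-bit integers plus a scale and zero point, giving roughly 4x smaller tensors at the cost of small rounding errors. The sketch below illustrates the general affine-quantization idea only, not Edge Impulse's exact scheme:

```python
# Sketch of affine int8 quantization: q = round(x / scale) + zero_point.
# Illustrates the float32-vs-int8 trade-off, not Edge Impulse's implementation.

def quantize(values, scale, zero_point):
    # Clamp to the int8 range [-128, 127] after scaling.
    return [max(-128, min(127, round(v / scale) + zero_point)) for v in values]

def dequantize(quants, scale, zero_point):
    return [(q - zero_point) * scale for q in quants]

weights = [0.12, -0.5, 0.31, 0.0]        # float32 values (4 bytes each)
scale, zero_point = 0.004, 0
q = quantize(weights, scale, zero_point)  # stored as int8 (1 byte each)
recovered = dequantize(q, scale, zero_point)
max_err = max(abs(a - b) for a, b in zip(weights, recovered))
print(q, max_err)                         # small rounding error vs. float32
```

The rounding error stays below one quantization step, which is why quantized models usually retain most of their accuracy; running model testing on both versions (see below) confirms this for your specific model.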
There are 6 main deployment options currently supported by Edge Impulse:
Deploy as a customizable library
Deploy as a pre-built firmware - for fully supported development boards
Use Edge Impulse for Linux for Linux targets
Deploy as a Docker container
Run directly on your phone or computer
Create a custom deployment block (Enterprise feature)
From the Deployment page, use the Search deployment options search box to find and configure a deployment option:
These deployment options turn your impulse into fully optimized source code that can be further customized and integrated with your application. The customizable library packages all of your signal processing blocks, configuration, and machine learning blocks into a single package with all available source code. Edge Impulse supports the following libraries (depending on your dataset's sensor type):
Ethos-U library
Meta TF model
Simplicity Studio Component
Tensai Flow library
TensorRT library
TIDL-RT library
With this deployment option, you get a ready-to-go binary for your development board that bundles the signal processing blocks, configuration, and machine learning blocks into a single package. It is available for fully supported development boards only.
To deploy your model using a ready-to-go binary, select your target device and click Build. Flash the downloaded firmware to your device, then run the following command:
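The command below assumes the Edge Impulse CLI is installed (`npm install -g edge-impulse-cli`) and the board is connected over USB; the guard is only there so the sketch fails gracefully when the CLI is missing:

```shell
# Start the impulse runner over serial; requires the Edge Impulse CLI
# (npm install -g edge-impulse-cli) and a flashed, connected board.
if command -v edge-impulse-run-impulse >/dev/null 2>&1; then
  edge-impulse-run-impulse
else
  echo "edge-impulse-cli is not installed"
fi
```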
The impulse runner shows the results of your impulse running on your development board. This only applies to ready-to-go binaries built from the studio.
If your training and testing datasets include a sensor data type that a deployment target does not support, the search box shows that target greyed out with a Not supported label:
If you are developing for Linux-based devices, you can deploy with Edge Impulse for Linux. It contains tools for collecting data from any microphone or camera, SDKs for Node.js, Python, Go, and C++ to collect data from any sensor, and support for running impulses with full hardware acceleration, with easy integration points for writing your own applications.
For a deep dive into deploying your impulse to Linux targets using Edge Impulse for Linux, you can visit the Edge Impulse for Linux guides.
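As a rough sketch, classifying raw features with the Linux Python SDK looks like the following. The model filename and the feature values are placeholders; the example assumes `pip install edge_impulse_linux` and a downloaded `.eim` model file:

```python
# Sketch: running a downloaded .eim model with the Edge Impulse Linux Python SDK.
# "modelfile.eim" and the features list are placeholders for your own model/data.
try:
    from edge_impulse_linux.runner import ImpulseRunner
    sdk_available = True
except ImportError:
    sdk_available = False  # SDK not installed in this environment

if sdk_available:
    runner = ImpulseRunner("modelfile.eim")
    model_info = runner.init()          # starts the model process, returns metadata
    features = [0.0] * 33               # size this to your impulse's input frame
    result = runner.classify(features)  # e.g. {'result': {'classification': ...}}
    print(result["result"])
    runner.stop()
else:
    print("edge_impulse_linux is not installed; install it with pip to run this sketch")
```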
Deploying your model as a Docker container packages the signal processing blocks, configuration, and learning blocks into a single container that exposes an HTTP inference server. This method is ideal for environments that support containerized workloads, and enables deployment on gateways or in the cloud with full hardware acceleration on most Linux targets. To get started, select the Docker container option in the Deployment section of your Edge Impulse project.
See how to run inference using a Docker container.
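Once the container is running, you can query its HTTP inference server with raw features. The port (1337) and the `/api/features` endpoint below are taken from the Edge Impulse HTTP server conventions; treat them as assumptions and confirm against the command shown on your project's Docker deployment page:

```python
# Sketch: posting raw features to the container's HTTP inference server.
# Assumes the container was started roughly like:
#   docker run --rm -p 1337:1337 <image/command from your Deployment page>
import json
import urllib.request
import urllib.error

payload = json.dumps({"features": [0.0] * 33}).encode()  # placeholder features
req = urllib.request.Request(
    "http://localhost:1337/api/features",
    data=payload,
    headers={"Content-Type": "application/json"},
)
try:
    with urllib.request.urlopen(req, timeout=5) as resp:
        result = json.loads(resp.read())
        print(result)  # classification results from the impulse
except (urllib.error.URLError, OSError):
    result = None      # no container running in this environment
    print("inference server not reachable on localhost:1337")
```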
You can run your impulse directly on your computer or mobile phone without the need for an additional app. To run it on your computer, click Launch in browser. To run it on your mobile phone, scan the QR code and click Switch to classification mode.
Download the most recent build from your project's deployment page under Latest build:
When building your impulse for deployment, Edge Impulse gives you the option of adding another layer of optimization to your impulse using the EON compiler.
The EON Compiler lets you run neural networks using less RAM and flash while retaining the same accuracy as TensorFlow Lite for Microcontrollers.
Depending on your neural network architecture, we can also provide one extra layer of optimization with the EON Compiler (RAM optimized).
You can also select whether to run the unquantized float32 or the quantized int8 models. To compare model accuracy, run model testing in your project by clicking Run model testing.
To preview how your impulse will use your target device's compute resources, Edge Impulse also estimates the latency, flash, and RAM your impulse will consume before you deploy it locally. This can save significant engineering time and the cost of repeated iterations and experiments.
The Edge Optimized Neural (EON) compiler is a powerful tool, included in Edge Impulse, that compiles machine learning models into highly efficient and hardware-optimized C++ source code. It supports a wide variety of neural networks trained in TensorFlow or PyTorch - and a large selection of classical ML models trained in scikit-learn, LightGBM or XGBoost. The EON Compiler also runs far more models than other inferencing engines, while saving up to 65% of RAM usage.
This approach eliminates complex code, significantly reduces device resource utilization, and saves inference time.
The EON Compiler also includes an additional option: The EON Compiler (RAM optimized), to better cater to diverse project requirements and constraints.
Key advantages of the EON Compiler include:
25-65% less RAM
10-35% less flash
Same accuracy as TFLite
Faster inference
The EON Compiler is specifically designed for Edge AI applications where speed is paramount. By focusing on minimizing inference time, this version of the EON Compiler ensures that neural network models can execute as quickly as possible, a critical requirement for real-time or near-real-time applications.
The EON Compiler (RAM optimized) option further reduces memory usage, allowing AI models to run on even smaller microcontrollers (MCUs) without sacrificing the model's accuracy or integrity. This is particularly beneficial for developers looking to minimize hardware costs and deploy advanced AI in resource-constrained environments. It computes values directly as required, minimizing the need to store intermediate results; this substantially decreases RAM usage at the cost of slightly higher latency.
Please note that, depending on your neural network architecture, this option may not be available; see the Limitations section.
Only available for organization projects
The EON Compiler (RAM optimized) is available only on organization projects. Try it out with our enterprise free trial or view our pricing for more information.
What do these metrics mean?
Processing blocks: the optimizations for the DSP components of the compiled model, e.g. Spectral Features, MFCC, FFT, Image, etc.
Learn blocks: the performance of the compiled model on the device, i.e. the time it takes to run inference.
Latency: the time it takes to run the model on the device.
RAM: the amount of RAM the model uses.
Flash: the amount of ROM the model uses.
Accuracy: the accuracy of the model.
The input of the EON Compiler is a TensorFlow Lite FlatBuffer file containing the model weights. The output is a set of .cpp and .h files containing the unpacked model weights and the functions to prepare and run inference.
Regular TFLite Micro is based on TensorFlow Lite and contains all the machinery needed to read the model weights in FlatBuffer format (the contents of the .tflite file), construct the inference graph, plan the memory allocation for tensors and data, execute initialization and preparation, and finally invoke the operators in the inference graph to produce the inference results.
The advantage of the traditional TFLite Micro approach is that it is very versatile and flexible. The disadvantage is that the code needed to get the model ready on the device is heavy for embedded systems.
To overcome these limitations, our solution performs the resource-intensive tasks, such as reading the model from the FlatBuffer, constructing the graph, and planning the memory allocation, directly on our servers.
Subsequently, the EON compiler performs the generation of C++ files, housing the necessary functions for the Init, Prepare, and Invoke stages.
These C++ files can then be deployed on the embedded systems, alleviating the computational burden on those devices.
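Conceptually, this is the difference between interpreting a serialized graph at runtime and shipping generated code with the graph already baked in. The toy Python sketch below only illustrates the idea; the actual EON Compiler emits C++:

```python
# Toy contrast between a runtime interpreter and ahead-of-time generated code.
# The real EON Compiler emits C++; this sketch only illustrates the idea.

# Interpreter style (TFLite Micro): walk a serialized graph on the device.
graph = [("mul", 2.0), ("add", 1.0), ("relu", None)]

def interpret(x, graph):
    for op, arg in graph:  # graph parsing and op dispatch happen at runtime
        if op == "mul":
            x = x * arg
        elif op == "add":
            x = x + arg
        elif op == "relu":
            x = max(0.0, x)
    return x

# Generated-code style (EON): the same graph unrolled ahead of time, so the
# device ships no FlatBuffer reader, no graph builder, no dispatch tables.
def generated(x):
    x = x * 2.0
    x = x + 1.0
    return max(0.0, x)

assert interpret(3.0, graph) == generated(3.0)  # same results, less overhead
```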
The EON Compiler (RAM Optimized) option adds, on top of the above, a novel approach by computing values directly as needed and minimizing the storage of intermediate results. This method leads to a significant decrease in RAM usage - sometimes at the cost of a slightly higher latency/flash - without impacting the accuracy of model predictions.
In practice, we demonstrated this with our default 2D convolutional model for visual applications. By slicing the model graph into smaller segments, we managed to reduce the RAM usage significantly — by as much as 40 to 65% compared to TensorFlow Lite — without altering the accuracy or integrity of the model's predictions.
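The compute-as-needed idea can be illustrated with a toy pipeline: instead of materializing an entire intermediate buffer, compute each slice on demand and keep only a running result, so peak memory shrinks at the cost of some recomputation. This sketch is an illustration of the principle, not the actual slicing algorithm:

```python
# Toy illustration of computing values "as needed" instead of storing a full
# intermediate buffer: a windowed average followed by a max, done two ways.

signal = [float(i % 7) for i in range(1000)]
K = 4  # window size

# Buffered: materialize the entire intermediate result (peak RAM ~ len(signal)).
def max_of_averages_buffered(x):
    averages = [sum(x[i:i + K]) / K for i in range(len(x) - K + 1)]  # big buffer
    return max(averages)

# Streamed: compute each average on demand and keep only the running max
# (peak RAM ~ K), at the cost of extra per-window work (higher latency).
def max_of_averages_streamed(x):
    best = float("-inf")
    for i in range(len(x) - K + 1):
        best = max(best, sum(x[i:i + K]) / K)
    return best

assert max_of_averages_buffered(signal) == max_of_averages_streamed(signal)
```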
The EON Compiler, including the EON Compiler (RAM optimized) version, is a powerful tool for optimizing neural network projects. However, there are some important limitations to keep in mind:
Unsupported Operators
Not all operators are supported by our compiler. If your model includes an operator we don't support, the compiler won't be able to fully optimize your model. This means that certain complex operations within your model might prevent the compiler from working as efficiently as possible.
Concerning the EON Compiler (RAM optimized) option, our slicing algorithm currently supports a limited set of operators. For instance, if a standard convolutional model incorporates an unsupported operator in its architecture, the compiler cannot slice beyond that point. This limitation restricts the compiler to models that use only supported operations.
Residual Layers
We support models with certain types of residual layers—specifically, those that feed directly into a subsequent layer, like in MobileNet. However, if your model processes residuals in a more complex manner, the EON Compiler may not optimize it effectively.
In this section, we tested many different architectures. Some architectures may not be available with TFLite Micro or with the EON Compiler (RAM optimized); see the Limitations section.