Edge Impulse Python SDK

The Edge Impulse Python SDK is a library to help you develop machine learning (ML) applications for edge and Internet of Things (IoT) devices. While the Edge Impulse Studio is a great interface for guiding you through the process of collecting data and training a model, the edgeimpulse Python SDK allows you to programmatically Bring Your Own Model (BYOM), developed and trained on any platform.

Important documentation for the Python SDK:

Bring Your Own Model (BYOM)

You can upload models trained in a variety of frameworks, such as TensorFlow, PyTorch, or MATLAB, to the Studio or use our new Python SDK to script the profiling and deployment processes.

Use the new Python SDK to upload a trained model and utilize profiling and deployment processes. Get RAM, ROM, and inference time estimates on edge hardware, and then convert to embedded code (e.g. C++, libraries, full binaries). You can also now upload your trained models to Studio if you still prefer to work in our Studio environment. If a model or operation is not supported, the Studio or Python SDK will gracefully let you know.

Keep reading to try out BYOM yourself with the Python SDK!

The Edge Impulse Python SDK

The Python SDK consists of two main libraries:

  • Python SDK - Classes and functions built using the Python API bindings to make the process of profiling and deploying your models easier. You can view the API reference guide for the Python SDK here.

  • Python API bindings - Python wrappers for the Edge Impulse web API that allow you to interact with projects programmatically (i.e. without needing to use the Studio graphical interface). You can view the API reference guide for the Python API bindings here.

You can use the Python API bindings to control your account settings, create projects, add data, train models, deploy models, and so on. These functions offer the granular control found within the web API without needing to construct HTTP requests manually. You can read more about how to use the Python API bindings here.

The Python SDK, on the other hand, offers an easy-to-use interface to perform several common functions. For instance, you can use the SDK to profile your model, which estimates the RAM, ROM, and inference time when using your model on one of several hardware platforms. The SDK also lets you deploy your model easily, converting it from one of several formats to a C++ library (or other supported deployment format).

The Python SDK package is known as edgeimpulse and is registered on pypi.org. Note that when you install the edgeimpulse Python SDK package (e.g. with pip), the Python API bindings (known as the edgeimpulse_api package) will automatically be installed as a dependency.

Using the Python SDK

Install the Python SDK with:

python -m pip install edgeimpulse

To use the Python SDK, you need to first create a project in Edge Impulse and copy the API key. Once you have created the project, open it, navigate to Dashboard and click on the Keys tab to view your API keys. Double-click on the API key to highlight it, right-click, and select Copy.

Note that you do not actually need to use the project in the Edge Impulse Studio. We just need the API Key.

From there, import the package and set the API key:

import edgeimpulse as ei
ei.API_KEY = "ei_dae27..." # Change to your key

The functions in the Python SDK can be used in your MLOps pipelines to help you develop edge ML models as well as automatically deploy your model to your target hardware.

Supported input formats

The following input formats are supported:

You can check the limitations in the BYOM - Limitations section.

Profile

You can pass a model (in one of the supported input formats) along with one of several possible hardware targets to the profile() function. This will send the model to your Edge Impulse project, where the RAM, ROM, and inference time will be estimated based on the target hardware.

To get the available hardware targets for profiling, run the following:

ei.model.list_profile_devices()

You should see a list printed such as:

['alif-he',
 'alif-hp',
 'arduino-nano-33-ble',
 'arduino-nicla-vision',
 'portenta-h7',
 'brainchip-akd1000',
 'cortex-m4f-80mhz',
 'cortex-m7-216mhz',
 ...
 'ti-tda4vm']

A common option is the 'cortex-m4f-80mhz', as this is a relatively low-power microcontroller family. From there, we can use the Edge Impulse Python SDK to generate a profile for your model to ensure it fits on your target hardware and meets your timing requirements.

profile = ei.model.profile(model=model, device='cortex-m4f-80mhz')
print(profile.summary())

This will produce an output such as the following:

Target results for float32:
===========================
{   'device': 'cortex-m4f-80mhz',
    'tfliteFileSizeBytes': 3364,
    'isSupportedOnMcu': True,
    'memory': {   'tflite': {'ram': 2894, 'rom': 33560, 'arenaSize': 2694},
                  'eon': {'ram': 1832, 'rom': 11152}},
    'timePerInferenceMs': 1}

You can then parse the output from the response variable (resp) in your MLOPs pipeline to determine if your model will fit within your hardware constraints. For example, the following will print out the RAM and ROM requirements along with the estimated inference time (ms) for the cortex-m4f-80mhz target (assuming you are using the float32 version of the model):

print(f"Estimated RAM usage: {profile.model.profile_info.float32.memory.tflite.ram}")
print(f"Estimated ROM usage: {profile.model.profile_info.float32.memory.tflite.rom}")
print(f"Estimated inference time (ms): {profile.model.profile_info.float32.time_per_inference_ms}")

You can even set up experiments (for example, see this tutorial using Weights & Biases) to see how changing the model architecture and adjusting hyperparameters affects the predicted memory and timing requirements.

Deploy

Once you are ready to deploy your model, you can call the deploy() function to convert your model from one of the available input formats to one of the Edge Impulse supported outputs. Edge Impulse can output a number of possible deployment libraries and pre-compiled binaries for a wide variety of target hardware.

The default option downloads a .zip file containing a C++ library containing the optimized inference runtime and your trained model. As long as you have a C++ compiler for your target hardware (and enough RAM and ROM), you can run inference!

The following will convert "my_model" (which might be a SavedModel directory) to a C++ library. Note that you need to specify the model type (Classification, in this case).

ei.model.deploy(model="my_model",
                model_input_type=ei.model.input_type.OtherInput(),
                model_output_type=ei.model.output_type.Classification(),
                output_directory=".")

Your C++ library can be found in a .zip file in the current directory. If you do not specify output_directory, the file(s) will not be downloaded. Instead, you can use the return value of ei.model.deploy(), which is the file as a raw set of bytes. You can then write those bytes to a file of your choosing. See this example for a demonstration.

You can read more about using the C++ library for inference here.

To get the full list of available hardware targets for deployment, run the following:

ei.model.list_deployment_targets()

You should see a list printed such as:

['zip',
 'arduino',
 'tinkergen',
 'cubemx',
 'wasm',
 ...
 'runner-linux-aarch64-tda4vm']

You can pass your desired target into ei.model.deploy() using the deploy_target argument, for example deploy_target='zip'.

Important! The deployment targets list will change depending on the values provided for model, model_output_type, and model_input_type in the next part. For example, you will not see openmv listed once you upload a model (e.g. using .profile() or .deploy()) if model_input_type is not set to ei.model.input_type.ImageInput(). If you attempt to deploy to an unavailable target, you will receive the error Could not deploy: deploy_target: .... If model_input_type is not provided, it will default to OtherInput. See this page for more information about input types.

Quantization

You can optionally quantize a model during deployment. A quantized model will use an internal int8 numeric representation rather than float32, which can result in reduced memory usage and faster computation on many targets.

Quantization requires a sample of data that is representative of the range (maximum and minimum) of values in your training data. It should either be an in-memory numpy array, or the path to a numpy file. Each element of the array must have the same shape as your model's input.

You can pass the representative data sample via the representative_data_for_quantization argument:

ei.model.deploy(model="my_model",
                model_output_type=ei.model.output_type.Classification(),
                representative_data_for_quantization=representative_data,
                output_directory=".")

Note that quantization is a form of lossy compression and may result in a reduction in model performance. It's important to evaluate your model after quantization to ensure it still performs well enough for your use case.

Getting started guides

We offer the following tutorials to help you use the Edge Impulse Python SDK with a number of other machine-learning platforms:

Last updated