Use the Python SDK to upload a trained model, profile it, and deploy it. You can get RAM, ROM, and inference time estimates for edge hardware, and then convert the model to embedded code (e.g. C++, libraries, or full binaries). You can also upload your trained models to Studio if you prefer to work in the Studio environment. If a model or operation is not supported, Studio or the Python SDK will tell you.
Keep reading to try out BYOM (bring your own model) yourself with the Python SDK!
The Python SDK consists of two main libraries:
The Python SDK, on the other hand, offers an easy-to-use interface to perform several common functions. For instance, you can use the SDK to profile your model, which estimates the RAM, ROM, and inference time when using your model on one of several hardware platforms. The SDK also lets you deploy your model easily, converting it from one of several formats to a C++ library (or other supported deployment format).
Install the Python SDK with:
To use the Python SDK, you need to first create a project in Edge Impulse and copy the API key. Once you have created the project, open it, navigate to Dashboard and click on the Keys tab to view your API keys. Double-click on the API key to highlight it, right-click, and select Copy.
Note that you do not actually need to use the project in the Edge Impulse Studio. We just need the API Key.
From there, import the package and set the API key:
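One way to avoid hard-coding the key in a script is to read it from an environment variable. The following is a minimal sketch, not the SDK's own mechanism: the helper read_api_key and the variable name EI_API_KEY are hypothetical names chosen for this example, and the ei.API_KEY attribute in the trailing comment is an assumption about how the key is set.

```python
import os


def read_api_key(env=None, var_name="EI_API_KEY"):
    """Return the API key stored in an environment variable.

    Both this helper and the default variable name are assumptions
    made for this sketch.
    """
    env = os.environ if env is None else env
    key = env.get(var_name, "").strip()
    if not key:
        raise RuntimeError(f"Set {var_name} to your Edge Impulse project API key")
    return key


# Assumed usage once the SDK is installed (attribute name is an assumption):
#   import edgeimpulse as ei
#   ei.API_KEY = read_api_key()
```

Keeping the key out of source files also makes it easier to use a different project per pipeline stage.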
The functions in the Python SDK can be used in your MLOps pipelines to help you develop edge ML models as well as automatically deploy your model to your target hardware.
The following input formats are supported:
ONNX model file (use torch.onnx to export a PyTorch model to ONNX)
You can pass a model (in one of the supported input formats) along with one of several possible hardware targets to the profile() function. This will send the model to your Edge Impulse project, where the RAM, ROM, and inference time will be estimated based on the target hardware.
To get the available hardware targets for profiling, run the following:
You should see a list printed such as:
A common option is 'cortex-m4f-80mhz', as this is a relatively low-power microcontroller family. From there, we can use the Edge Impulse Python SDK to generate a profile for your model to ensure it fits on your target hardware and meets your timing requirements.
This will produce an output such as the following:
You can then parse the output from the response variable (resp) in your MLOps pipeline to determine if your model will fit within your hardware constraints. For example, the following will print out the RAM and ROM requirements along with the estimated inference time (ms) for the cortex-m4f-80mhz target (assuming you are using the float32 version of the model):
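Once the three numbers have been read from the response, a pipeline step can compare them against a hardware budget. The sketch below shows one possible check. The Budget class, the check_budget function, and all numeric values are hypothetical and are not part of the SDK; how the values are extracted from resp is deliberately left out.

```python
from dataclasses import dataclass


@dataclass
class Budget:
    """Hypothetical hardware limits for a target device."""
    ram_bytes: int
    rom_bytes: int
    max_inference_ms: float


def check_budget(ram_bytes, rom_bytes, inference_ms, budget):
    """Return a list of violations; an empty list means the model fits."""
    problems = []
    if ram_bytes > budget.ram_bytes:
        problems.append(f"RAM {ram_bytes} B exceeds {budget.ram_bytes} B")
    if rom_bytes > budget.rom_bytes:
        problems.append(f"ROM {rom_bytes} B exceeds {budget.rom_bytes} B")
    if inference_ms > budget.max_inference_ms:
        problems.append(
            f"Inference {inference_ms} ms exceeds {budget.max_inference_ms} ms"
        )
    return problems


# Placeholder numbers for illustration only; in a pipeline these would be
# read from the profiling response.
budget = Budget(ram_bytes=64 * 1024, rom_bytes=256 * 1024, max_inference_ms=50.0)
problems = check_budget(
    ram_bytes=4_000, rom_bytes=300_000, inference_ms=12.5, budget=budget
)
print(problems or "Model fits the budget")
```

Returning a list of violations rather than a single boolean lets the pipeline log every limit that was exceeded before failing the run.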
You can even set up experiments (for example, see this tutorial using Weights & Biases) to see how changing the model architecture and adjusting hyperparameters affects the predicted memory and timing requirements.
Once you are ready to deploy your model, you can call the deploy() function to convert your model from one of the available input formats to one of the Edge Impulse supported outputs. Edge Impulse can output a number of possible deployment libraries and pre-compiled binaries for a wide variety of target hardware.
The default option downloads a .zip file with a C++ library that contains the optimized inference runtime and your trained model. As long as you have a C++ compiler for your target hardware (and enough RAM and ROM), you can run inference!
The following will convert "my_model" (which might be a SavedModel directory) to a C++ library. Note that you need to specify the model type (Classification, in this case).
Your C++ library can be found in a .zip file in the current directory. If you do not specify output_directory, the file(s) will not be downloaded. Instead, you can use the return value of ei.model.deploy(), which is the file as a raw set of bytes. You can then write those bytes to a file of your choosing. See this example for a demonstration.
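Writing the returned bytes to disk can be sketched as follows. This is an illustration under assumptions: save_deployment is a hypothetical helper, it accepts either a bytes object or an in-memory buffer because the exact return type is not confirmed here, and the archive built below is only a stand-in for a real deployment.

```python
import io
import tempfile
import zipfile
from pathlib import Path


def save_deployment(data, path):
    """Write a deployment archive to `path` and return the path."""
    # Accept raw bytes or a file-like buffer such as io.BytesIO.
    if hasattr(data, "getvalue"):
        data = data.getvalue()
    elif hasattr(data, "read"):
        data = data.read()
    path = Path(path)
    path.write_bytes(data)
    return path


# Stand-in for the value returned by the deployment call.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("README.txt", "placeholder")

with tempfile.TemporaryDirectory() as tmp:
    out = save_deployment(buffer, Path(tmp) / "my_model_cpp.zip")
    with zipfile.ZipFile(out) as archive:
        names = archive.namelist()

print(names)
```

Reading the archive back with zipfile is a cheap sanity check that the download was not truncated before it is handed to a build step.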
You can read more about using the C++ library for inference here.
To get the full list of available hardware targets for deployment, run the following:
You should see a list printed such as:
You can pass your desired target into ei.model.deploy() using the deploy_target argument, for example deploy_target='zip'.
Important! The deployment targets list will change depending on the values provided for model, model_output_type, and model_input_type in the next part. For example, you will not see openmv listed once you upload a model (e.g. using .profile() or .deploy()) if model_input_type is not set to ei.model.input_type.ImageInput(). If you attempt to deploy to an unavailable target, you will receive the error Could not deploy: deploy_target: ... . If model_input_type is not provided, it will default to OtherInput. See this page for more information about input types.
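Because the list of targets can change with the model and input type, a pipeline may want to check the list before calling deploy rather than waiting for the error. The sketch below shows one way to do that. The function choose_deploy_target is hypothetical, and the available list is a placeholder for whatever the SDK reports for your model.

```python
def choose_deploy_target(preferred, available):
    """Return the first preferred target that appears in `available`."""
    for target in preferred:
        if target in available:
            return target
    raise ValueError(
        f"None of {list(preferred)} is available; choices are {list(available)}"
    )


# Placeholder list; in practice this comes from the SDK's target listing.
available = ["zip"]

# Prefer openmv when it is offered, otherwise fall back to the default zip.
target = choose_deploy_target(["openmv", "zip"], available)
print(target)
```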
You can optionally quantize a model during deployment. A quantized model will use an internal int8 numeric representation rather than float32, which can result in reduced memory usage and faster computation on many targets.
Quantization requires a sample of data that is representative of the range (maximum and minimum) of values in your training data. It should be either an in-memory numpy array or the path to a numpy file. Each element of the array must have the same shape as your model's input.
You can pass the representative data sample via the representative_data_for_quantization argument:
Note that quantization is a form of lossy compression and may result in a reduction in model performance. It's important to evaluate your model after quantization to ensure it still performs well enough for your use case.
We offer the following tutorials to help you use the Edge Impulse Python SDK with a number of other machine-learning platforms: