Be sure to check out the official blog post over on the Nvidia Developer blog, as well: https://developer.nvidia.com/blog/fast-track-computer-vision-deployments-with-nvidia-deepstream-and-edge-impulse/
The DeepStream inference plugin, Gst-nvinfer, automatically transforms input data to match the model's input layer, effectively performing preprocessing operations similar to the DSP block used with Edge Impulse's SDK on embedded systems. DeepStream reduces the development burden by making these design choices for you, but this convenience requires models to have a consistent input shape.
Gst-nvinfer requires the input tensor to be in NCHW format, where N is the batch size, C the number of channels, H the height, and W the width.
Note: There is a TensorRT deployment option in Edge Impulse; however, these models don't work directly with DeepStream because the input layer is not in NCHW format. The TensorRT deployment is better suited to developers building applications from the ground up directly on top of TensorRT, with complete control of the inference engine and the application, which requires more coding than using DeepStream. It is also used with Edge Impulse EIM Linux deployments.
Note: Because TensorRT is used at the heart of Gst-nvinfer, all of TensorRT's development capabilities can be applied to override DeepStream if necessary. For example, manual engine creation in C++ or Python, or custom input and output layers through TensorRT's C++ plugin architecture, could be reasons to override DeepStream. Since the goal of DeepStream is to make the development process easier and more efficient, no-code approaches that simplify working with DeepStream are provided. The two primary no-code TensorRT engine creation approaches are:
1. Letting DeepStream's Gst-nvinfer plugin convert the ONNX model to a TensorRT engine automatically the first time the pipeline runs.
2. Using the trtexec command from the command line to manually produce serialized Engine files. More information about the trtexec command can be found at https://docs.nvidia.com/deeplearning/tensorrt/developer-guide/index.html#trtexec.
All of the model conversion steps can also be applied to creating models in Edge Impulse that work with the Nvidia DeepStream Python samples (available as Jupyter Notebooks), or with custom C++ implementations built directly from the command line in the JetPack environment on your Nvidia device.
Depending on the type of model you are building (Image Classification or Audio Classification), there are specific considerations related to the features and to the process of converting and preparing the model to work with DeepStream. The same steps can be followed to use Edge Impulse as your MLOps tool for building TensorRT models beyond just DeepStream.
Building the custom output parser requires the CUDA_VER environment variable to be set. The output is a .so file that then needs to be added to the Gst-nvinfer plugin configuration file using the custom-lib-path parameter. The custom bounding box parsing function also needs to be specified with the parse-bbox-func-name parameter; in this case the repo provides this function, called NvDsInferParseYolo. The next section covers the process of configuring the Gst-nvinfer plugin.
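As a rough sketch, and assuming the parser sources live in a folder named nvdsinfer_custom_impl_Yolo (as in common DeepStream YOLO parser repos) on a JetPack 4.6 device with CUDA 10.2, the build might look like this:

```
# Tell the makefile which CUDA toolkit version to build against
export CUDA_VER=10.2
# Compile the custom YOLO output parser into a shared library (.so)
make -C nvdsinfer_custom_impl_Yolo
```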
The Gst-nvinfer plugin is configured by means of a plain-text file that specifies all the relevant parameters required to run a model with DeepStream and TensorRT behind the scenes. This file needs to be referenced from the DeepStream application either as a Primary GPU Inference Engine (PGIE), where it performs the first inference in the pipeline, or as a Secondary GPU Inference Engine (SGIE), where it performs secondary inference on the output of a PGIE upstream.
Object Detection (in this case the YOLOv5 model built in Edge Impulse) is usually the first instance of Gst-nvinfer, i.e. the PGIE.
The minimal working version of a PGIE using the Edge Impulse YOLOv5 ONNX export is shown below:
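(The exact file shipped with the example repo is not reproduced here; the following is an illustrative sketch in which the file names, parser library path, and scaling and class-count values are placeholders to adapt to your project.)

```
[property]
gpu-id=0
# 1/255, scales 8-bit pixel values into the 0..1 range
net-scale-factor=0.0039215686
# 0 = RGB; match your Impulse's image feature block
model-color-format=0
onnx-file=yolov5.onnx
labelfile-path=labels.txt
batch-size=1
# 0 = FP32, 1 = INT8, 2 = FP16
network-mode=0
# 0 = detector
network-type=0
num-detected-classes=1
gie-unique-id=1
# 1 = primary (full-frame) inference
process-mode=1
cluster-mode=2
parse-bbox-func-name=NvDsInferParseYolo
custom-lib-path=libnvdsinfer_custom_impl_Yolo.so
```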
The batch size is set to 1 in the above example, which matches the batch size of the model. In addition, the custom output parser is specified, and the model color format is set to match the format used in your Impulse's image preprocessing/feature block.
The provided repo contains a precompiled output parser ready to run on a Jetson Nano (JetPack 4.6). The label file needs to be edited to replace the labels with your own label names, which should match the labels in your Impulse's final block, in the same order. YOLO uses a label file format where each label is separated by a new line; for a single object type, one entry is sufficient.
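For example, a hypothetical two-class label file would contain nothing more than:

```
car
person
```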
The Edge Impulse C++ library deployment is a .zip file that contains your TFLite model, stored as an array in a C header file called tflite-trained.h (located in the tflite-model folder). When working with an Arduino library, this folder is located under the src directory.
The TFLite model's input tensor (named x_input in older exports, as discussed below) has the following layout, with dimensions [N,H,W,C]. For example, with a 160x160 pixel input image, the input tensor is float32[1,160,160,1].
A transpose therefore needs to be added so that the model accepts input images as NCHW.
Converting the model from TensorFlow Lite to ONNX with the correct input shape for DeepStream requires the use of tf2onnx:
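A sketch of the conversion command is shown below; the input and output file names are placeholders for your own exported model:

```
python -m tf2onnx.convert --tflite model.tflite --output model.onnx --inputs-as-nchw serving_default_x:0
```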
The --inputs-as-nchw serving_default_x:0 parameter adds the transpose to the named input layer; the input layer name must be included for this to be applied correctly. Note that older Edge Impulse Classification exports may name the input tensor x_input. If yours is named x_input, the command needs to use --inputs-as-nchw x_input instead, otherwise the model input won't be changed. The exact input layer name can be determined with Netron.
The conversion should produce output similar to the following:
To run the classification model as a Secondary GPU Inference Engine (SGIE) that operates on the objects detected by the PGIE, set process-mode to 2:
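A hypothetical sketch of such a secondary classifier configuration is shown below; the file names, thresholds, and IDs are placeholders, and operate-on-gie-id should match the gie-unique-id of your PGIE:

```
[property]
onnx-file=classifier.onnx
labelfile-path=labels_classifier.txt
batch-size=1
model-color-format=0
# 1 = classifier
network-type=1
# 2 = secondary mode, run on objects from the upstream PGIE
process-mode=2
gie-unique-id=2
operate-on-gie-id=1
classifier-threshold=0.5
```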
The label names can be taken from the ei_classifier_inferencing_categories array in the model_variables.h header file.
The converted model is initially referenced in the Gst-nvinfer configuration as a .onnx file.
After the first run, the TensorRT engine is saved as an .engine file. On subsequent runs, the ONNX file can be commented out of the configuration and the .engine file referenced directly instead; this prevents rebuilding the engine on each run, saving time.
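For example (the auto-generated engine file name shown here is an assumption; use the name DeepStream actually writes alongside your ONNX file):

```
# onnx-file=model.onnx
model-engine-file=model.onnx_b1_gpu0_fp32.engine
```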
The major limitation of the automatic conversion is that it uses implicit batching, with a batch size of 1, ignoring the batch dimension of the model. This may not be ideal when you need to run inference on batches of images to take advantage of the hardware's batch inference capabilities.
The trtexec command is part of TensorRT and allows TensorRT engines to be constructed manually.
The command is run as follows:
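A sketch of the command is shown below; on a Jetson, trtexec is typically found under /usr/src/tensorrt/bin, and the file names and workspace size are placeholders:

```
/usr/src/tensorrt/bin/trtexec --onnx=model.onnx --saveEngine=model.engine --workspace=4096
```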
The --workspace parameter gives trtexec enough working temporary memory to create the model engine.