An advanced ML workflow using computer vision, AWS SageMaker, and MLFlow to benchmark industry visual anomaly detection models, with accompanying Jupyter Notebooks.
Created By: Mathieu Lescaudron
Public Project Link: https://studio.edgeimpulse.com/public/376268/latest
GitHub Repo: https://github.com/emergy-official/anomaly.parf.ai
Let's explore the development and optimization of a cloud-based visual anomaly detection model designed for edge deployments, featuring real-time and serverless inference.
We will cover the following topics:
Datasets: Creation of our own datasets.
Models: Development of three different models:
A baseline model + usage of BYOM (Bring Your Own Model on Edge Impulse),
Efficient AD model,
FOMO AD model by Edge Impulse (automated).
Web App:
Setting up a real-time and serverless inference endpoint,
Dataset explorer,
Automating deployments with GitHub Actions and Terraform on AWS.
This is a demo project. All code is provided for you to implement any or all parts yourself.
Imagine we are a commercial baking company that produces cookies. Our goal is to sort cookies to identify those with and without defects (anomalies), so that any broken cookies do not get packaged and sent to retailers.
We are developing a cloud-based proof-of-concept to understand the feasibility of this technique, before deploying it on edge devices.
Although this is only a hypothetical example and demonstration, this quality inspection process and computer vision workflow could absolutely be leveraged by large-scale food service providers, commercial kitchens that make packaged retail food items, or manufacturers of many other mass-produced retail products beyond the food industry.
We assume we don't have access to Omniverse Replicator to create a synthetic dataset. Instead, we manually create our own. The first step is to carefully decide which cookies to use (and which to eat).
We'll create three datasets using three different types of cookies:
One with texture,
One thicker cookie,
One plain cookie.
Each dataset will consist of 200 images, totaling 600 images:
100 without any anomalies,
100 with anomalies:
50 easy to recognize with a clear, strong separation down the middle,
25 medium difficulty with a separation that has no gap,
25 hard to detect, with small defects and knife marks.
We take around five pictures of each cookie, making slight rotations each time. Here's the result:
Each picture, taken from a mobile phone in a 1:1 ratio with an original size of 2992 x 2992 pixels, is resized to 1024 x 1024 pixels using the mogrify command from ImageMagick. This saves computing resources for both the training process and the inference endpoint:
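The article uses ImageMagick's mogrify for this step; purely as an illustration, an equivalent batch resize in Python with Pillow (the folder names are hypothetical) could look like this:

```python
# Batch-resize the 2992x2992 phone pictures down to 1024x1024.
from pathlib import Path
from PIL import Image

src = Path("dataset/cookies_1/raw")       # hypothetical input folder
dst = Path("dataset/cookies_1/resized")   # hypothetical output folder
dst.mkdir(parents=True, exist_ok=True)

for path in src.glob("*.jpg"):
    img = Image.open(path)                       # 1:1 ratio, 2992 x 2992 pixels
    img = img.resize((1024, 1024), Image.LANCZOS)
    img.save(dst / path.name, quality=90)
```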
The folder structure looks like this:
You can download the datasets here (95MB) and the raw images here (1GB)
The first model we will develop will be our baseline, serving as our starting point.
It consists of categorical image classification using a pre-trained MobileNet.
We use categorical (rather than binary) classification to allow for the addition of more anomaly categories in the future.
Have a look at the training in this notebook
Here's how the images are distributed for this model:
Training: 144 images (72%)
Validation: 16 images (8%)
Test: 40 images (20%)
Both "anomaly" and "no anomaly" images are used during training.
The model is trained on a Mac using the CPU, running through 50 epochs.
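As a rough sketch of what such a baseline can look like (the MobileNet variant, input resolution, dataset paths, and hyperparameters below are assumptions, not the notebook's exact code):

```python
import tensorflow as tf

# Hypothetical folder layout: dataset/train/<class>/*.jpg and dataset/val/<class>/*.jpg
train_ds = tf.keras.utils.image_dataset_from_directory(
    "dataset/train", image_size=(160, 160), label_mode="categorical", batch_size=16
)
val_ds = tf.keras.utils.image_dataset_from_directory(
    "dataset/val", image_size=(160, 160), label_mode="categorical", batch_size=16
)

# Pre-trained MobileNetV2 backbone, frozen for transfer learning.
base = tf.keras.applications.MobileNetV2(
    input_shape=(160, 160, 3), include_top=False, weights="imagenet"
)
base.trainable = False

model = tf.keras.Sequential([
    tf.keras.layers.Rescaling(1.0 / 127.5, offset=-1, input_shape=(160, 160, 3)),
    base,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Dense(2, activation="softmax"),  # "anomaly" / "no anomaly"
])

model.compile(
    optimizer=tf.keras.optimizers.Adam(1e-3),
    loss="categorical_crossentropy",
    metrics=["accuracy"],
)
model.fit(train_ds, validation_data=val_ds, epochs=50)  # 50 epochs, CPU-friendly
```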
You can find the results in the Step 3: Benchmarking section.
With Edge Impulse's Bring Your Own Model feature, you can easily upload your own model and use all their features.
In our case, let's use a Jupyter notebook that converts the Baseline model to a macOS version using the Edge Impulse API. (You can do it for a specific edge device, Linux, WebAssembly, etc.) It can save you quite some time compared to doing it yourself.
You can find detailed steps in this notebook (scroll down to the section titled Edge Impulse conversion).
First, start by importing the Edge Impulse Python SDK. Then load your project's API KEY.
After that, define the input and output types for your model:
And then, convert it to the format that fits your needs (in this case, MacOS for the demo):
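A condensed sketch of those three steps with the Edge Impulse Python SDK might look like the following; the exact argument names, label list, and deployment target string are assumptions to verify against the SDK documentation for your project:

```python
import edgeimpulse as ei

ei.API_KEY = "ei_..."  # your project's API key

# Labels and target name below are placeholders; the SDK can list valid
# deployment targets for your project if unsure.
ei.model.deploy(
    model="baseline.h5",                                   # the trained Keras model
    model_output_type=ei.model.output_type.Classification(
        labels=["anomaly", "no anomaly"]
    ),
    model_input_type=ei.model.input_type.ImageInput(),
    deploy_target="runner-mac-x86_64",                     # assumed target for a macOS .eim
    output_directory=".",
)
```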
You'll need to make it executable by using the command chmod +x baseline.eim. And you're all set! Create an inference function to use it with this model:
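For example, a minimal inference helper built on the Edge Impulse Linux Python SDK could look like this (model path, label names, and test image are placeholders):

```python
import cv2
from edge_impulse_linux.image import ImageImpulseRunner

def classify(image_path, model_path="baseline.eim"):
    """Run the exported .eim on a single image and return class probabilities."""
    with ImageImpulseRunner(model_path) as runner:
        runner.init()
        img = cv2.cvtColor(cv2.imread(image_path), cv2.COLOR_BGR2RGB)
        features, _cropped = runner.get_features_from_image(img)
        result = runner.classify(features)
        return result["result"]["classification"]  # e.g. {"anomaly": 0.02, "no anomaly": 0.98}

print(classify("dataset/cookies_1/test/anomaly/001.jpg"))
```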
Let's use another method called EfficientAD (detailed in a study from arXiv.org).
EfficientAD employs an autoencoder paired with a student-teacher approach to quickly and effectively identify anomalies in images.
Take a look at their video presentation for a brief overview.
The network, named PDN (Patch Description Network), consists of 4 convolutional layers and 2 pooling layers. It examines each 33 x 33 pixel patch of the image and produces a feature vector of 384 values per patch.
Two models, student and teacher, are trained on the same data. The teacher model guides the student model through the loss function, which helps the student improve its performance in detecting anomalies.
At test time, an anomaly is flagged when the student model fails to predict the teacher's features for a region of the image. In addition to the student-teacher method, EfficientAD introduces an autoencoder that gives a broader view of the image, improving overall detection performance.
We're going to reuse some of the code from nelson1425/EfficientAD and update it to suit our needs. You can find the updated code here.
We will test different parameters to build a model that performs well. In the study, they used 70,000 iterations (steps) and pretrained weights from WideResNet-101.
We will experiment with different numbers of steps, enabling or disabling the pretrained weights, and using the small or medium size of the patch description network (the medium size includes another layer and twice as many features). Each test is called an experiment, and we will use MLFlow to log the parameters and store their results, including the scores and the models.
To run a MLFlow server, either locally or remotely, use the following command:
Here, we're using the --artifacts-destination argument to specify where to store our models. You can omit this argument if you're not using an S3 bucket on AWS, and it will default to storing the models on disk.
In your code, you define an experiment like this:
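A minimal sketch of what that can look like (the experiment name, parameters, and metric values are illustrative, not the project's exact ones):

```python
import mlflow

mlflow.set_tracking_uri("http://localhost:5000")        # your MLFlow server
mlflow.set_experiment("efficientad-cookies-dataset-3")  # illustrative experiment name

with mlflow.start_run(run_name="small-pretrained-3200-steps"):
    mlflow.log_params({"steps": 3200, "pretrained": True, "pdn_size": "small"})
    # ... training loop ...
    mlflow.log_metric("f1_score", 0.94)                  # placeholder value
    mlflow.log_artifact("output/teacher_final.pth")      # stored in the artifact store
```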
We primarily use MLFlow to track experiments and store artifacts, although it offers many other powerful features including model registry, model deployments, model serving, and more.
You can find the full setup instructions for MLFlow for this demo here.
Let's train our models in the cloud using our notebook. We are using a Jupyter notebook, but you could also use a plain Python script.
There are many different cloud providers that allow you to train a model. We will use an AWS instance that includes an NVIDIA Tesla T4 GPU.
The specific instance type we use is g4dn.xlarge. To get access to this instance, you need to create a support ticket requesting access to the G instance type in your region. It costs 0.526 USD per hour, and we plan to use it for approximately 3 hours.
For our setup, we'll use a pre-configured AMI with PyTorch named Deep Learning OSS Nvidia Driver AMI GPU PyTorch 2.2.0.
Here is the CLI:
The complete commands used are detailed here
Once you've connected to the instance using ssh and cloned the repository along with the datasets, you can run the following command to start the jupyter notebook:
Make sure you've enabled port forwarding so you can connect to the remote Jupyter notebook locally:
You can now access Jupyter Notebook on the remote instance from your local computer.
For the training, we will only use the images without anomalies. Here's how the data is distributed:
Training
No anomaly: 72 images (36%)
Validation
No Anomaly: 8 images (4%)
Anomaly: 20 images (10%)
Testing
No Anomaly: 20 images (10%)
Anomaly: 80 images (40%)
Once it is trained, you can see the different results in MLFlow:
And you can create graphics to build reports:
For cookie dataset three, the best model used 3,200 steps, pretrained weights, and the small network. In the study, they used 70,000 steps. We added early stopping based on the F1 score from the evaluation dataset. Modify this to fit your needs.
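To illustrate the idea (not the notebook's exact code), early stopping on the validation F1 score can be wrapped in a small helper that takes the training and evaluation steps as callables:

```python
def train_with_early_stopping(train_step, evaluate_f1, max_steps=70_000,
                              eval_interval=200, patience=5):
    """Run train_step() repeatedly and stop once the validation F1 returned by
    evaluate_f1() has not improved for `patience` consecutive evaluations."""
    best_f1, best_step, stale = 0.0, 0, 0
    for step in range(1, max_steps + 1):
        train_step()
        if step % eval_interval:
            continue
        f1 = evaluate_f1()
        if f1 > best_f1:
            best_f1, best_step, stale = f1, step, 0   # new best model, reset patience
        else:
            stale += 1
            if stale >= patience:
                break                                  # no improvement, stop training
    return best_f1, best_step
```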
We use the same config for training datasets one and two.
Here's an example of the inference results with EfficientAD. It localizes the anomaly within the image through a heatmap.
Once you're finished, terminate the remote instance. You can find the results in the Step 3: Benchmarking section.
The last model we will build is called FOMO-AD, a visual anomaly detection learning block developed by Edge Impulse. It's based on the FOMO architecture, specifically designed for constrained devices.
Check the FOMO-AD documentation for more information.
Let's automate the entire process using the Edge Impulse API:
Import the dataset,
Create an impulse,
Generate features,
Train the model,
Export the model.
There's too much code to detail here; if you want to replicate it yourself step by step, check out this notebook.
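As a small taste of the automation, uploading labeled images to a project through the Edge Impulse ingestion API only takes a few lines of Python. The endpoint and headers below follow the public ingestion API docs, but treat the exact header set and folder names as assumptions to verify for your project:

```python
import requests
from pathlib import Path

API_KEY = "ei_..."   # project API key

def upload(image_path: Path, label: str, category: str = "training"):
    """Send one image to the Edge Impulse ingestion service with its label."""
    res = requests.post(
        f"https://ingestion.edgeimpulse.com/api/{category}/files",
        headers={"x-api-key": API_KEY, "x-label": label},
        files=[("data", (image_path.name, image_path.open("rb"), "image/jpeg"))],
    )
    res.raise_for_status()

for path in Path("dataset/cookies_3/no_anomaly").glob("*.jpg"):  # hypothetical folder
    upload(path, "no anomaly")
```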
We separate our dataset as follows:
Training set
No Anomaly: 80 images (40%)
Testing set
No Anomaly: 20 images (10%)
Anomaly: 100 images (50%)
The best part of the notebook is that it includes a pre-built pipeline in Edge Impulse that will Find the best Visual AD Model using our dataset. All you need to do is provide the dataset and run the pipeline. After that, you'll have the optimal model set up in your project, and you can find the best threshold to use in the logs (refer to the Option 2 section in the notebook for more details).
Edge Impulse lets you classify your entire dataset or just one image at a time:
Once the model is exported, you can create an inference function in Python to run it locally:
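A sketch of what that local inference can look like is shown below. The runner usage is the same as for the baseline model; the anomaly-specific result fields ("visual_anomaly_grid", "visual_anomaly_max") and the threshold value are assumptions about the result payload, so print the raw result to confirm the exact keys for your SDK version:

```python
import cv2
from edge_impulse_linux.image import ImageImpulseRunner

THRESHOLD = 6.0   # placeholder: use the best threshold reported in the pipeline logs

with ImageImpulseRunner("fomo_ad_cookies_3.eim") as runner:   # hypothetical model path
    runner.init()
    img = cv2.cvtColor(cv2.imread("test/anomaly/001.jpg"), cv2.COLOR_BGR2RGB)
    features, _ = runner.get_features_from_image(img)
    result = runner.classify(features)["result"]

    # Assumed fields: an overall anomaly score plus a per-cell anomaly grid.
    score = result.get("visual_anomaly_max", 0)
    cells = [c for c in result.get("visual_anomaly_grid", []) if c["value"] >= THRESHOLD]
    print("anomaly" if score >= THRESHOLD else "no anomaly", score, f"{len(cells)} grid cells")
```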
Now that we've trained all the models, it's time to evaluate how well they perform using the F1 score. (The F1 score measures a model's accuracy as the harmonic mean of precision and recall: F1 = 2 x (precision x recall) / (precision + recall).)
Take a look at this notebook where all the benchmarking is done.
Since each model was trained on different sets of data, we will use the test dataset from the EfficientAD model for comparison.
Here are the results, tested on a Macbook:
FOMO-AD performs the best in most datasets. Although EfficientAD could be improved to score higher, it would require more time.
For additional details on performance, including difficulty, time, and RAM usage, check out this notebook. Typically, the inference time of EfficientAD is around 300 ms, whereas FOMO-AD takes about 35 ms.
The EfficientAD model is best run on a modern GPU, where the inference time drops to about 3 ms.
The models are trained and ready to be used, so let's build an app to showcase our proof of concept.
We'll include two features:
A serverless endpoint using SageMaker Serverless Inference with EfficientAD,
A real-time inference using a compact version of the Edge Impulse mobile client with FOMO-AD.
In the public repository, you will find:
The API Code,
The Website Code.
This is the infrastructure of our serverless inference endpoint:
When a user uploads an image to get the anomaly result, it will go through:
Cloudfront (which is also used by the front end; users are redirected to the API Gateway when the request path matches /api*),
An API Gateway (to communicate with Lambda and allow for future API expansion),
A Lambda function that communicates with the SageMaker endpoint securely (see the sketch after this list),
A serverless SageMaker endpoint (which executes the inference using a Docker container).
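As a sketch of the Lambda step, the handler below forwards the uploaded image to the SageMaker endpoint with boto3; the endpoint name, content type, and payload handling are assumptions for illustration, not the repository's exact code:

```python
import base64
import json
import boto3

sm_runtime = boto3.client("sagemaker-runtime")
ENDPOINT_NAME = "efficientad-serverless"   # hypothetical endpoint name

def handler(event, context):
    # API Gateway proxy integrations base64-encode binary bodies.
    body = event["body"]
    image_bytes = base64.b64decode(body) if event.get("isBase64Encoded") else body.encode()

    response = sm_runtime.invoke_endpoint(
        EndpointName=ENDPOINT_NAME,
        ContentType="application/x-image",   # assumed content type of the container
        Body=image_bytes,
    )
    result = json.loads(response["Body"].read())

    return {
        "statusCode": 200,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps(result),
    }
```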
The SageMaker endpoint operates using a Docker image. You can build your dockerfile like this:
Then, upload the Docker image to an ECR repository (an Elastic Container Registry).
You can also test the inference locally without using Docker:
Check out the terraform code to configure the SageMaker endpoint, or you can do it manually in the AWS Console.
The serverless inference is quite slow (about 12 seconds per inference). You can speed this up by increasing the RAM allocation, switching to a provisioned endpoint, or using a real-time endpoint within AWS. However, these options will increase the cost. The current setup costs $0.20 per 1,000 inferences, an affordable way to create demos without impacting your wallet.
If you've previously played with Edge Impulse, you might be familiar with the Launch in browser feature that lets you test your model in real-time.
Wouldn't it be great to include this feature directly in our web app?
Thanks to Edge Impulse, this feature is open source!
The way it works is that the client downloads a WebAssembly .zip export of the model using the Edge Impulse API and your project's API KEY. Then, it unzips the export and loads the model along with multiple scripts to enable real-time inference.
We're going to modify this a bit.
We’ll no longer use the API KEY.
We’ll include the web assembly zip file directly in the website’s assets. (you can download this file manually from Edge Impulse, or it can be downloaded automatically using the API when building the website assets),
We'll keep only the essential code and update what's needed to make it work the new way,
We'll add a colormap function for fun to show the model's confidence.
This is what we obtain:
All the modifications are detailed here in the Mobile Client compressed version detail section.
For the website, we're using Astro with React based on the AstroWind template.
To automatically deploy the website, we use this GitHub Action. It triggers a deployment whenever the commit message includes deploy:website.
The website is hosted on AWS within an S3 bucket and is behind a Cloudfront distribution.
It also features a dataset explorer that showcases the data used for benchmarking:
It includes all the images, scores, predictions, and timings for all the models and cookies.
One key improvement could be enhancing the dataset. We used a mobile phone with a combination of natural and artificial lighting. The model's performance might improve if you create a synthetic dataset using Omniverse Replicator featuring different lighting conditions, backgrounds, and more.
It will eliminate manual processing, and you won't need to run 10 km to burn off all the cookies you've eaten.
Perform traffic analysis for smart city and vehicle detection projects with an NVIDIA TAO model and a Jetson Orin Nano.
Created By: Jallson Suryo
Public Project Link: https://studio.edgeimpulse.com/public/310628/live
GitHub Repo: https://github.com/Jallson/Traffic_Analysis_Orin_Nano/
In a smart-city system, analyzing vehicle and traffic flow patterns is crucial for a range of purposes, from city planning and road design, to setting up traffic signs and supporting law enforcement. Current systems often depend on manpower, police or separate devices like speed sensors and vehicle counters, making them less practical. Even when object detection is applied, it typically requires powerful, energy-hungry computers or cloud-based systems, limiting widespread adoption of traffic analysis systems. To address this, a low-energy, edge-based Object Detection Traffic Analysis system can be developed. By integrating this into existing cameras at intersections, highways, and bridges, traffic data can be collected more efficiently, enabling broader implementation at lower costs and energy use.
An object detection model from Edge Impulse is one way of addressing this problem, as the model's inference output will contain data labels, object coordinates, and timestamps. From this data, we will derive the object's speed and direction, as well as count objects entering or exiting. To simplify the process, we will use an NVIDIA TAO - YOLOv4 pre-trained neural network to build our model, then deploy it on an NVIDIA Jetson Orin Nano. This method grants access to a wide range of pre-trained models, enabling you to leverage existing neural network architectures and weights for your specific tasks. Therefore, the amount of data we need to collect is less than what's typically required when training and building an object detection model from scratch. The Edge Impulse model, combined with NVIDIA TAO, is optimized for efficient performance, achieving faster inference speeds through the TensorRT library embedded in the Orin Nano, which is essential for real-time applications. Overall, this approach can greatly accelerate the development cycle, enhance model performance, and streamline the process for Edge AI applications.
NVIDIA Jetson Orin Nano Developer Kit (8GB)
USB Camera/webcam (eg. Logitech C270/ C920)
DisplayPort to HDMI cable
Display/monitor
Tripod
Keyboard, mouse or PC/Laptop via ssh
Orin Nano case ( 3D print file available at https://www.thingiverse.com/thing:6068997 )
NVIDIA Jetpack (5.1.2)
Edge Impulse Studio
Edge Impulse Linux CLI & Python SDK
Terminal
In the initial stage of building a model in Edge Impulse Studio, we need to prepare the data, which can be in the form of images or videos that will later be split into images. The image and video data can be sourced from free-license databases such as the COCO dataset or Roboflow, which can then be used for object detection training. Alternatively, you can collect your own data to better suit the purposes of your project. Here, I will provide an example of how to upload data in Edge Impulse Studio for both scenarios (see the images below). For those who are not familiar with Edge Impulse Studio, simply visit https://studio.edgeimpulse.com, login or create an account, then create a new Project. Choose Images when given a choice of project type, then Object detection. In Dashboard > Project Info, choose Bounding Boxes for the labeling method and NVIDIA Jetson Orin Nano for the target device. Then move to Data acquisition (on the left hand navigation menu), and click on the Upload Data tab.
Note: When collecting data samples, it's important to remember that the images of vehicles (trucks or cars) to be labeled should not be too small, as the model we're building can only recognize objects with a minimum size of 32x32 pixels.
The next step is labeling. If you're using data from a COCO JSON dataset that has already been annotated, you can skip this step or simply review or edit the existing labels. For other methods, click on Data acquisition, and before labeling video data, you’ll need to split the video into images. Right-click on the three dots to the right, select Split Into Images, then click Yes, Split. Enter the number of frames per second from the video — usually around 1 or 2 — to avoid having too many nearly identical images.
Once the images are ready, you'll see a labeling queue, and you can begin the process. To simplify this, you can select Label suggestions: Classify using YOLOv5, since cars and trucks will be automatically recognized. Turn off other objects if YOLOv5 detects them incorrectly, then click Save label. Repeat this process until all images are labeled.
After labeling, it's recommended to split the data into Training and Testing sets, using around an 80/20 ratio. If you haven't done this yet, you can go back to the Dashboard, and click on Train / Test Split and proceed. As shown here, I only used 150 images, as we'll be training the model with the help of pre-trained NVIDIA TAO-YOLO based models.
Once your labelled dataset is ready, go to Impulse Design > Create Impulse, and set the image width and height to 320x320. Choose Fit shortest axis, then select Image and Object Detection as the Learning and Processing blocks, and click Save Impulse. Next, navigate to the Image Parameters section, select RGB as the color depth, and press Save parameters. After that, click on Generate, where you'll be able to see a graphical distribution of the two classes (car and truck).
Now, move to the Object Detection navigation on the left, and configure the training settings. Select GPU as the compute option and MobileNet v2 (3x224x224) as the backbone option. Set the training cycles to around 400 and the minimum learning rate to 0.000005. Choose NVIDIA TAO YOLOv4 as the neural network architecture — for higher resolutions (eg. 640x640), you can try YOLOv5 (Community blocks) with a model size of medium (YOLOv5m) — Once done, start training by pressing Start training, and monitor the progress.
If everything goes well and the precision result is around 80%, proceed to the next step. Go to the Model Testing section, click Classify all, and if the result is around 90%, you can move on to the final step — Deployment.
Click on the Deployment tab, then search for TensorRT, select (Unoptimized) Float32, and click Build. This will generate the NVIDIA TensorRT library for running inference on the Orin Nano's GPU. Once downloaded, unzip the file, and you'll be ready to deploy the model using the Edge Impulse SDK on to the NVIDIA Jetson Orin Nano.
Alternatively, there's an easier method: simply ensure that the model has been built in Edge Impulse Studio. From there, you can test, download the model, and run everything directly from the Orin Nano.
On the Orin Nano side, there are several things that need to be done. Make sure the unit uses JetPack — we use Jetpack v5.1.2 — which is usually pre-installed on the SD card. Then open a Terminal on the Orin Nano, or ssh to the Orin via your PC/laptop and setup Edge Impulse tooling in the terminal.
You also need to install the Linux Python SDK library (you need Python >=3.7, which is included in JetPack), and it is possible you may need to install Cython to build the Numpy package: pip3 install Cython, then install the Linux Python SDK: pip3 install pyaudio edge_impulse_linux. You'll also need to clone the examples: git clone https://github.com/edgeimpulse/linux-sdk-python
Next, build and download the model.
Install Clang as a C++ compiler: sudo apt install -y clang
Clone the following repository and install these submodules:
git clone https://github.com/edgeimpulse/example-standalone-inferencing-linux
cd example-standalone-inferencing-linux && git submodule update --init --recursive
Then install OpenCV:
sh build-opencv-linux.sh
Now make sure the contents of the TensorRT folder from the Edge Impulse Studio .zip file download have been unzipped and moved to the example-standalone-inferencing-linux directory.
Build a specific model targeting Orin Nano GPU with TensorRT:
APP_EIM=1 TARGET_JETSON_ORIN=1 make -j
The resulting file will be in ./build/model.eim
Open a terminal on the Orin Nano or ssh from your PC/laptop, then run edge-impulse-linux-runner --clean, which will allow you to select your project. Log in to your account and choose your project. This process will download the model.eim file, which is specifically built with the TensorRT library targeting the Orin Nano GPU. During the process, the console will display the path where the model.eim has been downloaded. For example, in the image below, it shows the file located at /home/orin/.ei-linux-runner/models/310628/v15.
For convenience, you can copy this file to the same directory as the Python program you'll be creating in the next steps. For instance, you can use the following command to copy it to the home directory: cp -v model.eim /home/orin
Now the model is ready to run in a high-level language such as the Python program used in the next step. To ensure this model works, we can run the Edge Impulse Linux Runner with a camera attached to the Orin Nano. You can see a view from the camera via your browser (the IP address location is provided when the Edge Impulse Linux Runner is started). Run this command to start it now: edge-impulse-linux-runner --model-file <path to directory>/model.eim
The inferencing time is around 6ms, which is incredibly fast for object detection projects.
With the impressive performance of live inferencing using the Linux Runner, we can now create a Python-based Traffic Analysis program to calculate cumulative counts, and track the direction and speed of vehicles. This program is a modification of the Classify.py script from Edge Impulse's examples in the linux-python-sdk directory. We have adapted it into an object tracking program by integrating a tracking library, which identifies whether the moving object is the same vehicle or a different one by assigning different IDs. This prevents miscounts or double counts.
For speed calculation, we also use this tracking library by adding two horizontal lines on the screen. We measure the actual distance between these lines and divide it by the timestamp of the object passing between the lines. The direction is determined by the order in which the lines are crossed, for example, A —> B is IN, while B —> A is OUT.
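To illustrate that logic (not the exact code from the repository; the line positions, real-world distance, and the way tracker IDs are delivered are placeholders):

```python
import time

REAL_DISTANCE_M = 8.0            # measured distance between line A and line B, in meters

crossed = {"A": {}, "B": {}}     # track_id -> timestamp of first line crossed
counts = {"IN": 0, "OUT": 0}

def on_line_crossed(track_id, line):
    """Call when a tracked vehicle's centroid crosses line "A" or line "B"."""
    now = time.time()
    other = "B" if line == "A" else "A"
    if track_id in crossed[other]:
        # Second line reached: compute speed and direction, then count the vehicle.
        elapsed = now - crossed[other].pop(track_id)
        speed_kmh = REAL_DISTANCE_M / elapsed * 3.6
        direction = "IN" if other == "A" else "OUT"   # A -> B is IN, B -> A is OUT
        counts[direction] += 1
        print(f"vehicle {track_id}: {direction}, {speed_kmh:.1f} km/h")
    else:
        crossed[line][track_id] = now                  # first line reached, remember time
```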
In the first code example, we use a USB camera connected to the Orin Nano and run the program with the following command:
If we want to run the program using a video file as input (e.g., video.mp4), we use the path to the video file when executing the program:
Note: For video/camera capture display, you cannot use the headless method from a PC/laptop. Instead, connect a monitor directly to the Orin Nano to view the visuals, including the lines, labeled bounding boxes, IN and OUT counts, and vehicle speeds.
The Python code and the tracking library are available and can be accessed at https://github.com/Jallson/Traffic_Analysis_Orin_Nano
Here are two demo videos, showing the results:
In conclusion, we have successfully implemented an Edge Impulse model using pre-trained NVIDIA TAO - Yolo object detection within a Vehicle Traffic Analysis program, running locally on the Orin Nano. It's important to note that the speed figures provided may not be entirely accurate, as they are based on estimates without on-site measurements. To ensure accuracy, measurements should be taken on-site at the camera deployment location. However, this project serves to simulate a concept that can be further developed. The positions of the lines, distance values, angle settings, and other parameters can be easily adjusted in the Python code to better fit the specific conditions of the environment. Finally, it's worth mentioning that we achieved this with a minimal amount of data, and the low memory requirements of the implemented model result in extremely fast inference times. So, we can confidently say that the project's objectives — to enhance speed, simplify processes, and operate with low energy and cost — have been successfully met, making this method suitable for widespread application.
A wearable surgery inventory object detection sensor, trained with synthetic data created using NVIDIA Omniverse Replicator.
Created By: Eivind Holt
Public Project Link:
GitHub Repo:
This wearable device keeps track of instruments and materials used during surgery. This can be useful as an additional safeguard to prevent Retained Surgical Bodies.
Extensive routines are in place pre-, during, and post-operation to make sure no unintentional items are left in the patient. In the small number of cases when items are left the consequences can be severe, in some cases fatal. This proof-of-concept explores the use of automated item counting as an extra layer of control.
Here is a demo of chrome surgical instrument detection running on an Arduino Nicla Vision:
Existing solutions are mainly based on either x-ray or RFID. With x-ray, the patient needs to be scanned using a scanner on wheels. Metal objects obviously will be visible, while other items such as swabs need to have metal strips woven in to be detected, and the surgery team has to wear lead aprons. Some items have passive RFID circuits embedded and can be detected by a handheld scanner.
NVIDIA GeForce RTX 3090 (any RTX will do)
Formlabs Form 2 3D printer
Surgery equipment
Many operating rooms (OR) are equipped with adjustable lights with a camera embedded. A video feed from such a camera could make an interesting source for the object detection model. This project aims to explore the technical viability of running inference on a small wearable. A fish-eye lens could further extend visual coverage. An important design consideration is to make the wearable operable without the need for touch, to avoid cross-contamination. However, this article is scoped to the creation of an object detection model with synthetic data.
As if detecting objects on highly constrained devices wasn't challenging enough, this use case poses a potential show stopper. Most of the tools used in surgery have a chrome surface. Due to the reflective properties of chrome, especially the high specular reflection and highlights, a given item's appearance, judged by its composition of pixels (in this context known as features), will vary greatly. Humans are pretty good at interpreting highly reflective objects, but there are many examples where even we may get confused.
Our neural network will be translated into code that will compile and execute on a highly constrained device. One of the limiting factors is the amount of RAM which will directly constrain a number of parameters. In addition to having to keep the images from the camera sensor to a mere 96x96 pixels, there is a limit on the number of classes we can identify. Also, there is a predefined limit of the number of items we can detect in a given frame, set to 10. There is room to experiment with expanding parameters, but it is better to embrace these limiting factors and try to think creatively. For instance, the goal of the device isn't to identify specific items or types of items, but rather to make the surgery team aware if item count doesn't add up. With this approach we can group items with similar shapes and surfaces. Having said that, RAM size on even the smallest devices will certainly increase in the near future. The number of images used for training the model does not affect memory usage.
Edge Impulse Studio offers a web-based development platform for creating machine learning solutions from concept to deployment.
Using the camera on the intended device for deployment, the Arduino Nicla Vision, around 600 images were initially captured and labeled. Most images contained several items and a fraction were unlabeled images of unrelated background objects.
The model trained on this data was quickly deemed useless in detecting reflective items but provided a nice baseline for the proceeding approaches.
To isolate the chrome surfaces as a problematic issue, a number of chrome instruments were spray painted matte and a few plastic and cloth based items were used to make a new manually captured and labeled dataset of the same size. For each image the items were scattered and the camera angle varied.
A video demonstrating live inference from the device camera can be seen here. Only trained objects are marked. Flickering can be mitigated by averaging.
Matte objects detection demo, video Eivind Holt
The remainder of the article answers the question whether highly reflective objects can be reliably detected on constrained hardware given enough training data.
A crucial part of any ML-solution is the data the model is trained, tested and validated on. In the case of a visual object detection model this comes down to a large number of images of the objects to detect. In addition each object in each image needs to be labeled. Edge Impulse offers an intuitive tool for drawing boxes around the objects in question and to define labels. On large datasets manual labeling can be a daunting task, thankfully EI offers an auto-labeling tool. Other tools for managing datasets offer varying approaches for automatic labeling, for instance using large image datasets. However, often these datasets are too general and fall short for specific use cases.
One of the main goals of this project is to explore creating synthetic object images that come complete with labels. This is achieved by creating a 3D scene in NVIDIA Omniverse and using its Replicator Synthetic Data Generation toolbox to create thousands of slightly varying images, a concept called domain randomization. With an NVIDIA RTX 3090 graphics card from 2020 it is possible to produce a ray-traced image roughly every 2 seconds; creating 10,000 images therefore takes about 5 hours.
We will be walking through the following steps to create and run an object detection model on a microcontroller devkit. An updated Python environment with Visual Studio Code is recommended. A 3D geometry editor such as Blender is needed if object 3D models are not in USD-format (Universal Scene Description).
Installing Omniverse Code, Replicator and setting up debugging with Visual Studio Code
Creating a 3D stage/scene in Omniverse
Working with 3D models in Blender
Importing 3D models in Omniverse, retaining transformations, applying materials
Setting metadata on objects
Creating script for domain randomization
Creating label file for Edge Impulse Studio
Creating an object detection project in Edge Impulse Studio and uploading dataset
Training and deploying model to device
3D printing a protective housing for the device
Using object detection model in an application
NVIDIA Omniverse
Install Code: Open Omniverse Launcher, go to Exchange, install Code.
Launch Code from NVIDIA Omniverse Launcher.
Go to Window->Extensions and install Replicator
Create a new stage/scene (USD-file)
Create a textured plane that will be a containment area for scattering the objects
Create a larger textured plane to fill the background
Add some lights
If you have a hefty, heat-producing GPU next to you, you might prefer to reduce the FPS limit in the viewports of Code. It may default to 120 FPS, generating a lot of heat when the viewport is in the highest quality rendering modes. Set "UI FPS Limit" and "Present thread FPS Limit" to 60. This setting unfortunately does not persist between sessions, so we have to repeat it every time the project opens.
A scene containing multiple geometric models should be exported on an individual model basis, with USD-format as output.
Omniverse has recently received limited support in importing BSDF material compositions, but this is still experimental. In this project materials or textures were not imported directly.
To avoid overwriting any custom scaling or other transformations set on exported models, it is advisable to add a top node of type Xform to each model hierarchy. Later we can move the object around without losing adjustments.
The Replicator toolbox has a function for scattering objects on a surface in its API. To (mostly) avoid object intersection, a few improvements can be made. In the screenshot, a basic shape has been added as a bounding box to allow some clearance between objects and to make sure thin objects are handled more appropriately while scattering. The bounding box can be set as invisible. As of Replicator 1.9.8 some object overlap seems to be unavoidable.
For the cloth based materials some of the textures from the original models were used, more effort in setting up the shaders with appropriate texture maps could improve the results.
To be able to produce images for training and include labels we can use a feature of Replicator toolbox found under menu Replicator->Semantics Schema Editor.
Here we can select each top node representing an item for object detection and add a key-value pair. Choosing "class" as Semantic Type and e.g. "tweezers" as Semantic Data enables us to export these strings as labels later. The UI could benefit from a bit more work on intuitive design, as it is easy to misinterpret which fields show the actual semantic data set on an item, and which fields carry over values intended to make labeling many consecutive items easier.
Semantics Schema Editor may also be used with multiple items selected. It also has handy features to use the names of the nodes for automatic naming.
To keep the items generated in our script separate from the manually created content we start by creating a new layer in the 3D stage:
Next we specify that we want to use ray tracing as our image output. We create a camera and hard-code its position. We will point it at our items for each render later. Then we use our previously defined semantics data to get references to items, background items, and lights for easier manipulation. Lastly we define our render output by selecting the camera and setting the desired resolution. Note that the intended resolution of 96x96 pixels seems to produce artifacts, so we set it a bit higher, at 128x128 pixels. Edge Impulse Studio will take care of scaling the images to the desired size.
Due to the asynchronous nature of Replicator we need to define our randomization logic as call-back methods by first registering them in the following fashion:
Before we get to the meat of the randomization we define what will happen during each render:
num_frames defines how many renders we want. rt_subframes lets the render pipeline proceed a number of frames before capturing the result and passing it on to be written to disk. Setting this high will let advanced ray tracing effects such as reflections have time to propagate between surfaces, though at the cost of higher render time. Each randomization sub-routine will be called, with optional parameters.
To write each image and its semantic information to disk we use a provided API. We could customize the writer, but as of Replicator 1.9.8 on Windows this resulted in errors. We will use "BasicWriter" and instead make a separate script to produce a label format compatible with EI.
Here rgb tells the API that we want the images to be written to disk as png-files, and bounding_box_2d_tight that we want files with labels (from previously defined semantics) and bounding boxes as rectangles. The script ends with running a single iteration of the process in Omniverse Code, so we can visualize the results.
The bounding boxes can be visualized by clicking the sensor widget, checking "BoundingBox2DTight" and finally "Show Window".
Only thing missing is defining the randomization logic:
For scatter_items we get a reference to the area that will contain our items. Each item is then iterated so that we can add a random rotation (0-360 degrees on the surface plane) and use scatter_2d to randomize placement. For the latter, surface_prims takes an array of items to use as possible surfaces, and check_for_collisions tries to avoid overlap. The order of operations is important to avoid overlapping items.
For the camera we simply randomize the position in all 3 axis and make sure it points to the center of the stage.
With the lights we randomize the brightness between a set range of values.
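Pulling the pieces above together, a condensed sketch of such a randomization script could look like the following. The prim paths, semantics values, and numeric ranges are placeholders rather than the project's exact values, and some attribute names (such as the light intensity attribute) may differ between Omniverse versions:

```python
import omni.replicator.core as rep

with rep.new_layer():
    camera = rep.create.camera(position=(0, 0, 100))
    # 96x96 produced artifacts, so render at 128x128 and let Edge Impulse rescale.
    render_product = rep.create.render_product(camera, (128, 128))

    items = rep.get.prims(semantics=[("class", "tweezers"), ("class", "scissors")])
    lights = rep.get.prims(semantics=[("class", "light")])
    surface = rep.get.prims(path_pattern="/World/ScatterPlane")  # hypothetical path

    def scatter_items():
        with items:
            rep.modify.pose(rotation=rep.distribution.uniform((0, 0, 0), (0, 0, 360)))
            rep.randomizer.scatter_2d(surface_prims=surface, check_for_collisions=True)
        return items.node

    def randomize_camera():
        with camera:
            rep.modify.pose(
                position=rep.distribution.uniform((-30, -30, 60), (30, 30, 120)),
                look_at=(0, 0, 0),  # always point at the center of the stage
            )
        return camera.node

    def randomize_lights():
        with lights:
            # Attribute name may be "inputs:intensity" on newer USD light schemas.
            rep.modify.attribute("intensity", rep.distribution.uniform(1000, 8000))
        return lights.node

    rep.randomizer.register(scatter_items)
    rep.randomizer.register(randomize_camera)
    rep.randomizer.register(randomize_lights)

    writer = rep.WriterRegistry.get("BasicWriter")
    writer.initialize(output_dir="out_items", rgb=True, bounding_box_2d_tight=True)
    writer.attach([render_product])

    with rep.trigger.on_frame(num_frames=1000, rt_subframes=50):
        rep.randomizer.scatter_items()
        rep.randomizer.randomize_camera()
        rep.randomizer.randomize_lights()
```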
Note that in the provided example rendering images and labels is separated between the actual objects we want to be able to detect and background items for contrast. The process would run once for the surgery items, then the following line would be changed from
to
When rendering the items of interest the background items would have to be hidden, either manually or programmatically, and vice versa. The output path should also be changed to avoid overwriting the output.
Whether the best approach for training data is to keep objects of interest and background items in separate images or to mix them is debated, both with sound reasoning. In this project the best results were achieved by generating image sets of both approaches.
or debug from Visual Studio Code by setting the input folder in launch.json like this:
This will create a file bounding_boxes.labels that contains all labels and bounding boxes per image.
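For illustration, a small conversion script could look like the sketch below. The BasicWriter file naming and the .npy field names are assumptions about Replicator's output, while the bounding_boxes.labels structure follows the Edge Impulse bounding-box label format:

```python
import glob
import json
import os
import numpy as np

output_dir = "out_items"   # hypothetical Replicator output folder
labels = {"version": 1, "type": "bounding-box-labels", "boundingBoxes": {}}

for npy_path in sorted(glob.glob(os.path.join(output_dir, "bounding_box_2d_tight_*.npy"))):
    frame_id = npy_path.split("_")[-1].split(".")[0]
    image_name = f"rgb_{frame_id}.png"

    # Mapping from semanticId to class name, written alongside the .npy file (assumed layout).
    with open(os.path.join(output_dir, f"bounding_box_2d_tight_labels_{frame_id}.json")) as f:
        id_to_class = {int(k): v["class"] for k, v in json.load(f).items()}

    boxes = []
    for row in np.load(npy_path):
        label = id_to_class.get(int(row["semanticId"]), "background")
        if label == "background":
            continue  # background images get an empty label list, avoiding the labeling queue
        x_min, y_min = int(row["x_min"]), int(row["y_min"])
        x_max, y_max = int(row["x_max"]), int(row["y_max"])
        boxes.append({"label": label, "x": x_min, "y": y_min,
                      "width": x_max - x_min, "height": y_max - y_min})
    labels["boundingBoxes"][image_name] = boxes

with open(os.path.join(output_dir, "bounding_boxes.labels"), "w") as f:
    json.dump(labels, f)
```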
For a project intended to detect objects with reflective surfaces a large number of images is needed for training, but the exact number depends on a lot of factors and some experimentation should be expected. It is advisable to start relatively small, say 1,000 images of the objects to be detected. For this project over 30,000 images were generated; this is much more than needed. A number of images of random background items are also needed to produce results that will work in the real world. This project uses other surgery equipment for convenience, they do not need to be individually labeled. Still Edge Impulse Studio will create a labeling queue for each image for which it has not received labeling data. To avoid having to click through each image to confirm they contain no labels, the program described will produce a bounding_boxes.labels with empty labels for items tagged with semantic class "background". The factor between images of items to detect and background noise also relies on experimentation, but 1-2% background ratio seems to be a good starting point.
EI creates unique identifiers per image, so you can run multiple iterations to create and upload new datasets, even with the same file names. Just upload all the images from a batch together with the bounding_boxes.labels file.
This way we can effortlessly produce thousands of labeled images and witness how performance on detecting reflective objects increases. Keep in mind to try to balance the number of labels for each class.
To protect the device and make a simple way to wear it a housing was designed in CAD and 3D printed. It is a good practice to start by making basic 3D representations of all the components, this vastly reduces iterations due to surprises when it comes to assembly.
The results of this project show that training and testing data for object detection models can be synthesized using 3D models, reducing manual labor in capturing images and annotation. Even more impressive is being able to detect unpredictable reflective surfaces on heavily constrained hardware by creating a large number of images.
The domain of visual object detection is currently experiencing a thrilling phase of evolution, thanks to the convergence of numerous significant advancements. Envision a service capable of accepting 3D models as input and generating a diverse array of images for training purposes. With the continuous improvements in generative diffusion models, particularly in the realm of text-to-3D conversion, we are on the cusp of unlocking even more potent capabilities for creating synthetic training data. This progression is not just a technological leap; it's set to revolutionize the way we approach object detection, paving the way for a new generation of highly innovative and effective object detection solutions. The implications of these advancements are vast, opening doors to unprecedented levels of accuracy and efficiency in various applications.
I highly recommend learning how to debug Omniverse extension code. It requires a bit of work, but it will save a lot of blind troubleshooting as things get complex. Note: This procedure is for debugging extensions.
To enable Python debugging via Visual Studio Code, in Omniverse Code, go to Extensions.
Search for "debug" and enable "Kit debug vscode" and "A debugger for Python".
In Code, the window "VS Code Link" should read "VS Code Debugger Unattached".
After activating the project extension, go to the extension details and click "Open in VSCode" icon.
In Visual Studio Code, make sure the two settings in .vscode\launch.json correspond to what you see in the "VS Code Link" window, e.g. "host": "localhost" and "port": 3000.
Go to the Run and Debug pane in VSCode, make sure "Python: Attach .." is selected and press the play button.
Back in Omniverse Code, VS Code Link should read "VS Code Debugger Attached".
To test, in VSCode set a breakpoint in exts\eivholt\extension.py, e.g. inside the function "run_replicator".
Back in Omniverse Code, find the project extension UI and click "Initialize Replicator".
In VSCode, you should now have hit the breakpoint.
Another interesting approach to the challenge of detecting reflective surfaces is using edge detection. This would still benefit from synthetic data generation.
A robotic system for efficient object sorting and placement in dynamic environments, using computer vision to guide the robotic arm.
Created By: Naveen Kumar
Public Project Link:
GitHub Repository:
In this project, we will design and implement a system capable of performing pick-and-place tasks using a robot arm and a 3D depth camera. The system can recognize and locate objects in a cluttered and dynamic environment, and plan and execute grasping and placing actions. The system consists of the following components:
A 3D camera that can capture images of the scene and provide 3D information about the objects and their poses.
A robot arm that can move and orient its end-effector according to the desired position and orientation.
A gripper that can attach and detach objects of various shapes and sizes.
A control system that can process the 3D images, perform object recognition and localization, plan the grasping and placing strategies, and control the robot arm and the gripper.
The system can be used for various pick-and-place applications, such as bin picking, assembly, sorting, or packaging. The system can also be adapted to different scenarios by changing the camera, the robot arm, the gripper, or the software. The system can provide flexibility, accuracy, and efficiency for industrial or domestic tasks. This project might seem simple at first glance, but is surprisingly complex. We will be utilizing plastic toys to sort them. Sorting is a crucial task, from manufacturing to logistics, and requires a great deal of precision and attention to detail. By using these plastic toys, we will be able to test and refine our sorting techniques in a safe and controlled environment.
A Raspberry Pi 5 will be used as a main controller, to host ROS 2 nodes and an interface between the robotic arm and the depth camera.
Instead of sticking with the same old boring color cubes 🧊 that you see everywhere online for a pick-and-place demo, we’re going to have some fun sorting through these colorful plastic toys, Penguins 🐧 and Pigs 🐷!
After the installation is completed, we can insert the SD card back into the kit and power it on. Once it boots up, we can log in via ssh.
The Robot Operating System (ROS) is a set of software libraries and tools for building robot applications. We will use ROS 2 Humble for this project since it is stable on the Raspberry Pi OS. The ROS 2 binary packages are not available for Raspberry Pi OS, so we need to build it from the source. Please follow the steps below to install it.
Make sure we have a locale that supports UTF-8.
Otherwise, run the following command to open the Raspberry Pi Configuration CLI:
Under Localisation Options > Locale, choose en_US.UTF-8.
MoveIt 2 is the robotic manipulation platform for ROS 2 and incorporates the latest advances in motion planning, manipulation, 3D perception, kinematics, control, and navigation. We will be using it to set up the robotic arm and the motion planning.
DepthAI ROS is a ROS 2 package that allows us to:
Use the OAK-D camera as an RGBD sensor for the 3D vision needs.
Load Neural Networks and get the inference results straight from the camera.
The following script will install depthai-core, update USB rules, and install depthai device drivers.
Execute the following commands to set up a DepthAI ROS 2 workspace.
The micro-ROS stack integrates microcontrollers seamlessly with standard ROS 2 and brings all major ROS concepts such as nodes, publishers, subscriptions, parameters, and lifecycle onto embedded systems. We will use micro-ROS on the Arduino Nano RP2040 Connect mounted on the Braccio Carrier board. The Arduino Nano RP2040 will publish the joint states and subscribe to the arm manipulation commands. It will communicate to ROS 2 on the Raspberry Pi 5 over serial port transports.
The micro-ROS agent is a ROS 2 node that receives and sends messages from micro-ROS nodes and keeps track of the micro-ROS nodes, exposing them to the ROS 2 network. Execute the following command to install the micro-ROS agent on the Raspberry Pi 5.
We captured 101 images of the pigs and penguins using the OAK-D camera and uploaded them to Edge Impulse Studio.
We can see the uploaded images on the Data Acquisition page.
We can now label the data using bounding boxes in the Labeling Queue tab, as demonstrated in the GIF below.
To create an Impulse, follow these steps:
Go to the Impulse Design section, then select the Create Impulse page. We have opted for a 320x320 pixel image size in the "Image Data" form fields to achieve better accuracy.
Click on "Add a processing block" and choose "Image". This step will pre-process and normalize the image data while also giving us the option to choose the color depth.
Click on "Add a learning block" and choose "Object Detection (Images)".
Finally, click on the "Save Impulse" button to complete the process.
On the Image page, choose RGB as color depth and click on the Save parameters button. The page will be redirected to the Generate Features page.
Now we can initiate feature generation by clicking on the Generate features button. Once the feature generation is completed, the data visualization will be visible in the Feature Explorer panel.
Go to the Object Detection page, then click "Choose a different model" and select the YOLOv5 model. There are 4 variations of the model size available, and we selected the Nano version with 1.9 million parameters. Afterward, click the "Start training" button. The training process will take a few minutes to complete.
Once the training is completed we can see the precision score and metrics as shown below.
On the Model testing page, click on the "Classify All" button which will initiate model testing with the trained float32 model. The testing accuracy is 100%.
To verify the model, we will run the inferencing on the Raspberry Pi 5 (CPU) before deploying it to the OAK-D device. Execute the following commands to install the Edge Impulse Linux Runner.
Execute the following commands to use the OAK-D as a USB webcam for the Edge Impulse Linux Runner.
To download the eim model and start the inferencing, run the following command and follow the instructions.
We can see the inferencing output on the web browser. Also, we can monitor the terminal logs.
To allow DepthAI to use our custom-trained model, we need to convert them into a MyriadX blob file format so that they are optimized for the Movidius Myriad X processor on the OAK-D.
The Edge Impulse Studio helps us save a step by providing the ONNX format for the trained YOLOv5 model that we can download from the project's Dashboard page.
We will utilize the OpenVINO model optimizer for conversion on an x86 Linux machine. OpenVINO is an open-source software toolkit for optimizing and deploying deep learning models. Execute the following commands to install all prerequisites for the conversion process.
Decoding a custom YOLOv5 model on the device is not simple. We need to add a few operations to the nodes in the exported ONNX file and then prune the model. The following Python script automates this process.
The ONNX model can be large and architecture-dependent. For the on-device inferencing, we need to convert the model to the OpenVINO Intermediate Representation (IR) format which is a proprietary model format of OpenVINO. The model conversion API translates the frequently used deep learning operations to their respective similar representation in OpenVINO and tunes them with the associated weights and biases from the trained model. The resulting IR contains two files:
.xml - Describes the model topology.
.bin - Contains the weights and binary data.
Execute the following command to generate the IR files.
After converting the model to OpenVINO’s IR format, run the following script to compile it into a .blob file, which can be deployed to the OAK-D device.
This will create the ei-pnp_yolov5n_320_openvino_2022.1_6shave.blob file in the IR directory. We should copy this blob file to the ~/EI_Pick_n_Place/pnp_ws/src/ei_yolov5_detections/resources folder on the Raspberry Pi 5. We can test the generated model using the depthai-python library:
The Python script can be found in the GitHub repository:
https://github.com/metanav/EI_Pick_n_Place/blob/main/pnp_ws/src/ei_yolov5_detections/src/ei_yolov5_spatial_stream.py
Take a look at the GIF below, which displays the RGB and spatial depth detections side by side. The RGB detections indicate the 3D location (X, Y, Z) with bounding boxes, while the depth image shows the bounding boxes with a 25% scale factor for accurate object localization. For depth (Z), each pixel inside the scaled bounding box (ROI) is taken into account. This gives us a set of depth values, which are then averaged to get the final depth value. Also, the depth image is wider than the RGB image because they have different resolutions.
We created a ROS 2 package moveit_resources_braccio_description to keep all STL files and URDF for reusability. The robot model URDF can be found in the GitHub repository for this project:
https://github.com/metanav/EI_Pick_n_Place/tree/main/pnp_ws/src/braccio_description/urdf
We can verify if the URDF is functioning as expected by publishing simulated joint states and observing the changes in the robot model using the RViz 2 graphical interface. Execute the following commands to install the urdf_launch and joint_state_publisher packages and launch the visualization.
By adjusting the sliders for the joints, we can observe the corresponding changes in the robot model.
The MoveIt Setup Assistant 2.0 is a GUI for configuring the manipulator for use with MoveIt 2. Its primary function is generating a Semantic Robot Description Format (SRDF) file for the manipulator, which specifies additional information required by MoveIt 2 such as planning groups, end effectors, and various kinematic parameters. Additionally, it generates other necessary configuration files for use with the MoveIt 2 pipeline.
To start the MoveIt Setup Assistant 2.0, execute the commands below.
Click on the Create New MoveIt Configuration Package and provide the path of the braccio.urdf file from the moveit_resources_braccio_description package.
To generate the collision matrix, select the Self-Collisions pane on the left-hand side of the MoveIt Setup Assistant and adjust the self-collision sampling density. Then, click on the Generate Collision Matrix button to initiate the computation. The Setup Assistant will take a few seconds to compute the self-collision matrix, which involves checking for pairs of links that can be safely disabled from collision checking.
We will define a fixed virtual joint that attaches the base_link of the arm to the world frame. This virtual joint signifies that the base of the arm remains stationary in the world frame.
Planning groups in MoveIt 2 semantically describe different parts of the robot, such as the arm or end effector, to facilitate motion planning.
The Setup Assistant allows us to add predefined poses to the robot’s configuration, which can be useful for defining specific initial or ready poses. Later, the robot can be commanded to move to these poses using the MoveIt API. Click on the Add Pose and choose a name for the pose.
The robot will be in the default pose, with all joints set to their zero values. Move the individual joints around until we find the intended pose and then Save the pose.
Now we can designate the braccio_gripper group as an end effector. The end effectors can be used for attaching objects to the arm while carrying out pick-and-place tasks.
Now we should build and upload the firmware to the Arduino Nano RP2040 Connect. During startup, the application attempts to connect to the micro-ROS agent on the Raspberry Pi 5 over serial port transports. It then initiates a node that publishes real-time states of the robotic arm joints to the /joint_states topic and subscribes to the /gripper/gripper_cmd and /arm/follow_joint_trajectory topics.
We should launch the ROS 2 nodes on separate terminals on the Raspberry Pi 5 by executing the following commands step-by-step.
Launch micro-ROS agent
The micro-ROS agent exposes the publishers and action server running on the Braccio ++ MCU to ROS 2.
Launch ei_yolov5_detections node
The ei_yolov5_detections node detects the objects and publishes the detection results using the Edge Impulse trained model on the OAK-D depth camera.
We can check the spatial detection message as follows.
Launch pick_n_place node
This node subscribes to the /ei_yolov5/spatial_detections topic and plans the pick and place operation. While bringing up this node, we need to provide command line parameters for the exact (X, Y, Z) position of the camera in meters from the base of the robot.
The launch file also brings up the robot_state_publisher and move_group nodes to publish the robot model and provide MoveIt 2 actions and services, respectively.
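For reference, a minimal subscriber to those spatial detections could look like the sketch below; the message type (depthai_ros_msgs/SpatialDetectionArray) and its field names are assumptions based on the DepthAI ROS driver, so verify them against the actual topic before use:

```python
import rclpy
from rclpy.node import Node
from depthai_ros_msgs.msg import SpatialDetectionArray  # assumed message type


class DetectionListener(Node):
    def __init__(self):
        super().__init__("detection_listener")
        self.create_subscription(
            SpatialDetectionArray, "/ei_yolov5/spatial_detections", self.on_detections, 10
        )

    def on_detections(self, msg):
        for det in msg.detections:
            # position is the object's (X, Y, Z) in meters in the camera frame;
            # it still needs to be transformed into the robot's base frame.
            p = det.position
            self.get_logger().info(f"object at x={p.x:.3f} y={p.y:.3f} z={p.z:.3f}")


def main():
    rclpy.init()
    rclpy.spin(DetectionListener())


if __name__ == "__main__":
    main()
```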
Launch RViz 2
We can see the real-time motion planning solution execution visualization using the RViz 2.
This project successfully demonstrates the design and implementation of a sophisticated pick-and-place system using a robot arm equipped with a 3D depth camera. The system's ability to recognize and locate objects in a cluttered and dynamic environment, coupled with its precise grasping and placing actions, showcases its potential for various industrial and domestic applications. This project underscores the complexity and importance of sorting tasks in various sectors, from manufacturing to logistics, and demonstrates how advanced robotic systems can meet these challenges with high efficiency and accuracy.
Counting objects with computer vision, using the Avnet RZBoard V2L and its Renesas RZ/V2L with DRP-AI accelerator.
Created By: David Tischler
Public Project Link:
Large scale production and manufacturing operations rely on effective and accurate inventory and product counting, so that businesses have accurate and known quantities of products to fulfill orders, ship to retailers, and plan their finances accordingly. In typical scenarios, businesses keep up-to-date counts of inputs such as supplies and raw materials, partially completed products that are currently being worked on, and finished goods ready for distribution. To alleviate the burden of counting the units in each stage by hand, which can be very time-consuming, computer vision can be used to identify and quantify parts, supplies, or products instead.
There are two distinct counting operations to consider. The first is a "total quantity" at any given time, such as "there are 8 objects on the assembly line at this exact moment". The second is a value that includes a time factor, for example, "14 items moved from point A to point B along the conveyor belt since we began work this morning." Both of these counts are important, so we'll cover both of them here. First, we'll perform a count of items detected in a camera frame, then we will explore how to count the total number of objects that moved past a stationary camera placed above a conveyor belt.
A machine learning model that recognizes a distinct item will be needed, along with the camera and hardware.
The Renesas RZ/V2L SoC contains two 1.2GHz Arm® Cortex®-A55 cores for running Linux, a 200MHz Cortex-M33 core for RTOS or other microprocessor applications, and very important for this use-case, a DRP-AI machine learning model accelerator.
With all of the connectivity, memory, storage, and compute power the RZBoard V2L contains, it is a very capable and highly efficient platform for AI projects.
For ease of testing, we'll use a small conveyor belt to prototype the system so that objects pass into and out of the camera's field of view. This way we can test both scenarios mentioned above: the number of objects in view at a distinct moment, and the total count of objects that have moved past the camera.
USB Webcam
HDMI monitor, keyboard, mouse
Conveyor belt, motor, power supply
M5 hex nuts (this is the object I will be detecting, but you can choose something else)
Edge Impulse
Updated RZBoard OS
The first step in our machine learning workflow is data collection. In this example, we are going to identify, and count, some small M5 hex nuts traveling down a conveyor belt. I've used M5 hex nuts due to their convenient size, but you could use any object. To build a model that can identify a hex nut, we need to first take pictures of hex nuts and label them accordingly. Knowing that a USB camera is going to be hooked up to the RZBoard and placed above the conveyor belt, I have (for now) connected the same camera directly to my laptop in order to capture representative images. This allows me to gather pictures of the M5 nuts from the same angle, distance, and lighting as what we will experience once the model is deployed to the RZBoard.
Log in to Edge Impulse, click on Create Project, and provide a name for your project. Next, click "Data acquisition" on the left, and then click on "Connect to your computer". A new tab or window will open, with the ability to take pictures by clicking the "Capture" button. Images collected will be automatically added to your dataset. I was able to select the camera I wanted to use in my browser settings, and I re-positioned the M5 nuts, moved the conveyor a bit, rotated the nuts, and varied the lighting in order to build a robust collection of images.
Next, we need to label the objects in each image. This locates where the objects of interest are in each picture, which will be used when training the model. Click on "Labeling Queue" at the top, draw a bounding box around each nut in the picture, and give it a label. I simply entered m5_nut on mine, though yours could vary. Click on "Save labels" to advance to the next image in the dataset, and you will notice that the bounding boxes will follow through to the next picture, making this process quick and easy. Once complete, you can click on "Dataset" to return to the summary list of data elements. You can click on them if you'd like to inspect them closer, but they should be ready for use at this point.
After the images have all been labeled, it is time to move on to the machine learning model creation phase. Click on "Impulse design" on the left, and you will see 4 columns (2 of which are empty for the moment) that will make up the machine learning pipeline. The first column is the input, which should be pre-populated with "Image data". You can, however, increase the image height and width to 320 by 320, as the RZBoard will have plenty of processing power available to make use of the larger image size (more on that in a bit). In the second column, click "Add a processing block", and choose "Image" by clicking "Add". In column 3, click "Add a learning block", and choose "YOLOv5 for Renesas DRP-AI" by clicking "Add". Finally, the fourth column should be pre-populated as well, with only one Output feature, the label we created earlier called m5_nut. Click "Save Impulse".
On the left, click "Image", and we'll configure the Image Processing Block. Here you can review the Raw features, switch to grayscale to save some memory on lower power devices or those with grayscale image sensors (not necessary in this case), and review the DSP results. We won't make any changes, so click "Save parameters". It will automatically move to the "Generate features" page, and here you can click the "Generate features" button to create a visualization of the analyzed features. With only one class in this project, there should be a nice cluster of data points represented, though the clustering is a bit easier to comprehend or represent visually when multiple labels / objects are used in a dataset.
Next, click on "YOLOv5 for Renesas DRP-AI" on the left navigation to go to the Neural Network Settings page. You can leave the default selections alone, but do check to make sure that the Target is set to "Renesas RZ/V2L with DRP-AI accelerator" in the top-right corner, for more accurate inference time and memory usage estimations, and also double check that the Neural network architecture is correctly set to "Renesas / YOLOv5 for Renesas DRP-AI", then click "Start training". This will take a short while to iterate through each epoch, and at the end of the process you should get back a Precision score and estimated inference time and memory usage.
In order to get the model (and eventually our counting application) onto the RZBoard V2L, we have a bit of prep work to do. The RZBoard comes from the factory with an operating system and sample application installed on its eMMC, which is nice for an immediate way to get started with the board and a great out-of-the-box experience, but won't work for our purposes here. Instead, we need a version of the Yocto OS that includes nodejs and npm, so that we can install the Edge Impulse Linux Runner. You could go down the path of building Yocto yourself (I tested it, and it does work fine), but to save you the trouble Avnet has already gone ahead and built one, that you can find in their Sharepoint site here:
The name of the file you need to download is avnet-core-image-rzboard-20230124105646.rootfs.wic. (If you enter the folder at the top-most directory, navigate into the "images" folder to find it there). Download that file, and flash it directly to an SD Card. Now, on the RZBoard, you'll need to flip a small DIP switch that tells the board to boot from SD Card instead of the eMMC storage. Look for two tiny switches near the headphone jack, and make sure they are both flipped away from the headphone jack, facing the silkscreened 1 and 2 markings on the switch. Here is an example of how mine looks:
Once this is done, insert the SD Card, plug in USB-C power, attach an HDMI monitor, USB webcam, and USB keyboard/mouse, then power on the board by pressing the power button near the USB-C power supply.
Once booted up, you can open a terminal session by clicking on the top left icon, and we'll need to connect the board to WiFi. To do that, enter:
NOTE: You can also attach a serial console and use Putty or a similar terminal application, if that's easier for you.
We'll also need to expand the available space on the SD Card, so enter:
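The device node below is an assumption based on the partition name mentioned next, so double-check it on your board:

```
fdisk /dev/mmcblk0
```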
Enter p to print the current partition information, and make note of the mmcblk0p2 start address displayed on the screen. We'll need that in a moment (mine was 204832). Follow the series of commands below to [p] print the partition info, [d] delete the current second [2] partition, make a [n] new [p] primary second [2] partition, and type in the start address you discovered a moment ago and press [enter]. Then press [enter] again on the next question to accept the default end address, [N] to not remove the signature, and [w] to write the changes to disk. The chain of commands is thus:
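Annotated for clarity, the interactive sequence looks roughly like this (your start address will likely differ):

```
p        # print the partition table
d        # delete a partition
2        # ...the second partition
n        # create a new partition
p        # primary
2        # partition number 2
204832   # the start address you noted earlier (yours may differ)
         # press enter to accept the default end address
N        # do not remove the ext4 signature
w        # write the changes and exit
```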
Finally, run this to expand the drive:
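Assuming the second partition holds the root filesystem, that would be:

```
resize2fs /dev/mmcblk0p2
```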
At this point, we are ready to install the Edge Impulse Linux tooling, which can be done with:
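With nodejs and npm present in the image, the standard install command should work:

```
npm install edge-impulse-linux -g --unsafe-perm
```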
Once completed, we can test out that everything works thus far, by running:
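That is the standard Edge Impulse runner:

```
edge-impulse-linux-runner
```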
You will be asked for your username and password, and the project name to connect to, then a download and build will run to get the model ready for use on the RZBoard. Once complete, local inferencing will start, and results will be printed to the command line. Make note of the final model location just before the inferencing starts; we'll use that later on (mine was /root/.ei-linux-runner/models/315846/v15/model.eim). You can also load http://<your-RZBoard-IP>:4912 in a browser on your development machine, to get a view from the camera with any detected objects outlined by bounding boxes. Before we move on to building our object counting application, let's highlight an important item here. My inference time, as you can see below, is approximately 8ms to 10ms, so roughly 100 inferences per second - incredible performance. The web view of the camera, however, provides a slow frame rate. The reason is that the sample webserver sending the camera view is not really optimized, and the WiFi latency itself is also at play here. A compiled binary version of an application is much more responsive.
For our first counting task, we'll quantify the number of objects detected within a camera frame. To do this, we'll put together a bit of Python code to run on the RZBoard - but this will also require some more tooling and dependencies to get installed. In that same terminal session you already have running, enter the following series of commands to install portaudio, pip, a few dependencies, edge_impulse_linux, and set the Display variable. You could probably wrap this all up into a shell script to run in bulk, but here are the individual commands to run:
Now we can place our Python counting application on the RZBoard, by entering nano unique_count.py and then pasting in the following Python snippet (it might be faster to copy/paste this snippet into a file on your desktop/laptop and then copy the file directly onto the RZBoard's SD card, or use a serial console so that you can copy/paste from host to device, instead of typing this all into that terminal window directly on the RZBoard).
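A minimal sketch of such a script is shown below, assuming the Edge Impulse Linux Python SDK's ImageImpulseRunner; it approximates the counting logic described here rather than reproducing the author's exact snippet, so adjust the model path, camera ID, and threshold for your setup.

```python
from edge_impulse_linux.image import ImageImpulseRunner

# Update this to the model file location noted from the Linux Runner output
MODEL_PATH = "/root/.ei-linux-runner/models/315846/v15/model.eim"
CAMERA_ID = 0          # first attached USB camera
THRESHOLD = 0.5        # minimum confidence to count a detection

with ImageImpulseRunner(MODEL_PATH) as runner:
    model_info = runner.init()
    print("Loaded model:", model_info['project']['name'])

    # classifier() grabs frames from the camera and yields inference results
    for res, img in runner.classifier(CAMERA_ID):
        boxes = res['result'].get('bounding_boxes', [])
        count = sum(1 for b in boxes if b['value'] >= THRESHOLD)
        print("M5 nuts currently in view: %d" % count)
```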
Be sure to update line 8 with the location of your model file on the RZBoard. We determined that mine was /root/.ei-linux-runner/models/315846/v15/model.eim earlier when we ran the Linux Runner the first time. Finally, it is time to test out the counter: simply run python3 unique_count.py, and in the terminal you will see the number of detected M5 nuts that are in view of the camera printed out to the console.
I was able to then use the conveyor belt and observe the quantity increase and decrease as the nuts moved down the line and entered / exited the field of view of the camera.
Now we can move on to our second counting application, which totals up the number of M5 hex nuts that pass in front of the camera over a period of time. For that, we'll use a second Python snippet, and this time we'll render the camera view on screen so we can have a look at what the camera is seeing. Create a new file with nano total_count.py and paste in the following snippet:
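Again, the following is an illustrative sketch rather than the author's original file: it counts an object once when its centroid crosses a vertical line in the frame, and draws the running total on the camera view.

```python
import sys
import cv2
from edge_impulse_linux.image import ImageImpulseRunner

MODEL_PATH = sys.argv[1] if len(sys.argv) > 1 else "model.eim"  # model path passed on the command line
CAMERA_ID = 0
THRESHOLD = 0.5
total = 0
previous = []   # centroids of detections from the previous frame

with ImageImpulseRunner(MODEL_PATH) as runner:
    runner.init()
    for res, img in runner.classifier(CAMERA_ID):
        line_x = img.shape[1] // 2    # counting line across the middle of the frame
        current = []
        for b in res['result'].get('bounding_boxes', []):
            if b['value'] < THRESHOLD:
                continue
            cx = b['x'] + b['width'] // 2
            cy = b['y'] + b['height'] // 2
            current.append((cx, cy))
            # count an object once as it moves from the left of the line to the right
            for (px, py) in previous:
                if px < line_x <= cx and abs(py - cy) < 30:
                    total += 1
                    break
        previous = current

        frame = cv2.cvtColor(img, cv2.COLOR_RGB2BGR)
        cv2.line(frame, (line_x, 0), (line_x, frame.shape[0]), (0, 255, 0), 2)
        cv2.putText(frame, "Total: %d" % total, (10, 30),
                    cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 255, 0), 2)
        cv2.imshow("total_count", frame)
        if cv2.waitKey(1) == ord('q'):
            break
```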
Like the first application, it might be easier to use a serial console or just copy the file directly onto the RZBoard's SD Card from your development machine, to save the typing.
For this application, we'll need to append the model file to use to the command, so from the terminal session on the RZBoard run:
In my case, this means I entered python3 total_count.py /root/.ei-linux-runner/models/315846/v15/model.eim
It will take a moment for the camera view to appear on screen, and it should be noted that once again the framerate here is not optimized, as we are running non-threaded single core Python, and compiled code is much faster. But for purposes of demonstrating how to achieve the counting, this will work. I once again turned on the conveyor belt, and as M5 hex nuts travel past the camera, the count increases by one in the rendered camera view window. My model could probably benefit from some extra images added to my dataset, as I did have a few sneak by undetected, so perhaps 60 images was not quite enough in the training dataset, but we have proven the concept works!
The Avnet RZBoard with its Renesas RZ/V2L SoC and DRP-AI Accelerator made prototyping our computer vision counting applications quick and easy, and demonstrated excellent performance with inference times in the 9ms range!
Control your TV, Air Conditioner or Lightbulb by just pointing your finger at them, using the BrainChip AKD1000 achieving great accuracy and low power consumption.
Created By: Christopher Mendez
Public Project Link:
Today more than ever we live with smart devices and personal assistants that work together to make our environment a more comfortable, efficient and personalized place. This project aims to contribute to the same field by suggesting a radical change in how we interact with smart things.
Sometimes it can be inconvenient to have to ask a personal assistant to turn our appliances on or off, whether because it's simply too late at night to be talking, or because we're watching our favorite movie and don't want annoying audio interrupting us.
This is why I thought "What if we could control the whole house with just gestures?" It would be amazing to just point to the air conditioner and turn it on, turn off the light, and turn on our TV.
To develop this project we will use a BrainChip Akida Development Kit and a Logitech BRIO 4K Webcam, together with an Edge Impulse Machine Learning model for pose identification.
It should be noted that this kit is the main component of this project thanks to some interesting characteristics that make it ideal for this use case. This kit consists of a Raspberry Pi Compute Module 4 with Wi-Fi and 8 GB RAM, also its IO Board, which includes a PCIe interface to carry an Akida PCIe board with the AKD1000 Neuromorphic Hardware Accelerator.
Considering that our project will end up being one more smart device that we will have at home, it's crucial that it can do its job efficiently and with very low energy consumption. This is where BrainChip's technology makes sense: the Akida™ neuromorphic processor mimics the human brain to analyze only essential sensor inputs at the point of acquisition, processing data with unparalleled performance, precision, and economy of energy.
The whole system will run independently, identifying poses; if a desired pose is detected, it will send an HTTP request to the Google Assistant SDK hosted by a Raspberry Pi running Home Assistant OS.
The system comes with the basic requirements installed to run machine learning models using Akida processor acceleration. Once the system is powered up and connected to the internet (I used an Ethernet cable), you can access it over an SSH connection. You will need to know the device's local IP address; in my case, I got it from the list of connected devices on my router.
To verify the device is working properly, you can try an included demo by navigating to http://<your_kit_IP> (in my case, http://10.0.0.150) and trying some of the examples:
To start setting up the device for a custom model deployment, let's verify we have installed all the packages we need.
I am using Putty for the SSH connection. Log in using the Administrator credentials, in this case, the username is ubuntu and the password is brainchip.
Once inside you will be able to install some required dependencies:
Running the built-in demos ensures us that the system already recognizes the Akida package and the PCIe drivers for the AKD1000, but we can verify it by running the following commands:
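A few quick checks along these lines should confirm everything is in place; the exact lspci device string is an assumption, so adjust the grep if needed:

```
pip3 show akida          # the Akida Python package should be listed
lspci | grep -i akida    # the AKD1000 PCIe board should be enumerated
node --version           # should report v14 or above
```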
The last command should return the node version, v14 or above.
As we are working with computer vision, we will need "opencv-python>=4.5.1.48", "PyAudio", "Psutil", and "Flask":
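These can be installed with pip:

```
pip3 install "opencv-python>=4.5.1.48" PyAudio Psutil Flask
```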
For the creation of the dataset of our model, we have two options, uploading the images from the BrainChip Development Kit or using our computer or phone. In this case, I chose to take them from the computer using the same webcam that we are finally going to use in the project.
The dataset consists of 3 classes in which we point a finger at each appliance, plus a last one of unknown cases.
Taking at least 50 pictures of each class will let you create a robust enough model.
After having the dataset ready, it is time to define the structure of the model.
In the left side menu, we navigate to Impulse design > Create impulse and define the following settings for each block, respectively:
Image width: 192
Image height: 192
Resize mode: Fit longest
Use this block to turn raw images into pose vectors, then pair it with an ML block to detect what a person is doing.
To classify the features extracted from the different poses, we'll use a classification learn block specifically designed for the hardware we're using.
Finally, we save the Impulse design, it should end up looking like this:
After having designed the impulse, it's time to set the processing and learning blocks. The Pose estimation block doesn't have any configurable parameters, so we just need to click on Save parameters and then Generate features.
In the classifier block define the following settings:
Number of training cycles: 100
Learning rate: 0.001
In the Neural network architecture, add 3 Dense layers with 35, 25 and 10 neurons respectively.
Here is the architecture "Expert mode" code (you can copy and paste it from here):
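The generated expert-mode code isn't reproduced here, but a sketch of the architecture it describes (three Dense hidden layers of 35, 25 and 10 neurons, trained with the settings above) might look like the following; in the Studio's expert mode, variables such as train_dataset, validation_dataset, input_length and classes are provided for you, so the placeholder values below are assumptions.

```python
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, InputLayer

# Placeholders for what Edge Impulse's expert mode supplies automatically
input_length = 57   # length of the pose feature vector (assumption)
classes = 4         # AC, Light, Other, TV

model = Sequential([
    InputLayer(input_shape=(input_length,)),
    Dense(35, activation='relu'),
    Dense(25, activation='relu'),
    Dense(10, activation='relu'),
    Dense(classes, activation='softmax'),
])

model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
              loss='categorical_crossentropy',
              metrics=['accuracy'])

# train_dataset / validation_dataset are tf.data.Dataset objects in expert mode
# model.fit(train_dataset, validation_data=validation_dataset, epochs=100)
```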
Click on the Start training button and wait for the model to be trained and the confusion matrix to show up.
The results of the confusion matrix can be improved by adding more samples to the dataset.
Install all the project requirements with the following command, and wait for the process to be done.
Install these other required packages with:
Once the project is cloned locally in the Akida Development Kit, you can download the project model from Edge Impulse Studio by navigating to the Dashboard section and downloading the MetaTF .fbz
file.
Once downloaded, from the model path, open a new terminal and copy the model to the Dev Kit using the scp command as follows:
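For example (replace the IP address with your kit's address):

```
scp akida_model.fbz ubuntu@<your_kit_IP>:/home/ubuntu/
```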
You will be asked for your Linux machine login password.
Now, the model is on the Akida Dev Kit local storage (/home/ubuntu) and you can verify it by listing the directory contents using ls.
Move the model to the project directory with the following command:
Here we have the model on the project directory, so now everything is ready to be run.
To run the project, type the following command:
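Based on the parameters described next, the invocation looks like this:

```
python3 class-pose.py akida_model.fbz 0
```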
The first parameter, class-pose.py, is the project's main script to be run. akida_model.fbz is the MetaTF model we downloaded from our Edge Impulse project, and 0 forces the script to use the first camera available.
The project will start running and printing the inference results continuously in the terminal.
To watch a preview of the camera feed, you can open a new ssh session and run the make-page.py script from the project directory:
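For example:

```
python3 make-page.py
```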
Finally, you will be able to see the camera preview alongside the inference results organized in the following order: AC, Light, Other and TV.
The Home Assistant is running on a separate Raspberry Pi.
Once the integration is set, we can send HTTP
requests to it with the following format:
URL: http://<Raspberry Pi IP>:8123/api/services/google_assistant_sdk/send_text_command
Headers:
Authorization: "Bearer "
Content-Type: "application/json"
Body: {"command":"turn on the light"}
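Putting that together, a request from Python might look like this; the IP address and token are placeholders for your own values:

```python
import requests

url = "http://<Raspberry Pi IP>:8123/api/services/google_assistant_sdk/send_text_command"
headers = {
    "Authorization": "Bearer <your-long-lived-access-token>",
    "Content-Type": "application/json",
}
payload = {"command": "turn on the light"}

response = requests.post(url, headers=headers, json=payload, timeout=5)
print(response.status_code)
```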
You must edit the url and auth variables in the code with the respective ones of your setup.
Here I show you the whole project working and controlling appliances when they are pointed at.
This project leverages the Brainchip Akida Neuromorphic Hardware Accelerator to propose an innovative solution to home automation. It can be optimized to work as a daily used gadget that may be at everyone's house in the near future.
How to create computer vision models with the Edge Impulse Studio, and then transform those models into a format that is usable by Nvidia DeepStream applications.
Created By: Peter Ing
GitHub Repo:
Nvidia DeepStream is an audio, video, and image analytics toolkit based on GStreamer for AI and computer vision processing. It can be used as a building block for creating applications and services that incorporate machine learning and computer vision components or capabilities, such as human-machine interfaces, self-driving and autonomous vehicles, security, and many other use-cases. Detailed information and documentation for DeepStream is located here:
Be sure to check out the official blog post over on the Nvidia Developer blog, as well:
The following general workflow can be followed when using Edge Impulse with Nvidia DeepStream to produce the final ONNX model and Inference Plugin Configuration files for use with your preferred deployment option.
The first step in the process is exactly the same as when you are using Edge Impulse to build TinyML models for embedded systems, up until the last stage of testing and validation.
The process above varies depending on the type of model being used. One difference between using DeepStream and the lower-level TensorRT API is that DeepStream provides convenience for developers, with a higher-level interface taking away the need to work directly with the more complex TensorRT. This convenience does come with a constraint on the input layer of models used with DeepStream.
With TensorRT you can effectively work more freely using the Layers API, which adds more development overhead, whereas with DeepStream you need to ensure your ONNX Model is built with the input layer in a specific manner.
At runtime, DeepStream's inference plugin, Gst-nvinfer, automatically transforms input data to match the model's input layer, effectively performing preprocessing operations similar to the DSP block when using Edge Impulse's SDK on embedded systems. DeepStream reduces the development burden by making these design choices for you, and this convenience requires models to have a consistent input shape.
Gst-nvinfer requires the input tensor to be in the NCHW format where:
N: batch size – number of frames to pass to the inference Engine for batch inference
C: channels – number of input channels
H: height – input height dimension
W: width – input width
Edge Impulse models are not in this format by default, and some steps are required to prepare the model for use with Gst-nvinfer.
Gst-nvinfer makes use of TensorRT under the hood to do all the heavy lifting, with models needing to be converted to TensorRT engines. The simplest way to build a TensorRT engine for DeepStream is to allow Gst-nvinfer to build the engine automatically, which it does if the model is in ONNX format. Preparing the model for DeepStream therefore requires converting the model to ONNX format while also converting the input tensor to NCHW.
Note: There is a TensorRT deployment option in Edge Impulse, however these models don't work directly with DeepStream because the input layer is not in NCHW format. The TensorRT deployment is better suited for when a developer is manually building applications from the ground up, directly on top of TensorRT where there is complete control of the Inference Engine and application. This requires more coding than using DeepStream. It is also used with Edge Impulse EIM Linux deployments.
Note: Due to TensorRT being used at the heart of Gst-nvinfer, it is possible to apply all the TensorRT development capabilities to override DeepStream if necessary. For example, manual engine creation in C++ or Python, as well as custom input and output layers through TensorRT's C++ plugin architecture, could be reasons to override DeepStream.
Since the goal of DeepStream is to make the development process easier and more efficient, no-code approaches that simplify the process of working with DeepStream are provided.
The two primary no-code TensorRT engine creation approaches are:
Automatic Engine Creation: Gst-nvInfer builds the TensorRT Engine automatically, and produces a serialized Engine file for Inference.
Manual Engine Creation: Using the trtexec
command from the command line to manually produce serialized Engine files.
All of the steps involved in model conversion can also be applied to creating models in Edge Impulse that work with the Nvidia DeepStream Python samples available as Jupyter Notebooks, or also with Custom C++ implementations which you can build directly from the command line on your Jetpack environment on your Nvidia device.
Depending on the type of model you are building (Image Classification or Audio Classification), there are some specific considerations you need to take into account related to the features and the general process of converting and preparing the model to work with DeepStream. These same steps can be followed for using Edge Impulse as your MLOps tool for building TensorRT models, beyond just DeepStream.
Object detection is a category of computer vision that attempts to identify a specific item (or items) within an image or frame of video, and assigns a confidence value to the item. Items are generally shown with a bounding box around them, to indicate where in the image the object of interest is located.
Edge Impulse currently supports 3 object detection model architectures:
MobileNetV2 SSD FPN-Lite 320 x 320: From the TensorFlow Object Detection (TFOD) family of models
Faster Objects, More Objects (FOMO): Edge Impulse's custom object detection model designed to run on constrained devices
YOLOv5: Standard YOLOv5 from Ultralytics
The TensorFlow Object Detection (TFOD) model contains some operations that are not natively supported by TensorRT. While it is possible to make TFOD work using some workarounds, the TFOD models are not easy to use out-of-the-box with DeepStream.
FOMO, which is designed to run directly on MCU targets, has an output layer not directly compatible with what DeepStream expects, and requires the additional step of implementing and managing a custom TensorRT output layer.
YOLOv5 is therefore the best option to use with DeepStream. The workflow is the simplest to implement, and Nvidia hardware is designed to support applications that utilize model architectures such as YOLO. YOLO performs well as an object detector due to its ability to address depth variation and occlusion, maximizing the utilization of the hardware while also thoroughly analyzing the scene for objects or parts of objects.
Image models, including object detection, are machine learning models focused on visual data, as opposed to models focused on audio and sound, or sensor data coming from analog or digital measuring devices.
Image models built with Edge Impulse use raw pixels as input features. The input image is scaled down to reduce the model input layer size, in order to maintain processing throughput on lower-powered hardware. With DeepStream you are only limited by the power of the chosen Nvidia platform to run the model on. The resolution and input layer size can be made larger, and experimentation for each platform is useful to determine the best choice.
In addition to resolution, the model can be trained on RGB colour or Grayscale. Edge Impulse takes care of removing the alpha channel and allows you to select the input resolution and colour depth. Grayscale is ideal for tinyML applications due to the limited performance of most hardware, but on Nvidia hardware, color images can be utilized.
The input features for full color images are generated by Edge Impulse in RGB format (DeepStream supports RGB or BGR). At runtime, DeepStream's Inference plugin automatically transforms input frames to match the model input resolution and colour depth, which eliminates the need to write custom preprocessing code to do this step manually.
For object detection projects ultimately destined for use in DeepStream, the process will begin in Edge Impulse with data collection and then move into the model creation phase.
Edge Impulse provides a few versions of YOLOv5 that have been optimized for use with different accelerators. For Nvidia, you can select the standard YOLOv5 Community model.
Once training is complete, Edge Impulse allows you to export the YOLOv5 model directly into ONNX format, which also has the correct NCHW input layer. This saves the conversion to ONNX step, as its already done for the YOLOv5 model.
The Gst-nvinfer plugin outputs data in a standardized format that can be consumed by downstream plugins. This consists of gst-buffer that Gstreamer uses to pass data between plugins, as well as Nvidia's own metadata structures that provide information about the results of the model inference. In the case of object detection, a relevant structure is NvDsObjectMeta which contains the bounding box information in the form of the bounding box location, width and height amongst other relevant information for downstream plugins.
The output tensor format of YOLO varies depending on the specific implementation and model version, but I'll describe a common format used in YOLOv3 as an example.
In YOLOv3, the output tensor is a 3D array with dimensions [batch_size, grid_size, grid_size * num_anchors, num_classes + 5]. Since this is not compatible with the DeepStream NvDsObjectMeta structure, a custom output parser needs to be implemented. Fortunately, Nvidia provides an SDK that allows you to create custom output parsers to support any kind of output format. Nvidia also provides an implementation for YOLOv2 and YOLOv3 and their variants; however, Edge Impulse uses the later YOLOv5.
In order to use the custom output parser it needs to be built, which can be done from the command line on your Nvidia appliance. However, CUDA versions vary by Jetpack version, and this needs to be specified using the CUDA_VER
environment variable. The output is a .so file that then needs to be added to the Gst-nvinfer plugin configuration file using the custom-lib-path parameter. The custom bounding box parsing function also needs to be specified with the parse-bbox-func-name parameter. In this case the repo provides this function, called NvDsInferParseYolo. The next section will cover the process of configuring the Gst-nvinfer plugin.
The Gst-nvinfer plugin is configured by means of a plain text file which specifies all the relevant parameters required to run a model with DeepStream and TensorRT behind the scenes. This file needs to be referenced from the DeepStream application either as a Primary GPU Inference Engine (PGIE), where it is the first inference to take place in the pipeline, or as a Secondary GPU Inference Engine (SGIE), where it performs secondary inference on the output of a PGIE upstream.
Object Detection (in this case the YOLOv5 model built in Edge Impulse) is usually the first instance of Gst-nvinfer, i.e. the PGIE.
The minimal working version of a PGIE using the Edge Impulse YOLOv5 ONNX export is shown below:
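A sketch of such a configuration is shown below; file paths, the class count, and the parser library location are placeholders that need to match your own project and repo layout.

```
[property]
gpu-id=0
onnx-file=yolov5.onnx
model-engine-file=yolov5.onnx_b1_gpu0_fp32.engine
labelfile-path=labels.txt
batch-size=1
network-mode=0                  # 0 = FP32
network-type=0                  # 0 = detector
num-detected-classes=1
model-color-format=0            # 0 = RGB, matching the Impulse's image block
cluster-mode=2
gie-unique-id=1
parse-bbox-func-name=NvDsInferParseYolo
custom-lib-path=./libnvdsinfer_custom_impl_Yolo.so
```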
All paths are relative to the configuration file which can be created in a text editor and placed where it can be referenced by your application, which could either be a custom Python or C++ application or a DeepStream Reference app and sample apps.
The batch size is set to 1
in the above example, and this matches the batch size of the model. In addition the custom output parser is also specified. The model color format is also set to match the format in your Impulse’s image preprocessing/feature block.
The provided repo contains a precompiled output parser ready to run on a Jetson Nano (Jetpack 4.6). The label file needs to be edited to replace the labels with your own label names. The label names should match the label names in your Impulse's final block, in the same order. YOLO uses a label file format where each label is separated by a new line. For a single object type, only one entry is sufficient.
Image classification is a more generalized use of computer vision: instead of trying to identify an object at a specific location within an image and drawing a bounding box around it, image classification attempts to simply categorize a full image based on what the model has been trained to identify. A simple example could be thought of as, "this is a picture of an apple", or, "this is a picture of an airplane".
To create an Image Classification model, Select the Transfer Learning classification model, which is already pretrained on a larger dataset. This speeds up the training process, allowing you to quickly train your own model that will rapidly converge on your dataset.
Once the model has been trained and tested, it is ready for deployment and can be exported for use with DeepStream.
DeepStream supports ONNX model conversion, but the ONNX export options available such as the TensorRT Deployment are not suitable for DeepStream's specific requirements. Thus, a conversion process needs to be followed starting with the TFLite export option that is available in Edge Impulse.
A TFLite float32 model is required, and this can be obtained by exporting from the Download Output Block on your project's Dashboard:
This is the simplest way to get the model in TFLite format, but requires that you have direct access to the project in the Edge Impulse Studio.
Alternatively another approach is by using the C/C++ or Arduino Library Deployment options. This requires that the EON compiler is turned off, and that Unoptimized (float32) is selected prior to clicking Build.
This generates and downloads a .zip file that contains your TFLite model, stored as an array in a C header file called tflite-trained.h (located in the tflite-model folder). When working with an Arduino Library, this folder is located under the src directory.
The next step requires the TFLite model to be in the TFLite FlatBuffers format (.lite), which means converting the array data into a binary file. To convert to a normal FlatBuffer TFLite model, run the following code.
carray2bin.py is a Python script that allows you to convert binary data payloads of C arrays to a bin file, and is provided in the /utils folder of the provided repo.
Note that this method is useful if you only have access to the C/C++ or Arduino Library export, if the export was built as Unoptimized (float32) as described above.
The float32 model generated by Edge Impulse has an input tensor named x_input with the following layout for each dimension:
[N,H,W,C]
For example, with a 160x160 pixel input image, the input tensor is float32[1,160,160,1]. This requires a transpose to be added to the input tensor to accept input images as NCHW.
To convert the model from TensorFlow Lite to ONNX with the correct input shape for DeepStream requires the use of "tf2onnx":
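The conversion command takes the following general form (the input and output file names are placeholders):

```
python -m tf2onnx.convert --tflite model.lite --output model.onnx --inputs-as-nchw serving_default_x:0
```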
It's important to note the --inputs-as-nchw serving_default_x:0 parameter that adds the transpose to the named input layer. The input layer name must be included for this to be correctly applied. Note that older Edge Impulse classification exports may have the input tensor named x_input. If yours is named x_input, the command will need to be modified to use --inputs-as-nchw x_input, otherwise the model input won't be changed. The exact input layer name can be determined using Netron.
The result of the conversion should yield results similar to the following:
The second stage classifier instance of Gst-nvinfer (SGIE) typically receives the bounding box and optional tracking information from the upstream object detector. This is where the power of DeepStream enables the creation of applications that can perform fine-grained analysis on a scene by further classifying individual objects.
To run an Edge Impulse model from the ONNX file produced in the prior steps, the following configuration is required at a minimum for a SGIE.
To use it as a SGIE to classify the output of a PGIE object detector, set the process-mode to 2:
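A minimal SGIE configuration along these lines might look like the following; again, file names and thresholds are placeholders:

```
[property]
gpu-id=0
onnx-file=classifier.onnx
labelfile-path=labels_classifier.txt
batch-size=1
network-mode=0                  # 0 = FP32
network-type=1                  # 1 = classifier
process-mode=2                  # 2 = secondary, operate on PGIE detections
operate-on-gie-id=1
gie-unique-id=2
classifier-threshold=0.5
```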
The label file needs to contain the list of text labels separated by semicolons. The labels should be in the same order as shown in the Edge Impulse Studio when viewing the Impulse configuration by looking in the output features.
Alternatively, the labels can be found in the Edge Impulse C++ SDK in the ei_classifier_inferencing_categories array in the model_variables.h header file.
This approach requires the ONNX model to be in an accessible path on your system; the TensorRT engine is then built automatically after running. This is the simplest approach, only requiring the .onnx file. After the first run, the TensorRT engine is saved as an .engine file. To prevent rebuilding on subsequent runs, the ONNX file can be commented out and the .engine file can instead be directly referenced. This will prevent rebuilding the engine on each run, saving time.
The major limitation of automatic conversion of the model is that it works with implicit batching, where the batch size is 1, ignoring the batch dimension in the model. This may not be ideal when you need to perform inference on batches of images, to take advantage of the hardware batch inference capabilities.
The underlying TensorRT runtime does support Dynamic Batching though, which means that the batch size for inference can be determined at runtime and isn't fixed in the model.
In order to make use of dynamic batch sizes, the model will have to be manually converted from ONNX to a TensorRT engine using an explicit batch dimension. The trtexec command is part of TensorRT, and allows TensorRT engines to be manually constructed.
The command is run as follows:
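Using the parameters described below, the command takes this general form (file names are placeholders):

```
trtexec --explicitBatch --onnx=model.onnx --workspace=4000 --saveEngine=model.engine
```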
Here the following parameters are used at a minimum to convert your Edge Impulse Model:
--explicitBatch: used to specify that the model must be built for Dynamic Batching support with an explicit batch dimension in the input tensor. Required to use batch sizes larger than 1 with an Edge Impulse model.
--onnx=: specify the input ONNX file
--workspace=4000: specify the amount of memory allocated to the build process in MB, in this case 4GB. This gives trtexec
enough working temporary memory to create the model engine.
--saveEngine=: the output TensorRT runtime engine.
High speed object counting with computer vision and an Nvidia Jetson Nano Developer Kit.
Created By: Jallson Suryo
Public Project Link:
GitHub Repo:
Object counting systems in the manufacturing industry are essential to inventory management and supply chains. They mostly use proximity sensors or color sensors to detect objects for counting. Proximity sensors detect the presence or absence of an object based on its proximity to the sensor, while color sensors can distinguish objects based on their color or other visual characteristics. There are some limitations of these systems though; they typically have difficulty detecting small objects in large quantities, especially when the objects are not arranged in a row or in an orderly manner. This can be compounded by a relatively fast conveyor belt. These conditions make the object counting inaccurate.
This project utilizes Edge Impulse's FOMO algorithm, which can quickly detect objects in every frame that a camera captures on a running conveyor belt. FOMO's ability to know the number and position of coordinates of an object is the basis of this system. The project aims to assess the Nvidia Jetson Nano's GPU capabilities in processing higher-resolution imagery (720x720 pixels), compared to typical FOMO object detection projects (which often target lower resolutions such as 96x96 pixels), all while maintaining optimal inference speed.
The machine learning model (named model.eim
) will be deployed using the TensorRT library, configured with GPU optimizations and integrated through the Linux C++ SDK. Additionally, the Edge Impulse model will be seamlessly integrated into our Python codebase to facilitate cumulative object counting. Our proprietary algorithm compares current frame coordinates with those of previous frames to identify new objects and avoid duplicate counting.
NVIDIA Jetson Nano Developer Kit
USB Camera (eg. Logitech C922)
Mini conveyor belt system with camera stand
Objects: eg. bolt
Ethernet cable
PC/Laptop to access Jetson Nano via SSH
Edge Impulse Studio
Edge Impulse Linux, Python & C++ SDK
NVIDIA Jetpack SDK
Terminal
In this project we use a Logitech C922 USB camera capable of 720p at 60 fps, connected to a PC/laptop to capture the images for data collection, for ease of use. Take pictures from above the parts, at slightly different angles and lighting conditions, to ensure that the model can work under different conditions (to prevent overfitting). Object size is a crucial aspect when using FOMO, to ensure the performance of this model. You must keep the camera distance from the objects consistent, because a significant difference in object sizes will confuse the algorithm and cause difficulty in the auto-labelling process.
Open studio.edgeimpulse.com, login or create an account then create a new project.
Choose the Images project option, then Object detection. In Dashboard > Project Info, choose Bounding Boxes for the labeling method and NVIDIA Jetson Nano for the target device. Then in Data acquisition, click on Upload Data tab, choose your photo files, automatically split them between Training and Testing, then click on Begin upload.
Next,
For Developer accounts: click on the Labeling queue tab, then drag a bounding box around an object and label it, then click Save. Repeat this until all images are labelled. It goes quickly though, as the bounding boxes will attempt to follow an object from image to image.
For Enterprise accounts: click on Auto-Labeler in Data Acquisition. This auto-labeling segmentation / cluster process will save a lot of time over the manual process above. Set min/max object pixels and sim threshold (0.9 - 0.999) to adjust the sensitivity of cluster detection, then click Run. If something doesn't match or if there is additional data, labeling can be done manually as well.
Once you have the dataset ready, go to Create Impulse and set 720 x 720 as the image width and height. Then choose Fit shortest axis, and choose Image and Object Detection as Learning and Processing blocks.
In the Image block configuration, select Grayscale as the color depth and click Save parameters. Then click on Generate features to get a visual representation of the features extracted from each image in the dataset. Navigate to the Object Detection block setup, and leave the default selections as-is for the Neural Network, but perhaps bump up the number of training epochs to 120. Then we choose FOMO (MobileNet V2 0.35), and train the model by clicking the Start training button. You can see the progress on the right side of the page.
If everything is OK, then we can test the model: go to Model Testing on the left navigation and click Classify all. Our result is above 90%, so we can move on to the next step, Deployment.
Click on the Deployment navigation item, then search for TensorRT. Select Float32 and click Build. This will build an NVIDIA TensorRT library for running inferencing targeting the Jetson Nano's GPU. After it has downloaded, open the .zip
file and then we're ready for model deployment with the Edge Impulse C++ SDK directly on the NVIDIA Jetson Nano.
Then install Clang as a C++ compiler:
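For example:

```
sudo apt install -y clang
```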
Clone from this repository and install these submodules:
Then install OpenCV and dependencies:
Build a specific model targeting NVIDIA Jetson Nano GPU with TensorRT using clang:
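Assuming the standard Edge Impulse Linux C++ SDK Makefile targets, the build command takes roughly this form (the exact flags may differ for your setup):

```
APP_EIM=1 TARGET_JETSON_NANO=1 CC=clang CXX=clang++ make -j
```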
The result will be a file that is ready to run: /build/model.eim
If your Jetson Nano is running on a dedicated power supply (as opposed to a battery), its performance can be maximized by this command:
sudo /usr/bin/jetson_clocks
Now the model is ready to run in a high-level language such as the Python program in the next step. To ensure this model works, we can run the Edge Impulse Runner with the camera set up on the Jetson Nano and run the conveyor belt. You can then see the camera stream via your browser (the IP address is provided when the Edge Impulse Runner first starts up). Run this command:
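Assuming the model file built above sits at build/model.eim, the runner can be pointed at it directly:

```
edge-impulse-linux-runner --model-file build/model.eim
```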
The inferencing time is around 15ms, which is an incredibly fast detection speed.
To compare these results, I have also deployed with the standard CPU-based deployment option (Linux AARCH64 model), and run with the same command above. The inferencing time is around 151ms with a Linux model that targets the CPU.
You can see the difference in inferencing time is about 10x faster when we target the GPU for the process. Impressive!
You can git clone the repo, then run the program with the command pointing to the path where model.eim is located:
Here is a demo video of the results:
The delay visible in the video stream display and its corresponding output calculation is caused by the OpenCV program rendering a 720x720 display resolution window, not by the inference time of the object detection model. This demo test uses 30 bolts per cycle attached to the conveyor belt to show a comparison with the output on the counter.
We have successfully implemented object detection on a high-speed conveyor belt, with high-resolution video captured, and run a cumulative counting program locally on an Nvidia Jetson Nano. With the speed and accuracy obtained, we are confident in the scalability of this project to various scenarios, including high-speed conveyor belts, multiple object classes, and sorting systems.
Create synthetic data to rapidly build object detection datasets with Nvidia Omniverse's Replicator API and Edge Impulse.
Created By:
Public Project Link:
GitHub Repo:
In the realm of machine learning, the availability of diverse and representative data is crucial for training models that can generalize well to real-world scenarios. However, obtaining such data can often be a complex and expensive endeavor, especially when dealing with complex environments or limited data availability. This is where synthetic data generation techniques, coupled with domain randomization, come into play, offering innovative solutions to overcome these obstacles.
Synthetic data refers to artificially generated data that emulates the statistical properties and patterns of real-world data. It is created through sophisticated algorithms and models that simulate the characteristics of the original data while maintaining control over its properties. Domain randomization, on the other hand, is a technique used in conjunction with synthetic data generation, where various parameters and attributes of the data are intentionally randomized within specified bounds. This randomized variation helps the model become more robust and adaptable to different environments.
NVIDIA Omniverse™ represents a groundbreaking platform that is set to revolutionize the collaborative, design, and simulation processes within industries. This cutting-edge tool combines real-time rendering, physics simulation, and advanced AI capabilities to create a highly powerful and scalable solution.
The Edge Impulse platform, along with its integrated Edge Impulse Studio, is a comprehensive solution tailored for developing and deploying embedded machine learning models. It empowers developers to seamlessly gather, process, and analyze sensor data from various edge devices, such as microcontrollers and sensors. With Edge Impulse Studio, users can easily create and train machine learning models using a visual interface or code-based workflow.
In this project we will use the Omniverse™ Replicator API inside of Omniverse™ Code to generate our synthetic dataset of fruits (apples, oranges, and limes). Once our dataset has been created, we will import the dataset into Edge Impulse Studio, create and train an object detection model, and then deploy it to an NVIDIA Jetson Nano.
You can think of Code as an IDE for building advanced 3D design and simulation tools. Head over to the Extensions tab and search for Code, then click on Code and install it.
Within Omniverse™ Code there is a feature called Script Editor
. This editor allows us to load Python code into the IDE and execute it. This makes it very easy for us to set up our scenes and manipulate our assets.
For simplicity, in this tutorial we will use assets that are readily available in Omniverse™ Code. Within the IDE you will find a tab called NVIDIA Assets; opening this tab will provide you with a selection of ready-to-use assets. The assets are of type USD, which stands for Universal Scene Description.
For this tutorial, code has been provided that will work out of the box in the script editor; all you will have to do is modify the basepath variable and switch between the different datasets.
The first step is to clone the repository to a location on your machine.
You will find the provided code in the project root in the omniverse.py
file.
Let's take a quick look at some of the key features of the code.
At the top of the code you will find the settings for the program. You don't have to use the same assets that I have used, but if you would like to quickly get set up it is easier to do so.
You should set the basepath variable to the path of the project root on your machine. If you are using Linux, you will need to modify the paths in the code, as they use backslashes for directory separators. For the dataset variable you can use the following values to generate your dataset:
All: Will generate a dataset that includes images of all the fruit types on the table.
Apple: Will generate a dataset that includes images of apples on the table.
Orange: Will generate a dataset that includes images of oranges on the table.
Lime: Will generate a dataset that includes images of limes on the table.
Together, these images will make up our entire dataset.
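For reference, the two settings discussed above might be set like this near the top of omniverse.py (the path shown is purely illustrative):

```python
basepath = "C:\\Users\\you\\omniverse-project"   # path to the project root on your machine
dataset = "Apple"                                # one of: "All", "Apple", "Orange", "Lime"
```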
The first function we come to in the code will create the table. Here we create the table from the USD file in the settings, ensure that items do not fall through it by using rep.physics.collider(), add mass to the object with rep.physics.mass(mass=100), and then modify the pose, which includes position and rotation. Finally we register the randomizer.
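A condensed sketch of that table randomizer, using the Replicator API calls named above, looks something like this (the USD path and pose values are placeholders):

```python
import omni.replicator.core as rep

TABLE_USD = "<path-to-a-table-asset>.usd"   # placeholder asset path from the settings

def table():
    table = rep.create.from_usd(TABLE_USD)
    with table:
        rep.physics.collider()           # stop items falling through the surface
        rep.physics.mass(mass=100)       # add mass to the object
        rep.modify.pose(position=(0, 0, 0), rotation=(0, -90, 0))
    return table

rep.randomizer.register(table)
```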
Next, the code will take care of the lighting.
The next function will take care of the fruits. Here you will notice we use a uniform distribution for the position, rotation and scale. This means that each number in the ranges has an equal chance of being chosen. Here we also define a class for the data.
Next we set up the camera and set the values for focus distance, focal length, position, rotation, and f-stop.
The next code will create the writer, which writes our images to the specified location on our machine. Here we set the output_dir, rgb, and bounding box values.
Finally we set the randomizers to be triggered every frame, and then run the randomizers.
Now that we have explored the code and updated our settings, it is time to run the code and generate our dataset. Ensuring Omniverse™ Code is open, copy the contents of omniverse.py and paste it into the script editor. Once you have done this, press the Run button, or ctrl + enter.
Remember to change the dataset variable to the relevant class and run the script for each of the 3 classes.
Head over to the data/rendered
directory and you will find all of your generated data. Navigate through the various folders to view the created datasets.
Next we will visualize our dataset, including the bounding boxes that were generated by the writer. In Visual Studio Code, open the project root and open the visualize.py file. Once it is opened, open the terminal by clicking View -> Terminal.
Next, install the required software. In the terminal, enter the following commands:
For each image you would like to visualize you will need to update the code with the path and number related to the image. At the bottom of visualize.py
you will see the following code:
The writer will save images with an incrementing number in the file name, such as rgb_0000.png, rgb_0001.png etc. To visualize your data, simply increment the file_number variable.
You can now run the following code, ensuring you are in the project root directory.
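That is simply:

```
python visualize.py
```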
You should see similar to the following:
Log in or create an account on Edge Impulse and then create a new project. Once created, scroll down on the project home page to the Project Info area and make sure to change Labeling method to Bounding Boxes (Object Detection) and Target Device to Jetson Nano. Now scroll down to the Performance Settings and ensure that Use GPU for training and Enterprise performance are selected if you have those options.
Running the Edge Impulse NVIDIA Jetson Nano setup script
Connecting your device to the Edge Impulse platform
Once the firmware has been installed enter the following command:
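The device is connected with the Edge Impulse Linux CLI:

```
edge-impulse-linux
```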
If you are already connected to an Edge Impulse project, use the following command:
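That is the same command with the --clean flag, which lets you select a different project:

```
edge-impulse-linux --clean
```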
Follow the instructions to log in to your Edge Impulse account.
Once complete head over to the devices tab of your project and you should see the connected device.
Unfortunately Omniverse does not generate bounding boxes in the format that Edge Impulse requires, so for this project we will upload the data and then label it in Edge Impulse Studio.
We will start with the Apple class. Head over to the Data Acquisition page, select your 50 apple images, and click upload.
Next head over to the Labelling Queue page. Here you can draw boxes around your data and add labels to each fruit in each image, then repeat these steps for each of the classes.
Note that the Edge Impulse platform will attempt to track objects across frames; in some cases it makes duplicates or adds incorrect bounding boxes, so ensure that you delete or modify these incorrect bounding boxes to avoid problems further down the line.
Once you have completed the apples data, repeat the steps for the oranges and limes images.
Once you have finished labelling the data you should have 150 images that each have around 15 pieces of fruit labelled, and a data split of 80/20.
Now it is time to create our Impulse. Head over to the Impulse Design tab and click on the Create Impulse tab. Here you should set the Image Width and Image Height to 512. Next add an Image block in the Processing Blocks section, then select Yolov5 in the Learning Blocks section, and finally click Save Impulse.
Next click on the Images tab and click on Save Parameters; you will be redirected to the features page. Once on the features page, click Generate Features. You should see that your features are nicely grouped, which is what we are looking for to achieve satisfactory results.
Now it is time to train our model. Head over to the Yolov5 tab, leave all the settings as they are aside from training cycles, which I set to 750, then click Start Training. This will take a while, so grab a coffee.
Once training is finished we see we achieved an exceptional F1 Score of 97.2%.
Now it is time to test our model. There are a few ways we can test through Edge Impulse Studio before carrying out the ultimate test, on-device testing.
Platform testing went very well, and our model achieved 99.24% on the Test (unseen) data.
To carry out live testing through the Edge Impulse Studio, connect to your Jetson Nano and enter the following command:
Once your device is connected to the platform you can then access the camera and do some real-time testing via the platform.
In my case live testing through Edge Impulse Studio also did very well, classifying each fruit correctly.
The final test is the on-device testing. For this we need to download the model and build it on our Jetson Nano. Luckily, Edge Impulse makes this a very easy task. If you are still connected to the platform disconnect, and then enter the following command:
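That is the Linux runner, which downloads and builds the model locally:

```
edge-impulse-linux-runner
```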
This will download the model, build and then start classifying, ready for you to introduce some fruit.
In my case the model performed extremely well, easily classifying each fruit correctly.
In this project, we utilized NVIDIA's state-of-the-art technology to generate a fully synthetic fruit dataset. The dataset was imported into Edge Impulse Studio, where we developed a highly accurate object detection model. Finally, we deployed the model to our NVIDIA Jetson Nano.
The outcomes clearly demonstrate the effectiveness of NVIDIA's Replicator as a robust tool for domain randomization and the creation of synthetic datasets. This approach significantly accelerates the data collection process and facilitates the development of synthetic datasets that generalize well to real-world data.
By combining Replicator with Edge Impulse Studio, we have harnessed a cutting-edge solution that empowers us to rapidly and efficiently build reliable object detection solutions. This powerful combination holds immense potential for addressing various challenges across different domains.
Once again, a big thank you to NVIDIA for their support in this project. It has been an amazing experience learning about how to use Omniverse in an Edge Impulse pipeline, keep an eye out for future projects.
A complete end-to-end sample project and guide to get started with Nvidia TAO for the Renesas RA8D1 MCU.
Created By: Peter Ing
Public Project Link:
The Renesas RA8 series is the first product to implement the Arm Cortex-M85, a high-performance MCU core tailored for advanced AI and machine learning at the edge. Featuring Arm Helium technology and enhanced ML instructions, it delivers up to 4x the ML performance of earlier M-series cores. With high clock speeds, energy efficiency, and TrustZone security, it's ideal for tasks like speech recognition, anomaly detection, and image classification on embedded devices.
Edge Impulse includes support for Nvidia TAO transfer learning and deployment of Nvidia Model Zoo models to the Renesas RA8D1.
This project provides a walkthrough of how to use the Renesas EK-RA8D1 Development kit with Edge Impulse using an Nvidia TAO-enabled backend to train Nvidia Model Zoo models for deployment onto the EK-RA8D1. By integrating the EK-RA8D1 with Edge Impulse's Nvidia TAO training pipeline, you can explore advanced machine learning applications and leverage the latest features in model experimentation and deployment.
Renesas EK-RA8D1 -
Edge Impulse
Edge Impulse CLI, JLink Flashing Tools, Edge Impulse Firmware for EK-RA8D1
Renesas supports developers building on the RA8 with various kits, including the EK-RA8D1, a comprehensive evaluation board that simplifies prototyping.
As part of the Renesas Advanced (RA) series of MCU evaluation kits, the EK-RA8D1 features the RA8 MCU built on the Arm Cortex-M85, the latest high-end MCU core from Arm, superseding the Cortex-M7. The Cortex-M85 is a high-performance MCU core designed for advanced embedded and edge AI applications. It offers up to 4x the ML performance of earlier Cortex-M cores, powered by Arm Helium technology for accelerated DSP and ML tasks.
The Renesas EK-RA8D1 evaluation kit is a versatile platform designed for embedded and AI application development. It features USB Full-Speed host and device support with 5V input via USB or external power supply, along with onboard debugging through Segger J-Link® and support for ETM, SWD, and JTAG interfaces. Developers can utilize 3 user LEDs, 2 buttons, and multiple connectivity options, including Seeed Grove® (I2C & analog), Digilent Pmod™ (SPI & UART), Arduino™ Uno R3 headers, MikroElektronika™ mikroBUS, and SparkFun® Qwiic® (I2C). An MCU boot configuration jumper further enhances flexibility, making the EK-RA8D1 ideal for rapid prototyping and testing.
The kit also features a camera and full color LCD display, making it ideal for the development and deployment of edge AI solutions allowing on-device inference results to be rendered to the onboard LCD.
There are two ways to connect the board: either using the Edge Impulse CLI or directly from within the Studio UI. To access via the CLI, run the edge-impulse-daemon command and provide your login credentials, then select the appropriate Studio project to connect your board.
Alternatively, clicking the Data acquisition menu item in the left navigation bar presents the data collection page. Select 320x240 to get the maximum resolution out of the camera on the EK-RA8D1 when capturing samples.
Edge Impulse will ask you if the project is an object detection project. Select 'No' to configure the project as an image classification project when using image data.
Alternatively, go to the Dashboard page by clicking Dashboard in the left navigation and select One label per data item from the Labeling method dropdown.
Capture sample images by presenting objects to the camera that you wish to identify, and click the Start sampling button to capture a full color image from the board.
Different types or classes of object can be captured, and these can be added by changing the label string in the Label text box. For example, a class called needle_sealed is created by setting the label to this name and then capturing pictures of sealed needles.
Once all images are annotated you should balance your data so that you split your dataset between a Training and Test set. This is done by selecting Dashboard from the navigation menu on the left and then scrolling down to find and click the Perform train / test split button. Edge Impulse will try to get as close to an 80/20 split as possible depending on the size of your dataset.
The data split can be seen at the top of the Data acquisition page where you can not only see the split of data items collected by label as a pie chart, but also the resulting split under the TRAIN / TEST SPLIT element.
The next step is to create a new Impulse which is accessed from the Create Impulse menu. Select the Renesas RA8D1 (Cortex M85 480Mhz) as the target, doing so automatically targets the EK-RA8D1 which is the RA8D1 based board supported by Edge Impulse.
Set the image width and height to 224px x 224px to match the pretrained backbone dimensions in Nvidia TAO Model Zoo:
Classification requires an Image processing block; this is added by clicking Add a processing block and then selecting Image from the options presented.
Once the Image processing block is added, the Transfer Learning block needs to be added by selecting Add a learning block and then choosing the first option, Transfer Learning (Images). Nvidia TAO is based on transfer learning, so selecting this block is the first step towards activating the Nvidia TAO classification pipeline in the backend.
The resulting Impulse should look as follows before proceeding.
The next step is to generate the raw features that will be used to train the model. First click Save Impulse then select the Image submenu from the Impulse Design menu in the left hand navigation to access the settings of the Image processing block.
In the Parameters tab, leave the color depth as RGB as the TAO Models use 3 channel RGB models:
Under the Generate features tab simply click the Generate features button to create the scaled down 224x224 images that will be used by TAO to train and validate the model.
The process will take a few seconds to minutes depending on the dataset size. Once done the results of the job are shown and the reduced images are stored in the backend as features to be passed to the model during training and validation.
Once the image features are done, a green dot appears next to Images in the Impulse design navigation. The Transfer Learning submenu is then activated, and can be accessed by clicking Transfer learning in the navigation pane under Impulse design, this takes you to the configuration area of the learning block.
To activate Nvidia TAO in the project the default MobileNetV2 model architecture needs to be deleted, by clicking the Delete model (trash can) icon on the lower right corner of the model.
Once this is done you will see there is no model architecture activated for the project, and a button titled "Choose a different model" will be shown in place of the deleted MobileNet model.
Clicking the "Choose a different model" button will present a list of model architectures available in Edge Impulse. Since the project is configured as Classification, only classification model architectures are available. To access the Nvidia TAO Classification Model scroll down to the bottom.
The Nvidia TAO models are only available under Professional and Enterprise subscriptions as shown by the labels. For this project we are going to use Nvidia TAO Image Classification. Selecting any of the Nvidia TAO models like this activates the Nvidia TAO training environment automatically behind the scenes in the project.
Once the Nvidia TAO Classification model is selected all the relevant hyperparameters are exposed by the GUI. The default training settings are under the Training settings menu and the Advanced training settings menu can be expanded to show the full set of parameters specific to TAO.
All of the relevant settings available in TAO including Data augmentation and backbone selection are available from the GUI. The data augmentation features of TAO can be accessed by expanding the Augmentation settings menu. Backbone selection is accessed from the Backbone dropdown menu and for this project we will be using the MobileNet v2 (800K params) backbone.
It's also essential to select GPU for training, as TAO only trains on GPUs. Also set the number of training cycles (epochs) to a higher number than the default; here we start with 300.
All that's left to do is click the Save and train button to commence training. This can take from 1 to several hours depending upon the dataset size and other factors such as backbone, etc.
Once training is completed, the results are shown:
The accuracy and confusion matrix, latency, and memory usage are shown for both the Unoptimized (float32) and Quantized (int8) models, which can be used with the EK-RA8D1. Take note of the PEAK RAM USAGE and FLASH USAGE statistics at the bottom. These indicate if the model will fit within the RAM and ROM on the target.
Before deploying the model to the development kit, the model can first be tested by accessing the Model testing menu in the left navigation. Clicking the Classify all button runs the Test dataset through the model, and shows the results on the right:
The results are visible in the right side of the window, and can give a good indication of the model performance against the captured dataset.
The Live classification page also allows you to perform real-time classification using uploaded files, by selecting a file from the Classify existing test sample dropdown menu and clicking the Load sample button.
The results shown when doing this are from the classification being performed in Edge Impulse, not on the device.
If you wish to test the camera on the EK-RA8D1 but still run the model in Edge Impulse, you can connect the camera using the edge-impulse-daemon CLI command, just as you would when performing data acquisition.
You can iteratively improve the model by capturing more data and choosing the Retrain model sub menu item which takes you to the retrain page where you can simply click the Train model button to retrain the model with the existing hyperparameters.
To test the model directly on the EK-RA8D1, go to the Deployment page by clicking the Deployment sub menu item in the left navigation. In the search box type Renesas.
The dropdown menu will filter out all the other supported boards and give you two options for the EK-RA8D1. The RA8D1 MCU itself has 2MB of flash for code storage and 1MB of RAM integrated. The EK-RA8D1 development kit adds 64MB of external SDRAM and 64MB of external QSPI flash to support bigger models.
The Quantized (int8) model should be selected by default and the RAM and ROM usage is shown, which is what you would have seen in the training page when training completed.
Renesas EK-RA8D1 target – This builds a binary for when RAM and ROM usage fit within the RA8D1 MCU's integrated RAM and FLASH memory.
Renesas EK-RA8D1 SDRAM target – This builds a binary that loads the model into the external SDRAM when the model is over 1MB. (Note there is a slight performance penalty, as the external memory has to be accessed over a memory bus and is also SDRAM vs. the internal SRAM.)
When you click the Build button, Edge Impulse builds the project and generates a .zip archive containing the prebuilt binary and supporting files, which downloads automatically when completed.
This archive contains the same files as the Edge Impulse firmware you would have downloaded when following this guide at the beginning of the project, when you were connecting your board for the first time. The only difference is that the firmware (.hex) now contains your model instead of the default model.
To flash the new firmware to your board, replace the contents of the folder where you have the firmware with the contents of the downloaded archive.
Note, you need to make sure you have connected the USB cable to the JLink port (J10).
Run the appropriate command to flash the firmware to the board.
To test the performance of the image classification on the board and see inference latency and DSP processing time, connect the USB cable to J11.
Then run the edge-impulse-run-impulse CLI command:
The inference execution time and results are then shown in the CLI.
In this guide we have covered the step-by-step process of using Edge Impulse's seamless integration of Nvidia TAO transfer learning with image classification models from Nvidia's Model Zoo, and how to deploy the resulting model to the Renesas EK-RA8D1 Arm Cortex-M85 MCU development kit. In this way we have shown how Edge Impulse makes it possible to run Nvidia image classification models on an Arm Cortex-M85 MCU.
An Nvidia Jetson Nano computer vision project with TensorRT acceleration, to perform counting for manufacturing quality control.
Created By: Jallson Suryo
Public Project Link:
GitHub Repo:
The Quality Control process, especially when it involves visual-based counting and is carried out repeatedly, is time-consuming and prone to errors when performed by humans. Using sensors to count or detect presence also does not provide a solution if the object you want to detect is made up of multiple components, each needing measurement. Food products, finished goods, or electronic manufacturing processes could be examples of this type of scenario.
A computer vision system for quality/quantity inspection of product manufacturing on a conveyor belt. The setting of this project will be in a hypothetical mass-production pizza factory where a Jetson Nano with a camera will detect and count the number of toppings for each pizza that passes by on a conveyor belt to ensure the quantity of toppings (pepperoni, mushroom, and paprika) meets a predefined quality standard. Speed, reliability, and cost efficiency are the goals for this project.
This project uses Edge Impulse's FOMO (Faster Objects, More Objects) algorithm, which can quickly detect objects and use them as a quality/quantity check for products on a running conveyor belt. FOMO's ability to report the number and coordinates of objects is the basis of this system. This project will explore the capability of the Nvidia Jetson Nano's GPU to handle color video (RGB) at a higher resolution (320x320) than some other TinyML projects, while still maintaining a high inference speed. The machine learning model (model.eim) will be deployed with the TensorRT library, compiled with optimizations for the GPU, and set up via the Linux C++ SDK. Once the model can identify different pizza toppings, an additional Python program will be added to check each pizza for a standard quantity of pepperoni, mushrooms, and paprikas. This project is a proof-of-concept that can be widely applied in the product manufacturing and food production industries to perform quality checks based on a quantity requirement of parts in a product.
Nvidia Jetson Nano with dedicated power adapter
USB camera/webcam (eg. Logitech C270)
Mini conveyor belt system (10cm x 50cm or larger)
Camera stand/holder
Objects: For this example, mini pizza with toppings (dough or printed paper)
Ethernet cable
PC/Laptop to access Jetson Nano via SSH
Edge Impulse Studio
Edge Impulse Linux, Python, C++ SDK
Ubuntu OS/Nvidia Jetpack
Terminal
In this project we can use a camera (webcam) connected to a PC/laptop to capture the images for data collection for ease of use. Take pictures of your pizza components from above, with slightly different angles and lighting conditions to ensure that the model can work under different conditions (to prevent overfitting). While using FOMO, object size is a crucial aspect to ensure the performance of this model. You must keep the camera distance from objects consistent, because significant differences in object size will confuse the FOMO algorithm.
Choose the Images project option, then Classify Multiple Objects. In Dashboard > Project Info, choose Bounding Boxes for the labeling method and Nvidia Jetson Nano for the target device. Then in Data acquisition, click on the Upload Data tab, choose your photo files that you captured from your webcam or phone, choose Auto split, then click Begin upload.
For Enterprise (paid) accounts, you will instead click on Auto-Labeler in Data Acquisition. This auto-labeling segmentation / cluster process will save a lot of time over the manual process above. Set the min/max object pixels and sim threshold (0.9 - 0.999) to adjust the sensitivity of cluster detection, then click Run. Next, you can label each cluster result as desired. If something doesn’t match or if there is additional data, labeling can be done manually as well.
Once you have the dataset ready, go to Create Impulse and, in the Image block, set 320 x 320 as the image width and height. Then choose Fit shortest axis, add Image as the Processing block, and choose Object Detection (Images) as the Learning block.
In the Image parameters section, set the color depth to RGB, then press Save parameters. Then click on Generate features, and navigate to the Object Detection block setup using the left navigation. Leave the Neural Network training settings as they are, or compare with ours; in our case everything is quite balanced, so we'll leave them alone and choose FOMO (MobileNet V2 0.35). Train the model by pressing the Start training button. You can see the progress of the training in the log on the right.
If everything is OK, the training job will finish in a short while, and then we can test the model. Go to the Model Testing section and click Classify all. Our result is above 90%, so we can move on to the next step: Deployment.
Click on the Deployment tab, then search for TensorRT, select Float32, and click Build. This will build an Nvidia TensorRT library for running inference, targeting the Jetson Nano's GPU. Once the build finishes and the file is downloaded, unzip the .zip file, and we're ready for model deployment with the Edge Impulse C++ SDK on the Jetson Nano side.
On the Jetson Nano, there are several things that need to be done. Flash the Nvidia-provided Ubuntu OS with JetPack, which can be downloaded from the Nvidia Jetson website, to an SD Card. Insert the SD Card and power on the board, go through the setup process to finish the OS configuration, and connect the board to your local network. Then ssh in from your PC/laptop and install the Edge Impulse tooling via the terminal:
Then install Clang as a C++ compiler:
Clone this GitHub repository and install these submodules:
Then install OpenCV:
sh build-opencv-linux.sh
Now make sure the contents of the TensorRT folder you downloaded from the Edge Impulse Studio have been unzipped and moved to the example-standalone-inferencing-linux directory. For FOMO, we need to edit the variables in the source/eim.cpp file with:
To build a specific model targeting the Jetson Nano GPU with TensorRT, using Clang:
The resulting model will be ./build/model.eim
If your Jetson Nano is run with a dedicated power supply, its performance can be maximized by this command:
Now the model is ready to run in a high-level language such as the Python program we'll use in the next step. To ensure this model works, we can run the Edge Impulse Runner with the camera setup on the Jetson Nano and turn on the conveyor belt. You can see what the camera observes via your browser; the local IP address and port will be shown when the Linux Runner is started. Run this command:
The inferencing time is about 5ms, which is an incredibly fast detection speed.
To compare, I have also used the Linux Runner with the CPU version of the model, downloaded via edge-impulse-linux-runner --download modelfile.eim, then run with the same command as above.
You can see the difference in inferencing time, which is almost six times faster when we compile and run on the GPU. Impressive!
With the impressive performance of live inferencing shown by the Linux Runner, now we will create a Python program to be able to calculate the number of toppings on a pizza compared to a desired amount, and that will provide an OK or Bad output if the number of toppings is incorrect.
The program I made (topping.py) is a modification of Edge Impulse's classify.py in the examples/image folder from the linux-python-sdk directory.
My program takes the stream of topping counts produced by the model (model.eim) as the pizzas move past the camera, for example: 0 0 2 3 3 1 0 1 3 3 3 2 0 0 0 2 3 3 2 0 0 2 5 5 1 0 0 2 3 3 1 0 0 1 2 2 0 0. It treats 0 as the sequence separator (no pizza in view) and records the peak value in each sequence. As an example, if the correct number of toppings on a pizza (per quality control standards) is 3, and we know that 0 is a separator and anything other than 3 is bad, then the sequence of peaks 0 3 0 3 0 3 0 5 0 3 0 2 0 evaluates to: OK OK OK BAD OK BAD
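As a rough illustration of this separator-and-peak logic (not the actual topping.py, which works on live detections), a Python sketch could look like this, with the expected topping count of 3 assumed for the example:

```python
# Sketch of the separator/peak-counting idea described above (not the actual topping.py).
EXPECTED_TOPPINGS = 3  # quality standard assumed for this example

def evaluate_counts(counts):
    """Split the per-frame counts on 0 (no pizza in view), keep the peak of each run,
    and mark the pizza OK only if the peak matches the standard."""
    results = []
    peak = 0
    in_sequence = False
    for c in counts:
        if c == 0:
            if in_sequence:
                results.append("OK" if peak == EXPECTED_TOPPINGS else "BAD")
            peak, in_sequence = 0, False
        else:
            peak = max(peak, c)
            in_sequence = True
    if in_sequence:  # handle a trailing sequence with no closing 0
        results.append("OK" if peak == EXPECTED_TOPPINGS else "BAD")
    return results

frames = [0,0,2,3,3,1,0,1,3,3,3,2,0,0,0,2,3,3,2,0,0,2,5,5,1,0,0,2,3,3,1,0,0,1,2,2,0,0]
print(evaluate_counts(frames))  # ['OK', 'OK', 'OK', 'BAD', 'OK', 'BAD']
```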
To run the program, use the command along with the path where the model.eim file is located. Be sure to use the one built for the GPU, in case you still have both on the Nano:
To see the process in action, check out our demo video:
We have successfully implemented an object detection computer vision model targeting the Nvidia Jetson Nano's GPU, in a food/product manufacturing setting. FOMO object detection with RGB color and 320x320 resolution is handled by the Jetson Nano's GPU accurately, with an inference time of only 5ms. This would allow it to be applied with higher resolution for more complex objects, faster conveyor belts, and higher-speed cameras (>100fps). Embedding TensorRT models with high-level languages such as Python makes it easy to apply them to specific use cases, and provides the capability to control displays, lights, alarms, or servos for automation and manufacturing systems as well.
Rooftop ice buildup detection using Edge Impulse and The Things Network, with synthetic data created in NVIDIA Omniverse Replicator and Sun Studies.
Created By: Eivind Holt
Public Project Link:
GitHub Repo:
NVIDIA GeForce RTX
Icicle formation is detected using a neural network (NN) designed to identify objects in images from the onboard camera. The NN is trained and tested exclusively on synthesized images. The images are generated with realistic simulated lighting conditions. A small amount of real images are used to later verify the model.
The main challenge of detecting forming icicles is the translucent nature of ice and natural variation of sunlight. Because of this we need a great number of images to train a model that captures enough features of the ice with varying lighting conditions. Capturing and annotating such a large dataset is incredibly labor intensive. We can mitigate this problem by synthesizing images with varying lighting conditions in a realistic manner and have the objects of interest automatically labeled.
A powerful platform combined with a high resolution camera with fish-eye lens would increase the ability to detect icicles. However, by deploying the object detection model to a small, power-efficient, but highly constrained device, options for device installation increase. Properly protected against moisture this device can be mounted outdoors on walls or poles facing the roofs in question. LoRaWAN communication enables low battery consumption and long transmission range.
One of the most labor intensive aspects of building any machine learning model is gathering the training data and labeling it. For an object detection model this requires taking hundreds or thousands of images of the objects to detect, drawing rectangles around them, and choosing the correct label for each class. Recently generating pre-labeled images has become feasible and has proven to have great results. This is referred to as synthetic data generation with domain randomization. In this project a model will be trained exclusively on synthetic data, and we will see how it can detect the real life counterparts.
Blender change origin cheat sheet:
(Edit Mode) Select a vertex on the model, then Shift+S > Cursor to Selected
(Object Mode) Select Hierarchy, then Object > Set Origin > Origin to 3D Cursor
(Object Mode) Shift+S > Cursor to World Origin
Tip for export:
Selection only
Convert Orientation:
Forward Axis: X
Up Axis: Y
To be able to produce images for training and include labels, we can use a feature of Replicator toolbox found under menu Replicator > Semantics Schema Editor.
Here we can select each top node representing an item for object detection and add a key-value pair. Choosing "class" as Semantic Type and "ice" as Semantic Data enables us to export this string as a label later.
To keep the items generated in our script separate from the manually created content, we start by creating a new layer in the 3D stage:
Next we specify that we want to use ray-tracing as our image output. We create a camera and hard code the position. We will point it to our icicles for each render later. Then we use our previously defined semantics data to get references to the icicles for easier manipulation. We also define references to a plane on which we want to scatter the icicles. Lastly we define our render output by selecting the camera and setting the desired resolution. Due to an issue in Omniverse where artifacts are produces at certain resolutions, e.g. 120x120 pixels, we set the output resolution at 128x128 pixels. Edge Impulse Studio will take care of scaling the images to the desired size should we use images of different size than the configured model size.
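The corresponding setup might be sketched roughly as below. This is an illustration rather than the project's exact script: the plane path, camera position, and render settings call are assumptions to adapt to your own stage.

```python
import omni.replicator.core as rep

with rep.new_layer():
    # Path-traced (ray-traced) output for realistic lighting
    rep.settings.set_render_pathtraced(samples_per_pixel=64)

    # Camera at a hard-coded position; it is re-aimed at the icicles per frame later
    camera = rep.create.camera(position=(0, 2.0, 5.0))

    # References to the labeled icicles and the plane to scatter them on
    icicles = rep.get.prims(semantics=[("class", "ice")])
    scatter_plane = rep.get.prims(path_pattern="/World/ScatterPlane")  # placeholder path

    # 128x128 render output (avoids the artifact issue seen at e.g. 120x120)
    render_product = rep.create.render_product(camera, (128, 128))
```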
Due to the asynchronous nature of Replicator we need to define our randomization logic as call-back methods by first registering them in the following fashion:
Before defining the logic of the randomization methods we define what will happen during each render:
The parameter num_frames specifies the desired number of renders. The rt_subframes parameter allows the rendering process to advance a set number of frames before the result is captured and saved to disk. A higher setting enhances complex ray tracing effects like reflections and translucency by giving them more time to interact across surfaces, though it increases rendering time. Each randomization routine is invoked with the option to include specific parameters.
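Sketched in code, the registration and per-frame trigger could look like this; the function names and the num_frames / rt_subframes values are placeholders, not the project's exact settings:

```python
# Register the randomization routines (their bodies are defined further below)
rep.randomizer.register(scatter_icicles)
rep.randomizer.register(randomize_camera)

# On every frame: scatter the icicles first, then point the camera at them
with rep.trigger.on_frame(num_frames=2000, rt_subframes=50):
    rep.randomizer.scatter_icicles()
    rep.randomizer.randomize_camera()
```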
To save each image and its corresponding semantic data, we utilize a designated API. While customizing the writer was considered, attempts to do so using Replicator version 1.9.8 on Windows led to errors. Therefore, we are employing the "BasicWriter" and will develop an independent script to generate a label format that is compatible with Edge Impulse.
rgb indicates that we want to save images to disk as .png files. Note that labels are created by setting bounding_box_2d_loose. This is used instead of bounding_box_2d_tight, as the latter in some cases would not include the tip of the icicles in the resulting bounding box. It also creates labels from the previously defined semantics. The code ends with running a single iteration of the process in Omniverse Code, so we can preview the results.
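A minimal sketch of attaching the BasicWriter, with the output folder as a placeholder, could look like this:

```python
# Save RGB .png images plus loose 2D bounding boxes and their semantic labels
writer = rep.WriterRegistry.get("BasicWriter")
writer.initialize(
    output_dir="out",              # placeholder output folder
    rgb=True,
    bounding_box_2d_loose=True,    # loose boxes so thin icicle tips are included
)
writer.attach([render_product])

# Run a single iteration so the result can be previewed in Omniverse Code
rep.orchestrator.run()
```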
The bounding boxes can be visualized by clicking the sensor widget, checking "BoundingBox2DLoose" and finally "Show Window".
Now we can implement the randomization logic. First we'll use a method that flips and scatters the icicles on a defined plane.
Next a method that randomly places the camera on another defined plane, and makes sure the camera is pointing at the group of icicles and randomizes focus.
We can define the methods in any order we like, but in rep.trigger.on_frame it is crucial that the icicles are placed before pointing the camera.
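The two randomization functions might be sketched as follows. The value ranges, the flip rotation, and the focusDistance tweak are illustrative assumptions, not the project's exact parameters:

```python
def scatter_icicles():
    # Randomly rotate (flip) the icicles and scatter them across the target plane
    with icicles:
        rep.modify.pose(rotation=rep.distribution.uniform((-180, -180, 0), (180, 180, 0)))
        rep.randomizer.scatter_2d(scatter_plane)
    return icicles.node

def randomize_camera():
    # Move the camera within a region, keep it aimed at the icicles, and vary focus
    with camera:
        rep.modify.pose(
            position=rep.distribution.uniform((-2.0, 1.0, 3.0), (2.0, 3.0, 6.0)),
            look_at=icicles,
        )
        rep.modify.attribute("focusDistance", rep.distribution.uniform(2.0, 6.0))
    return camera.node
```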
The surface behind the icicles may vary greatly, both in color and texture. Using Replicator randomizing the color of an object's material is easy.
In the scene in Omniverse, either manually create a plane behind the icicles, or create one programmatically.
In Code, define a function that takes in a reference to the plane we want to randomize, the color of the distribution functions with min and max value span:
Then get a reference to the plane:
Lastly register the function and trigger it on each new frame:
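A sketch of these three steps, with the plane path and color span as placeholder values, might look like this:

```python
def randomize_plane_color(plane):
    # New random RGB color for the plane's material, between the min and max spans
    with plane:
        rep.randomizer.color(colors=rep.distribution.uniform((0, 0, 0), (1, 1, 1)))
    return plane.node

# Reference to the background plane created earlier (placeholder path)
background_plane = rep.get.prims(path_pattern="/World/BackgroundPlane")

# Register the function and trigger it on each new frame
rep.randomizer.register(randomize_plane_color)
with rep.trigger.on_frame(num_frames=2000, rt_subframes=50):
    rep.randomizer.randomize_plane_color(background_plane)
```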
Now each image will have a background with random (deterministic, same starting seed) RGB color. Replicator takes care of creating a material with a shader for us. As you might remember, in an effort to reduce RAM usage our neural network reduces RGB color channels to grayscale. In this project we could simplify the color randomization to only pick grayscale colors. The example has been included as it would benefit in projects where color information is not reduced. To only randomize in grayscale, we could change the code in the randomization function to use the same value for R, G and B as follows:
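One option for the grayscale-only variant is to draw from a list of gray tones so R, G and B always share the same value; a sketch:

```python
def randomize_plane_color_gray(plane):
    # Pick one intensity and reuse it for R, G and B
    gray_values = [(v / 255.0, v / 255.0, v / 255.0) for v in range(256)]
    with plane:
        rep.randomizer.color(colors=rep.distribution.choice(gray_values))
    return plane.node
```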
To further steer training of the object detection model towards capturing features of the desired class (the icicles), and not features that appear due to shortcomings in the domain randomization, we can create images with the icicles in front of a large variety of background images. A simple way of achieving this is to use a large dataset of random images and randomly assign one of them to a background plane for each image generated.
We could instead generate textures with random shapes and colors. Either way, the resulting renders will look weird, but help the model training process weight features that are relevant for the icicles, not the background.
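One way to sketch this, assuming a local folder of random background images (the path is a placeholder), is to register a texture randomizer for the background plane:

```python
import glob

# Pool of random background images; one is picked per frame
background_images = glob.glob("C:/data/backgrounds/*.jpg")

def randomize_plane_texture(plane):
    with plane:
        rep.randomizer.texture(textures=background_images)
    return plane.node

rep.randomizer.register(randomize_plane_texture)
```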
or debug from Visual Studio Code by setting the input folder in launch.json like this:
This will create a file bounding_boxes.labels that contains all labels and bounding boxes per image.
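As a rough sketch of what such a conversion can look like, the snippet below reads the BasicWriter output and writes an Edge Impulse bounding_boxes.labels file. The BasicWriter file names and NumPy field names are assumptions; verify them against your own generated output:

```python
import glob, json, os
import numpy as np

def convert(input_dir):
    boxes_per_image = {}
    for npy_path in sorted(glob.glob(os.path.join(input_dir, "bounding_box_2d_loose_*.npy"))):
        frame_id = os.path.basename(npy_path).rsplit("_", 1)[-1].replace(".npy", "")
        image_name = f"rgb_{frame_id}.png"
        # Assumed mapping from semantic id to class name, e.g. {"0": {"class": "ice"}}
        with open(os.path.join(input_dir, f"bounding_box_2d_loose_labels_{frame_id}.json")) as f:
            id_to_class = json.load(f)
        entries = []
        for box in np.load(npy_path):
            label = id_to_class[str(box["semanticId"])]["class"]
            if label.upper() in ("BACKGROUND", "UNLABELLED"):
                continue  # skip non-icicle entries if present
            entries.append({
                "label": label,
                "x": int(box["x_min"]),
                "y": int(box["y_min"]),
                "width": int(box["x_max"] - box["x_min"]),
                "height": int(box["y_max"] - box["y_min"]),
            })
        boxes_per_image[image_name] = entries
    labels = {"version": 1, "type": "bounding-box-labels", "boundingBoxes": boxes_per_image}
    with open(os.path.join(input_dir, "bounding_boxes.labels"), "w") as f:
        json.dump(labels, f)

convert("out")  # the writer's output folder
```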
to connect to your account and project, and upload the image files and labels in bounding_boxes.labels. To switch projects if necessary, first run:
At any time we can find "Perform train/test split" under "Danger zone" in project dashboard, to distribute images between training/testing in a 80/20 split.
Since our synthetic training images are based on both individual and two different sized clusters of icicles, we can't trust the model performance numbers too much. Greater F1 scores are better, but we will never achieve 100%. Still, we can upload increasing numbers of labeled images and observe how performance numbers increase.
2,000 images:
6,000 images:
14,000 images:
26,000 images:
If we look at results from model testing in Edge Impulse Studio, at first glance the numbers are less than impressive.
However if we investigate individual samples where F1 score is less than 100%, we see that the model indeed has detected the icicles, but clustered differently than how the image was originally labeled. What we should look out for are samples that contain visible icicles where none were detected.
In the end virtual and real-life testing tells us how well the model really performs.
Install the Sun Study extension in Isaac Sim to be able to vary light conditions while testing.
Paste your API key found in the Edge Impulse Studio > Dashboard > Keys > Add new API key into Omniverse Extension:
To be able to classify any virtual camera capture we first need to build a version of the model that can run in a JavaScript environment. In Edge Impulse Studio, go to Deployment, find "WebAssembly" in the search box and click Build. We don't need to keep the resulting .zip package, the extension will find and download it by itself in a moment.
Back in the Edge Impulse extension in Isaac Sim, when we expand the "Classification" group, a message will tell us everything is ready: "Your model is ready! You can now run inference on the current scene".
Before we test it we will make some accommodations in the viewport.
Switch to "RTX - Interactive" to make sure the scene is rendered realistically.
Set the viewport resolution to a square 1:1 aspect ratio, either at the same resolution as our intended device inference (120x120 pixels) or at 512x512 pixels.
Display Isaac bounding boxes by selecting "BoundingBox2DLoose" under the icon that resembles a robotic sensor, then click "Show Window". Now we can compare the ground truth with model prediction.
To get visual verification our model works as intended we can go to Deployment in Edge Impulse Studio, select OpenMV Firmware as target and build.
Start by selecting Arduino library as a Deployment target.
Once built and downloaded, open the Arduino IDE, go to Sketch > Include Library > Add .ZIP Library... and locate the downloaded library. Next go to File > Examples > [name of project]_inferencing > portenta_h7 > portenta_h7_camera to open a generic sketch template using our model. To test the model continuously and print the results to the console, this sketch is ready to go. The code might appear daunting, but we really only need to focus on the loop() function.
In short, we perform inference every 10 seconds. If any icicles are detected we simply transmit a binary 1 to The Things Stack application. It is probably obvious that the binary payload is redundant, since the presence of a message is enough, but this could be extended to transmit other data, for example the prediction confidence, number of clusters, battery level, temperature, or light level.
Now we can observe messages being received and decoded in Live data in the TTS console.
Observe the difference in the real uplink (first) and simulated uplink (second). In both we find "decoded_payload":{"detected":true}.
The project has no safeguard against false negatives. The device will not report if its view is blocked. This could be resolved by placing static markers on both sides of the area to monitor and including them in the synthetic training data. The absence of at least one marker could then trigger a notification that the view is obscured.
Due to optimization techniques in Faster Objects - More Objects (FOMO), determining the relative sizes of the icicles is not feasible. As even icicles with small mass can be harmful at moderate elevation, this is not a crucial feature.
The object detection model has not been trained to give an exact number of icicles in view. This has no practical implication other than the model verification results appearing worse than practical performance.
Icicles can appear bent or angled either due to wind or more commonly due to ice and snow masses slowly dropping over roof edges. The dataset generated in this project does not cover this, but it would not take a lot of effort to extend the domain randomization to rotate or warp the icicles.
The training images could benefit from simulating snow with particle effects in Omniverse. The project could also be extended to detect build-up of snow on roofs. For inspiration check out this demo of simulated snow dynamic made in 2014 by Walt Disney Animation Studios for the movie Frozen:
To be able to compile a representation of our neural network and have it run on the severely limited amount of RAM available on the Arduino Portenta H7, pixel representation has been limited to a single channel - grayscale. Colors are not needed to detect icicles so this does not affect the results.
In the following drawing we see how equipment and disposable materials are typically organized during surgery. Tools are pre-packaged in sets for the appropriate type of surgery and noted when organized on trays or tables. Swabs are packaged in numbers and contain tags that are noted and kept safe. When swabs are used they are displayed individually in transparent pockets on a stand so they can be counted and checked with the tags from the originating package. Extensive routines are in place to continuously count all equipment used; still, errors occur.
with
is a novel machine learning algorithm that allows for visual object detection on highly constrained devices through training of a neural network with a number of convolutional layers.
This model worked great and can be inspected .
Install .
Install
The objects we want to be able to detect need to be represented with a 3D model and a surface (material). Omniverse provides a library of ready-to-import assets, further models can be created using editors such as Blender or purchased on sites such as .
For the chrome surfaces a material from one of the models from the library provided through Omniverse was reused, look for in the Omniverse Asset Store. Remember to switch to "RTX - Interactive" rendering mode to see representative ray-tracing results, "RTX - Real-Time" is a simplified rendering pipeline.
This part describes how to write a script in Python for randomizing the images we will produce. We could choose to start with an empty stage and programmatically load models (from USD files), lights, cameras and such. With a limited number of models and lights, we will proceed with adding most items to the stage manually as described earlier. Our script can be named anything, ending in .py, and preferably placed close to the stage USD-file. The following is a description of:
Edge Impulse Studio supports a wide range of image labeling formats for object detection. Unfortunately the output from Replicator's BasicWriter needs to be transformed so it can be uploaded either through the web interface or via .
Provided is a simple Python program, . A simple prompt was written for ChatGPT describing the output from Replicator and the . Run the program from shell with
Look at the or .
Note that EI has created a nice for uploading images from Replicator directly to your project. As of the time of writing this extension only uploads images, but this might include label data in the near future.
Finally we can train the model. The target device should be set, and we need to remember that in the case of the Arduino Nicla Vision we only have enough RAM for 96x96 pixels. Any type of "early stop" feature would be nice, but for now we need to experiment with the number of training cycles. Data augmentation should be avoided in the case where we generate thousands of images; it will not improve our results.
A trained model can be compiled into an Arduino-compatible library. Events can trigger inference and the Arduino Nicla Vision can broadcast any number of detected objects via a Bluetooth LE service. A BLE dongle or smart phone can listen for events and route them to a web-API for further integration with other systems. For instance this application can log the detected items in an Electronic Medical Record system. The e-health standard . Sandbox environments such as are great places to start experimenting with integrations with hospital systems.
Further reading:
I work with research and innovation at , exploring the future of medical technology. I am a member of Edge Impulse Expert Network. This project was made on my own accord and the views are my own.
We are using for the robotic manipulation.
For a depth camera, we will be utilizing the OAK-D, which will be doing object recognition and localization. An object detection model trained using the Edge Impulse Studio will be deployed directly on the OAK-D camera.
We can use the to install the Raspberry Pi OS (64-bit, Bookworm) on an SD card. The Raspberry Pi Imager allows for easy setup of user accounts, Wi-Fi credentials, and SSH server.
First, we need to define a visual model of the Arduino Braccio ++ using the URDF (Unified Robot Description Format) which is a file format for specifying the geometry and organization of robots in ROS 2. We will be using the STL files for the parts of the robot. We can see one of the STL parts (shoulder) in the following GIF.
Please follow the instructions to download and install the Arduino IDE. After installation, open the Arduino IDE and install the board package for the Arduino Mbed OS Nano Boards by going to Tools > Board > Boards Manager. Search the board package as shown below and install it.
After completing the board package installation, choose the Arduino Nano RP2040 Connect from Tools > Board > Arduino Mbed OS Nano boards menu. We must install (1.3.2) and (humble) libraries. The firmware sketch can be found in the GitHub repository:
.
The pick_n_place node plans a pick and place operation using the MoveIt Task Constructor. The MoveIt Task Constructor provides a way to plan tasks that consist of multiple different subtasks (known as stages, as shown in the image below).
We'll use the Avnet RZBoard V2L along with Edge Impulse to accomplish this task. The Avnet RZBoard V2L is a compact single board computer powered by a Renesas RZ/V2L SoC, running a Linux operating system. It has 2GB of RAM, 32GB of onboard eMMC storage, an SD card slot, micro-HDMI display output, an Ethernet port, built-in WiFi and Bluetooth connectivity, USB ports, and a 40-pin GPIO header for expansion. It's powered by a single 5V/3A USB-C power supply.
You will also need Node.js v14.x to be able to use the . Install it by running these commands:
Finally, let's install the , you just need to run these commands:
First, we need to create an account if we haven't yet, and create a new project:
The PoseNet processing block is only enabled for Enterprise projects. If we want to use it on a Developer one, we need to run the block locally; for this, you must clone the and follow the README steps.
You will end up with a URL similar to "https://abe7-2001-1308-a2ca-4f00-e65f-1ff-fe27-d3aa.ngrok-free.app" hosting the processing block. Click on Add a processing block > Add custom block, then paste the generated URL, and click on Add block.
To be able to run the project, we need to go back to our SSH connection with the device and clone the project from the , for this, use the following command:
For the actual appliance control, I used the Google Assistant SDK integration for Home Assistant. Follow the to configure it for your setup.
Option 2 provides more flexibility but adds an additional manual step. For the purpose of this guide, we are going with the simplest approach, using automatic Engine creation with a no-code deployment. If you wish to manually create an Engine, refer to the documentation for the trtexec command at .
These are available natively in the Studio, or if those don't meet your needs, it is possible for you to use custom learning blocks, though that will require additional effort.
Training of the model is then done in the same way as other object detection projects in Edge Impulse. Documentation is located here:
The included repo contains an implementation of a custom output parser that works with Edge Impulse's YOLO, which is based on .
After experimenting with computer vision on the Jetson Nano, I believe that a computer vision system with object detection capabilities has the potential to accurately count small objects in large quantities and on fast-moving conveyor belts. Basically, we'll explore the capability of models that have been optimized for the GPU in the Jetson Nano. In this project, the production line / conveyor belt will run quite fast, with lots of small objects in random positions, and the number of objects will be counted live and displayed on a 16x2 LCD display. Speed and accuracy are the goals of the project.
On the Jetson Nano, there are several things that need to be done to get ready for our project. Make sure the device is running its native Ubuntu OS and JetPack, which are usually pre-installed on the SD card. More information on . Then ssh in via a PC or laptop over Ethernet and set up the Edge Impulse firmware in the terminal:
Before we start with Python, we need to install the Edge Impulse Python SDK and clone the repository from the previous Edge Impulse examples. Follow the steps here, .
With the impressive performance of live inferencing in the Runner, now we will create a Python program to calculate the cumulative count of moving objects taken from the camera capture. The program is a modification of Edge Impulse's classify.py in examples/image from the linux-python-sdk directory. We turned it into an object tracking program by solving a bipartite matching problem, so the same object can be tracked across different frames to avoid double counting. For more detail, you can download and check the Python program at this link:
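As an illustration of the bipartite matching idea (not the exact code in the linked program), detections in the current frame can be matched to previously tracked centroids with the Hungarian algorithm from SciPy; the distance threshold here is an arbitrary example value:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def match_detections(tracked, detections, max_distance=50.0):
    """Match previous-frame centroids to current-frame centroids by solving a
    bipartite assignment on pairwise distance; unmatched detections are new objects."""
    if not tracked or not detections:
        return [], list(range(len(detections)))
    cost = np.linalg.norm(
        np.array(tracked)[:, None, :] - np.array(detections)[None, :, :], axis=2
    )
    rows, cols = linear_sum_assignment(cost)
    matches = [(r, c) for r, c in zip(rows, cols) if cost[r, c] <= max_distance]
    matched_cols = {c for _, c in matches}
    new_objects = [j for j in range(len(detections)) if j not in matched_cols]
    return matches, new_objects

# Example: two tracked objects, two matching detections plus one newcomer to count
tracked = [(10.0, 12.0), (40.0, 42.0)]
detections = [(41.0, 40.0), (11.0, 13.0), (90.0, 90.0)]
matches, new_objects = match_detections(tracked, detections)
print(matches, new_objects)  # [(0, 1), (1, 0)] [2]
```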
is a versatile collection of APIs designed to empower researchers and enterprise developers in generating synthetic data that closely resembles real-world scenarios. With its extensibility, Omniverse™ Replicator allows users to effortlessly construct custom synthetic data generation (SDG) tools, effectively expediting the training of computer vision networks.
For this project an RTX-enabled GPU is required. I was lucky enough to be given access by NVIDIA to a Windows 10 VM equipped with an NVIDIA RTX GPU (a very big thank you to Liz, Sunny, and all involved). This project can be run on an RTX 3060 and up; if you do not have access to your own RTX-enabled GPU, there are some well-known cloud service providers that offer NVIDIA RTX GPUs in the cloud.
We will deploy our machine learning model to an .
To get started with NVIDIA Omniverse™, head over to the official . Once you have signed in you will be able to download the Omniverse™ launcher for Windows or Linux. Once downloaded, run the launcher and go through the settings options.
We are going to use to create our dataset.
For more information about using physics with Replicator, you can check out the .
For more information about using lights with Replicator, you can check out the .
For more information about using distributions with Replicator, you can check out the .
For more information about using cameras with Replicator, you can check out the .
For more information about using writers with Replicator, you can check out the .
Now it is time to head over to and create our machine learning pipeline.
You need to install the required dependencies that will allow you to connect your device to the Edge Impulse platform. This process is documented on the and includes:
The EK-RA8D1 is an officially supported target in Edge Impulse, which means it can be used to collect data directly into Edge Impulse. Follow to enable the EK-RA8D1 to connect to a project.
To get started, create a project and be sure to use an Enterprise or Professional Plan as the Nvidia TAO training pipeline requires either a Professional or Enterprise subscription. For more info on the options, .
Go to , login or create an account, then create a new project.
For Developer (free) accounts, next click on the Labelling queue tab, then start dragging a box around each object and label it, then save it. Repeat until all images are labelled. More information on this process, is located here:
Because we'll use Python, we need to install the Edge Impulse Python SDK and clone the repository with the Edge Impulse provided examples. Follow the steps here to install the Python SDK. Once the SDK is installed, be sure to git clone https://github.com/edgeimpulse/linux-sdk-python as well, so that you have the samples locally.
My Python program (topping.py) can be downloaded at this link:
The portable device created in this project monitors buildings and warns the responsible parties when potentially hazardous icicles are formed. In ideal conditions, icicles can form at a rate of . In cold climates, many people are injured and killed each year by these solid projectiles, leading responsible building owners to often close sidewalks in the spring to minimize risk. This project demonstrates how an extra set of digital eyes can notify property owners when icicles are forming and need to be removed before they can cause harm.
with
with
Project and .
is a novel machine learning algorithm that allows for visual object detection on highly constrained devices through training of a neural network with a number of convolutional layers.
NVIDIA Omniverse Code is an IDE that allows us to compose 3D scenes and to write simple Python code to capture images. Further, the Replicator extension is a toolkit that allows us to label the objects in the images and to simplify common domain randomization tasks, such as scattering objects between images. For an in-depth walkthrough on getting started with Omniverse and Replicator, .
It's possible to create an empty scene in Omniverse and add content programmatically. However, composing initial objects by hand serves as a practical starting point. In this project was used as a basis.
To represent the icicle, a high quality model pack was purchased at .
To be able to import the models into Omniverse and Isaac Sim, all models have to be converted to USD. While USD is a great emerging standard for describing, composing, simulating, and collaborating within 3D worlds, it is not yet commonly supported in asset marketplaces. This guide outlines considerations when performing conversion from Blender to USD. Note that it is advisable to export each individual model and to choose a suitable origin/pivot point.
With a basic 3D stage created and objects of interest labeled, we can continue creating a program that will make sure we produce images with slight variations. Our program can be named anything, ending in .py, and preferably placed close to the stage USD-file. Here is a sample of such a program:
With a basic randomization program in place, we could run it from the embedded script editor (Window > Script Editor), but more robust Python language support can be achieved by developing in Visual Studio Code instead. To connect VS Code with Omniverse we can use the Visual Studio Code extension . See the for setup. When ready to run go to Replicator > Start and check progress in the defined output folder.
These are rather unsophisticated approaches. More realistic results would be achieved by changing the material of the actual walls of the house used as background. Omniverse has a large selection of materials available in the NVIDIA Assets browser, allowing us to further randomize the look of the rendered results.
In contrast to a controlled indoor environment, creating a robust object detection model intended for outdoor use needs training images with a wide range of realistic natural light. When generating synthetic images we can utilize an extension based on sun studies.
The extension lets us set the world location, date, and time. We can also mix this with the Environment setting in Omniverse, allowing for a wide range of cloud simulation. As of March 2024 it is not easy to randomize these parameters in a script. In the meantime we can set the parameters, generate a few thousand images, change the time of day, generate more images, and so on.
Edge Impulse Studio supports a wide range of image labeling formats for object detection. The output from Replicator's BasicWriter needs to be transformed so it can be uploaded either through the web interface or via .
Provided is a simple Python program, to help get started. Documentation on the supported Edge Impulse . Run the program from a terminal with:
Look at the or .
Since we have generated both synthetic images and labels, we can use the to efficiently upload both. Use:
Note that the final results include 5000 images from the . Adding this reduces F1 score a bit, but results in a model with significantly less overfitting, that shows almost no false positives when classifying random background scenes.
We can get useful information about model performance with minimal effort by testing it in a virtual environment. Install and the .
Follow the documentation on how to flash the device and how to modify the ei_object_detection.py code. Remember to change: sensor.set_pixformat(sensor.GRAYSCALE). The file edge_impulse_firmware_arduino_portenta.bin is our firmware for the Arduino Portenta H7 with Vision shield.
Using The Things Stack sandbox (formerly known as The Things Network) we can create a low-power sensor network that allows transmitting device data with minimal energy consumption, long range, and no network fees. Your area may already be covered by a crowdfunded network, or you can set up your own gateway, which is really fun!
Following the documentation on the topic, we create an application in The Things Stack sandbox and register our first device.
Next we will simplify things by merging an example Arduino sketch for transmitting a LoRaWAN message with the Edge Impulse generated object detection model code. Open the example sketch called LoraSendAndReceive included with the MKRWAN(v2) library mentioned in the . There is an example of this for you in the , where you can find an Arduino sketch with the merged code.
There are a few things to consider in the implementation: the device should enter deep sleep mode and disable or put to sleep all peripherals between object detection runs. Default operation of the Portenta H7 with the Vision shield consumes a lot of energy and will drain a battery quickly. To find out how much energy is consumed we can use a device such as the Otii Arc. Hook up the positive power supply to VIN, negative to GND. Since VIN bypasses the Portenta power regulator we should provide 5V; however, in my setup the Otii Arc is limited to 4.55V. Luckily it seems to be sufficient and we can take some measurements. By connecting the Otii Arc pin RX to the Portenta pin D14/PA9/UART1 TX, in code we can write debug messages to Serial1. This is incredibly helpful in determining what power consumption is associated with which part of the code.
As we can see the highlighted section should be optimized for minimal power consumption. This is a complicated subject, especially on a but there are some examples for general guidance:
.
The project code presented here runs inference on an image every 10 seconds. However, this is for demonstration purposes and in a deployment should be much less frequent, like once per hour during daylight. Have a look at this project for an example of how to via LoRaWAN downlink message. This could be further controlled automatically via an application that has access to an .
Next, in The Things Stack application we need to define a function that will be used to decode the byte into a JSON structure that is easier to interpret when we pass the message further up the chain of services. The function can be found in the .
An integral part of The Things Stack is an MQTT message broker. At this point we can use it to create any suitable notification system for the end user. The following is an MQTT client written in Python to demonstrate the principle. Note that the paho-mqtt library has been used in a way so that it will block program execution until two messages have been received; then it will print the topic and payloads. In a real implementation, it would be better to register a callback and perform some action for each message received.
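A minimal sketch of such a client is shown below; the broker host, credentials, and topic are placeholders that would come from your The Things Stack application's MQTT integration page, and the callbacks use the paho-mqtt 1.x API:

```python
import paho.mqtt.client as mqtt

# Placeholder connection details - use the values from Integrations > MQTT
# in your The Things Stack application.
BROKER = "eu1.cloud.thethings.network"
USERNAME = "my-app@ttn"          # application ID
PASSWORD = "NNSXS.XXXXXXXX"      # API key
TOPIC = "v3/my-app@ttn/devices/+/up"

received = []

def on_connect(client, userdata, flags, rc):
    client.subscribe(TOPIC)

def on_message(client, userdata, msg):
    received.append(msg)
    print(msg.topic)
    print(msg.payload.decode())

client = mqtt.Client()
client.username_pw_set(USERNAME, PASSWORD)
client.on_connect = on_connect
client.on_message = on_message
client.connect(BROKER, 1883)

# Block until two uplink messages have been received, then exit
while len(received) < 2:
    client.loop(timeout=1.0)
client.disconnect()
```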
TTS has a range of for specific platforms, or you could set up a mechanism.
For permanent outdoor installation the device requires a properly sealed enclosure. The camera is mounted on the shield PCB and will need some engineering to be able to see through the enclosure while remaining water tight. For inspiration on how to create weather-proof enclosures that allow sensors and antennas outside access, see the referenced project on friction fitting and the use of rubber washers. That project also proves that battery-operated sensors can work with no noticeable degradation in winter conditions (down to at least -15 degrees Celsius).
Insights into .