Tracking Rooftop ice buildup detection using Edge Impulse and The Things Network, with synthetic data created in NVIDIA Omniverse Replicator and Sun Studies.
Created By: Eivind Holt
Public Project Link: https://studio.edgeimpulse.com/public/332581/live
GitHub Repo: https://github.com/eivholt/icicle-monitor
The portable device created in this project monitors buildings and warns the responsible parties when potentially hazardous icicles are formed. In ideal conditions, icicles can form at a rate of more than 1 cm (0.39 in) per minute. In cold climates, many people are injured and killed each year by these solid projectiles, leading responsible building owners to often close sidewalks in the spring to minimize risk. This project demonstrates how an extra set of digital eyes can notify property owners icicles are forming and need to be removed before they can cause harm.
NVIDIA GeForce RTX
Project Impulse and GitHub code repository.
Icicle formation is detected using a neural network (NN) designed to identify objects in images from the onboard camera. The NN is trained and tested exclusively on synthesized images. The images are generated with realistic simulated lighting conditions. A small number of real images is used later to verify the model.
The main challenge of detecting forming icicles is the translucent nature of ice and natural variation of sunlight. Because of this we need a great number of images to train a model that captures enough features of the ice with varying lighting conditions. Capturing and annotating such a large dataset is incredibly labor intensive. We can mitigate this problem by synthesizing images with varying lighting conditions in a realistic manner and have the objects of interest automatically labeled.
A powerful platform combined with a high resolution camera with fish-eye lens would increase the ability to detect icicles. However, by deploying the object detection model to a small, power-efficient, but highly constrained device, options for device installation increase. Properly protected against moisture this device can be mounted outdoors on walls or poles facing the roofs in question. LoRaWAN communication enables low battery consumption and long transmission range.
FOMO (Faster Objects, More Objects) is a novel machine learning algorithm that allows for visual object detection on highly constrained devices through training of a neural network with a number of convolutional layers.
One of the most labor intensive aspects of building any machine learning model is gathering the training data and labeling it. For an object detection model this requires taking hundreds or thousands of images of the objects to detect, drawing rectangles around them, and choosing the correct label for each class. Recently, generating pre-labeled images has become feasible and has proven to give great results. This is referred to as synthetic data generation with domain randomization. In this project a model will be trained exclusively on synthetic data, and we will see how it can detect the real-life counterparts.
NVIDIA Omniverse Code is an IDE that allows us to compose 3D scenes and to write simple Python code to capture images. Further, the Replicator extension is a toolkit that allows us to label the objects in the images and to simplify common domain randomization tasks, such as scattering objects between images. For an in-depth walkthrough on getting started with Omniverse and Replicator, see this associated article.
It's possible to create an empty scene in Omniverse and add content programmatically. However, composing initial objects by hand serves as a practical starting point. In this project a royalty free 3D model of a house was used as a basis.
To represent the icicle, a high quality model pack was purchased at Turbo Squid.
To be able to import the models into Omniverse and Isaac Sim, all models have to be converted to OpenUSD format. While USD is a great emerging standard for describing, composing, simulating, and collaborating within 3D worlds, it is not yet commonly supported in asset marketplaces. This article outlines considerations when performing conversion using Blender to USD. Note that it is advisable to export each individual model and to choose a suitable origin/pivot point.
Blender change origin cheat sheet:
(Edit Mode) Select a vertex on the model, Shift+S > Cursor to Selected
(Object Mode) Select Hierarchy, Object > Set Origin > Origin to 3D Cursor
(Object Mode) Shift+S > Cursor to World Origin
Tip for export:
Selection only
Convert Orientation:
Forward Axis: X
Up Axis: Y
To be able to produce images for training and include labels, we can use a feature of the Replicator toolbox found under the menu Replicator > Semantics Schema Editor.
Here we can select each top node representing an item for object detection and add a key-value pair. Choosing "class" as Semantic Type and "ice" as Semantic Data enables us to export this string as a label later.
With a basic 3D stage created and objects of interest labeled, we can continue creating a program that will make sure we produce images with slight variations. Our program can be named anything, ending in .py, and preferably placed close to the stage USD file. Here is a sample of such a program, replicator_init.py:
To keep the items generated in our script separate from the manually created content, we start by creating a new layer in the 3D stage:
Next we specify that we want to use ray-tracing as our image output. We create a camera and hard code the position. We will point it to our icicles for each render later. Then we use our previously defined semantics data to get references to the icicles for easier manipulation. We also define references to a plane on which we want to scatter the icicles. Lastly we define our render output by selecting the camera and setting the desired resolution. Due to an issue in Omniverse where artifacts are produced at certain resolutions, e.g. 120x120 pixels, we set the output resolution to 128x128 pixels. Edge Impulse Studio will take care of scaling the images to the desired size should we use images of a different size than the configured model size.
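A minimal sketch of the setup just described, not the project's actual script. The Replicator module is passed in as rep (inside Omniverse: import omni.replicator.core as rep) so that the function can be read and imported outside Omniverse. The sample count, the camera position, and the prim path /World/IciclePlane are assumptions for illustration.

```python
def build_scene(rep, resolution=(128, 128)):
    """Create camera, prim references, and render product on a new layer."""
    # A separate layer keeps generated items apart from hand-made content.
    with rep.new_layer():
        # Path-traced (ray-traced) output; the sample count is an assumed value.
        rep.settings.set_render_pathtraced(samples_per_pixel=64)
        # Hard-coded start position (assumed); the camera is re-aimed per frame later.
        camera = rep.create.camera(position=(0, 0, 1000))
        # Every prim labeled class=ice in the Semantics Schema Editor.
        icicles = rep.get.prims(semantics=[("class", "ice")])
        # Hypothetical path of the plane on which the icicles are scattered.
        icicle_plane = rep.get.prims(path_pattern="/World/IciclePlane")
        # 128x128 avoids the artifacts seen at, for example, 120x120.
        render_product = rep.create.render_product(camera, resolution)
    return camera, icicles, icicle_plane, render_product
```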
Due to the asynchronous nature of Replicator we need to define our randomization logic as call-back methods by first registering them in the following fashion:
Before defining the logic of the randomization methods we define what will happen during each render:
The parameter num_frames specifies the desired number of renders. The rt_subframes parameter allows the rendering process to advance a set number of frames before the result is captured and saved to disk. A higher setting enhances complex ray tracing effects like reflections and translucency by giving them more time to interact across surfaces, though it increases rendering time. Each randomization routine is invoked with the option to include specific parameters.
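The registration and per-frame trigger could look roughly like the sketch below. It assumes two callbacks named scatter_ice and randomize_camera, and takes the Replicator module as the rep argument (inside Omniverse: import omni.replicator.core as rep). The default values for num_frames and rt_subframes are placeholders, not the project's settings.

```python
def schedule_renders(rep, icicles, scatter_ice, randomize_camera,
                     num_frames=2000, rt_subframes=50):
    """Register the randomizers and invoke them on every rendered frame."""
    # Registering exposes each callback as rep.randomizer.<function name>.
    rep.randomizer.register(scatter_ice)
    rep.randomizer.register(randomize_camera)

    # num_frames: number of renders; rt_subframes: frames advanced before capture.
    with rep.trigger.on_frame(num_frames=num_frames, rt_subframes=rt_subframes):
        # Order matters: place the icicles first, then aim the camera at them.
        getattr(rep.randomizer, scatter_ice.__name__)(icicles)
        getattr(rep.randomizer, randomize_camera.__name__)(icicles)
```

In a flat script the last two calls would normally be written directly as rep.randomizer.scatter_ice(icicles) and rep.randomizer.randomize_camera(icicles); getattr is used here only because the callbacks arrive as parameters.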
To save each image and its corresponding semantic data, we utilize a designated API. While customizing the writer was considered, attempts to do so using Replicator version 1.9.8 on Windows led to errors. Therefore, we are employing the "BasicWriter" and will develop an independent script to generate a label format that is compatible with Edge Impulse.
rgb indicates that we want to save images to disk as .png files. Note that labels are created by setting bounding_box_2d_loose. This is used instead of bounding_box_2d_tight, as the latter would in some cases not include the tip of the icicles in the resulting bounding box. It also creates labels from the previously defined semantics. The code ends with running a single iteration of the process in Omniverse Code, so we can preview the results.
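A hedged sketch of the writer configuration described above. The function name and the output_dir argument are illustrative, and rep stands for the Replicator module (inside Omniverse: import omni.replicator.core as rep). The final single-iteration preview step is not shown.

```python
def attach_basic_writer(rep, render_product, output_dir):
    """Save RGB images and loose 2D bounding boxes with Replicator's BasicWriter."""
    writer = rep.WriterRegistry.get("BasicWriter")
    writer.initialize(
        output_dir=output_dir,       # folder that receives images and label files
        rgb=True,                    # write each render as a .png image
        bounding_box_2d_loose=True,  # loose boxes keep the icicle tips inside the box
    )
    writer.attach([render_product])
    return writer
```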
The bounding boxes can be visualized by clicking the sensor widget, checking "BoundingBox2DLoose" and finally "Show Window".
Now we can implement the randomization logic. First we'll use a method that flips and scatters the icicles on a defined plane.
Next a method that randomly places the camera on another defined plane, and makes sure the camera is pointing at the group of icicles and randomizes focus.
We can define the methods in any order we like, but in rep.trigger.on_frame it is crucial that the icicles are placed before pointing the camera.
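The two randomization methods might be sketched as follows. This is an illustration under assumptions, not the project's code: the rotation range is a guess at what "flips" means here, focus randomization is left out, and rep is the Replicator module passed in as an argument (inside Omniverse: import omni.replicator.core as rep).

```python
def make_randomizers(rep, camera, icicle_plane, camera_plane):
    """Build the two randomizer callbacks for the given prims."""

    def scatter_ice(icicles):
        with icicles:
            # Random rotation around one axis (assumed range), then scatter on the plane.
            rep.modify.pose(rotation=rep.distribution.uniform((0, 0, 0), (0, 0, 360)))
            rep.randomizer.scatter_2d(surface_prims=icicle_plane, check_for_collisions=True)
        return icicles.node

    def randomize_camera(targets):
        with camera:
            # Move the camera to a random spot on its own plane, then aim at the icicles.
            rep.randomizer.scatter_2d(surface_prims=camera_plane)
            rep.modify.pose(look_at=targets)
        return camera.node

    return scatter_ice, randomize_camera
```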
With a basic randomization program in place, we could run it from the embedded script editor (Window > Script Editor), but more robust Python language support can be achieved by developing in Visual Studio Code instead. To connect VS Code with Omniverse we can use the Visual Studio Code extension Embedded VS Code for NVIDIA Omniverse. See the extension repo for setup. When ready to run go to Replicator > Start and check progress in the defined output folder.
The surface behind the icicles may vary greatly, both in color and texture. Using Replicator, randomizing the color of an object's material is easy.
In the scene in Omniverse, either manually create a plane behind the icicles, or create one programmatically.
In Code, define a function that takes a reference to the plane whose color we want to randomize and applies a color distribution function with a min and max value span:
Then get a reference to the plane:
Lastly register the function and trigger it on each new frame:
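The three steps above (define, get a reference, register and trigger) could be sketched like this. The prim path /World/Screen, the function names, and the num_frames default are hypothetical, and rep is the Replicator module passed in as an argument (inside Omniverse: import omni.replicator.core as rep).

```python
def add_background_color_randomizer(rep, screen_path="/World/Screen", num_frames=2000):
    """Give the background plane a new random RGB color on every frame."""

    def randomize_screen(screen, color_min=(0, 0, 0), color_max=(1, 1, 1)):
        with screen:
            # Uniformly sample each RGB channel between color_min and color_max.
            rep.randomizer.color(colors=rep.distribution.uniform(color_min, color_max))
        return screen.node

    # Reference to the plane placed behind the icicles (hypothetical path).
    screen = rep.get.prims(path_pattern=screen_path)

    rep.randomizer.register(randomize_screen)
    with rep.trigger.on_frame(num_frames=num_frames):
        rep.randomizer.randomize_screen(screen)
    return randomize_screen
```

In a complete script the call to rep.randomizer.randomize_screen would more likely sit inside the already existing on_frame block, next to the other randomizers.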
Now each image will have a background with random (deterministic, same starting seed) RGB color. Replicator takes care of creating a material with a shader for us. As you might remember, in an effort to reduce RAM usage our neural network reduces RGB color channels to grayscale. In this project we could simplify the color randomization to only pick grayscale colors. The example has been included as it would benefit in projects where color information is not reduced. To only randomize in grayscale, we could change the code in the randomization function to use the same value for R, G and B as follows:
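Independent of the Replicator API, the principle can be sketched in plain Python: draw a single random value and reuse it for all three channels, with a fixed seed so that the sequence is reproducible. The function name and seed are illustrative only.

```python
import random


def gray_colors(count, seed=0):
    """Return `count` grayscale RGB tuples with values in the range [0, 1)."""
    rng = random.Random(seed)  # fixed seed: the same sequence on every run
    colors = []
    for _ in range(count):
        value = rng.random()
        colors.append((value, value, value))  # R == G == B gives a gray tone
    return colors
```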
To further steer training of the object detection model towards capturing features of the desired class, the icicles, and not features that appear due to shortcomings in the domain randomization, we can create images with the icicles in front of a large variety of background images. A simple way of achieving this is to use a large dataset of random images and randomly assign one of them to a background plane for each image generated.
We could instead generate textures with random shapes and colors. Either way, the resulting renders will look weird, but help the model training process weight features that are relevant for the icicles, not the background.
These are rather unsophisticated approaches. More realistic results would be achieved by changing the materials of the actual walls of the house used as background. Omniverse has a large selection of materials available in the NVIDIA Assets browser, allowing us to randomize a much wider range of aspects of the rendered results.
In contrast to a controlled indoor environment, creating a robust object detection model intended for outdoor use needs training images with a wide range of realistic natural light. When generating synthetic images we can utilize an extension that approximates real world sunlight based on sun studies.
The extension lets us set world location, date and time. We can also mix this with the Environment setting in Omniverse, allowing for a wide range of cloud simulations. As of March 2024 it is not easy to randomize these parameters in script, but this is likely to change. In the meantime we can set the parameters, generate a few thousand images, change time of day, generate more images and so on.
Edge Impulse Studio supports a wide range of image labeling formats for object detection. The output from Replicator's BasicWriter needs to be transformed so it can be uploaded either through the web interface or via the Ingestion API.
A simple Python program, basic_writer_to_pascal_voc.py, is provided to help get started. Documentation on the supported Edge Impulse label formats is located here. Run the program from a terminal with:
or debug from Visual Studio Code by setting the input folder in launch.json like this:
This will create a file bounding_boxes.labels that contains all labels and bounding boxes per image.
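To illustrate the target format, the sketch below builds the structure Edge Impulse expects in bounding_boxes.labels from boxes given as corner coordinates. It is not the provided conversion program. The input layout (a dict of filename to a list of label and corner tuples) is an assumption, and reading the BasicWriter output files is left out.

```python
import json


def to_edge_impulse_labels(boxes_per_image):
    """Convert {filename: [(label, x_min, y_min, x_max, y_max), ...]} to EI labels."""
    bounding_boxes = {}
    for filename, boxes in boxes_per_image.items():
        bounding_boxes[filename] = [
            {
                "label": label,
                "x": int(x_min),
                "y": int(y_min),
                # Edge Impulse stores width/height instead of the far corner.
                "width": int(x_max - x_min),
                "height": int(y_max - y_min),
            }
            for label, x_min, y_min, x_max, y_max in boxes
        ]
    return {"version": 1, "type": "bounding-box-labels", "boundingBoxes": bounding_boxes}


def save_labels(labels, path="bounding_boxes.labels"):
    """Write the labels file next to the images it describes."""
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(labels, handle, indent=2)
```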
Look at the provided object detection Edge Impulse project or follow a guide to create a new FOMO project.
Since we have generated both synthetic images and labels, we can use the CLI tool from Edge Impulse to efficiently upload both. Use:
to connect to your account and project, and upload the image files and labels in bounding_boxes.labels. To switch project if necessary, first run:
At any time we can find "Perform train/test split" under "Danger zone" in the project dashboard, to distribute images between training and testing in an 80/20 split.
Since our synthetic training images are based on both individual and two different sized clusters of icicles, we can't trust the model performance numbers too much. Greater F1 scores are better, but we will never achieve 100%. Still, we can upload increasing numbers of labeled images and observe how performance numbers increase.
2,000 images:
6,000 images:
14,000 images:
26,000 images:
Note that the final results include 5000 images from the COCO 2017 dataset. Adding this reduces F1 score a bit, but results in a model with significantly less overfitting, that shows almost no false positives when classifying random background scenes.
If we look at results from model testing in Edge Impulse Studio, at first glance the numbers are less than impressive.
However if we investigate individual samples where F1 score is less than 100%, we see that the model indeed has detected the icicles, but clustered differently than how the image was originally labeled. What we should look out for are samples that contain visible icicles where none were detected.
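As a reminder of why such samples score below 100%, F1 is the harmonic mean of precision and recall. The sketch below uses hypothetical counts and does not claim to reproduce how Edge Impulse Studio matches FOMO predictions to labels. It only shows that a single extra detection lowers the score even when nothing was missed.

```python
def f1_score(true_positives, false_positives, false_negatives):
    """F1 = 2TP / (2TP + FP + FN); returns 0.0 when there is nothing to score."""
    denominator = 2 * true_positives + false_positives + false_negatives
    if denominator == 0:
        return 0.0
    return 2 * true_positives / denominator


# Hypothetical sample: both labeled clusters found, plus one extra detection
# because the model split a cluster differently than the label did.
split_cluster_sample = f1_score(true_positives=2, false_positives=1, false_negatives=0)

# Hypothetical sample of the kind to look out for: visible icicles, none detected.
missed_sample = f1_score(true_positives=0, false_positives=0, false_negatives=2)
```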
In the end virtual and real-life testing tells us how well the model really performs.
We can get useful information about model performance with minimal effort by testing it in a virtual environment. Install NVIDIA Isaac Sim and the Edge Impulse extension.
Install the Sun Study extension in Isaac Sim to be able to vary light conditions while testing.
Paste your API key found in the Edge Impulse Studio > Dashboard > Keys > Add new API key into Omniverse Extension:
To be able to classify any virtual camera capture we first need to build a version of the model that can run in a JavaScript environment. In Edge Impulse Studio, go to Deployment, find "WebAssembly" in the search box and click Build. We don't need to keep the resulting .zip package, the extension will find and download it by itself in a moment.
Back in the Edge Impulse extension in Isaac Sim, when we expand the "Classification" group, a message will tell us everything is ready: "Your model is ready! You can now run inference on the current scene".
Before we test it we will make some accommodations in the viewport.
Switch to "RTX - Interactive" to make sure the scene is rendered realistically.
Set viewport resolution to square 1:1, with either the same resolution as our intended device inference (120x120 pixels) or 512x512 pixels.
Display Isaac bounding boxes by selecting "BoundingBox2DLoose" under the icon that resembles a robotic sensor, then click "Show Window". Now we can compare the ground truth with model prediction.
To get visual verification that our model works as intended we can go to Deployment in Edge Impulse Studio, select OpenMV Firmware as target and build.
Follow the documentation on how to flash the device and to modify the ei_object_detection.py code. Remember to change: sensor.set_pixformat(sensor.GRAYSCALE). The file edge_impulse_firmware_arduino_portenta.bin is our firmware for the Arduino Portenta H7 with Vision shield.
Start by selecting Arduino library as a Deployment target.
Once built and downloaded, open Arduino IDE, go to Sketch > Include Library > Add .zip Library ... and locate the downloaded library. Next go to File > Examples > [name of project]_inferencing > portenta_h7 > portenta_h7_camera to open a generic sketch template using our model. To test the model continuously and print the results to console this sketch is ready to go. The code might appear daunting, but we really only need to focus on the loop() function.
Using The Things Stack sandbox (formerly known as The Things Network) we can create a low-power sensor network that allows transmitting device data with minimal energy consumption, long range, and no network fees. Your area may already be covered by a crowdfunded network, or you can create your own gateway. Getting started with LoRaWAN is really fun!
Following the Arduino guide on the topic, we create an application in The Things Stack sandbox and register our first device.
Next we will simplify things by merging an example Arduino sketch for transmitting a LoRaWAN message with the Edge Impulse generated object detection model code. Open the example sketch called LoraSendAndReceive included with the MKRWAN(v2) library mentioned in the Arduino guide. There is an example of this for you in the project code repository, where you can find an Arduino sketch with the merged code.
In short, we perform inference every 10 seconds. If any icicles are detected we simply transmit a binary 1 to The Things Stack application. It is probably obvious that the binary payload is redundant, since the presence of a message is enough, but this could be extended to transmit other data, for example the prediction confidence, number of clusters, battery level, temperature or light level.
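One possible shape for such an extended payload is sketched below in Python, purely to illustrate a compact byte layout. The four-byte layout (flag, confidence in percent, battery voltage in millivolts) and all names are hypothetical. The device itself would build the bytes in the Arduino sketch, and the decoder on The Things Stack would be written as a payload formatter.

```python
import struct

# Hypothetical layout, big-endian: uint8 detected flag, uint8 confidence in
# percent, uint16 battery voltage in millivolts. Four bytes in total.
PAYLOAD_FORMAT = ">BBH"


def encode_payload(detected, confidence, battery_mv):
    """Pack a detection report into a compact LoRaWAN-friendly byte string."""
    percent = max(0, min(100, round(confidence * 100)))
    return struct.pack(PAYLOAD_FORMAT, int(detected), percent, battery_mv)


def decode_payload(data):
    """Inverse of encode_payload, returning a JSON-friendly dict."""
    detected, percent, battery_mv = struct.unpack(PAYLOAD_FORMAT, data)
    return {
        "detected": bool(detected),
        "confidence": percent / 100,
        "battery_mv": battery_mv,
    }
```

Keeping the payload to a few bytes matters on LoRaWAN, where airtime and therefore energy use grow with message size.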
There are a few things to consider in the implementation: The device should enter deep sleep mode and disable or put to sleep all peripherals between object detection runs. Default operation of the Portenta H7 with the Vision shield consumes a lot of energy and will drain a battery quickly. To find out how much energy is consumed we can use a device such as the Otii Arc from Qoitech. Hook up the positive power supply to VIN, negative to GND. Since VIN bypasses the Portenta power regulator we should provide 5V; however, in my setup the Otii Arc is limited to 4.55V. Luckily it seems to be sufficient and we can take some measurements. By connecting the Otii Arc pin RX to the Portenta pin D14/PA9/UART1 TX, in code we can write debug messages to Serial1. This is incredibly helpful in determining what power consumption is associated with what part of the code.
As we can see the highlighted section should be optimized for minimal power consumption. This is a complicated subject, especially on a complex board such as the Arduino Portenta H7 but there are some examples for general guidance:
The project code presented here runs inference on an image every 10 seconds. However, this is for demonstration purposes and in a deployment should be much less frequent, like once per hour during daylight. Have a look at this project for an example of how to remotely control inference interval via LoRaWAN downlink message. This could be further controlled automatically via an application that has access to an API for daylight data.
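The effect of the inference interval on battery life can be estimated with a simple duty-cycle calculation. The sketch below is a back-of-the-envelope model in which the device alternates between one active burst and sleep. Any current and capacity figures fed into it are placeholders to be replaced with measurements, for example from the Otii Arc.

```python
def average_current_ma(active_ma, active_s, sleep_ma, interval_s):
    """Average current over one cycle of `interval_s` seconds."""
    if interval_s < active_s:
        raise ValueError("interval must be at least as long as the active period")
    sleep_s = interval_s - active_s
    return (active_ma * active_s + sleep_ma * sleep_s) / interval_s


def battery_life_days(capacity_mah, average_ma):
    """Idealized runtime; ignores self-discharge and temperature effects."""
    return capacity_mah / average_ma / 24


# Placeholder figures: 100 mA for 2 s per inference, 1 mA while sleeping.
every_10_s = average_current_ma(100.0, 2.0, 1.0, 10.0)
every_hour = average_current_ma(100.0, 2.0, 1.0, 3600.0)
```

With these placeholder numbers the average current drops from about 20.8 mA to roughly 1.06 mA when moving from a 10 second to a one hour interval, which is why the sleep current dominates once inference is infrequent.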
Next, in The Things Stack application we need to define a function that will be used to decode the byte into a JSON structure that is easier to interpret when we pass the message further up the chain of services. The function can be found in the project code repository.
Now we can observe messages being received and decoded in Live data in the TTS console.
An integral part of The Things Stack is an MQTT message broker. At this point we can use any MQTT client to subscribe to topics and create any suitable notification system for the end user. The following is an MQTT client written in Python to demonstrate the principle. Note that the library paho-mqtt has been used in a way so that it will block the program execution until two messages have been received. Then it will print the topic and payloads. In a real implementation, it would be better to register a callback and perform some action for each message received.
Observe the difference in the real uplink (first) and simulated uplink (second). In both we find "decoded_payload":{"detected":true}.
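The callback approach could be sketched as follows. The parsing relies on the uplink_message and decoded_payload fields of The Things Stack uplink JSON, and the callback uses the (client, userdata, msg) signature of paho-mqtt. Broker connection, credentials, and the notification action are left out, and the print call is a stand-in for a real alert.

```python
import json


def icicles_detected(payload):
    """Return True if an uplink message reports {"detected": true}."""
    try:
        message = json.loads(payload)
    except (ValueError, TypeError):
        return False  # not JSON, so not an uplink we understand
    decoded = message.get("uplink_message", {}).get("decoded_payload") or {}
    return decoded.get("detected") is True


def on_message(client, userdata, msg):
    """Callback for an MQTT client; invoked once per received message."""
    if icicles_detected(msg.payload):
        # Placeholder action: a real system might send an e-mail or SMS here.
        print(f"Icicles reported on topic {msg.topic}")
```

With paho-mqtt, the function would be assigned to the client's on_message attribute before connecting and starting the network loop.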
TTS has a range of integration options for specific platforms, or you could set up a custom webhook using a standard HTTP/REST mechanism.
For permanent outdoor installation the device requires a properly sealed enclosure. The camera is mounted on the shield PCB and will need some engineering to be able to see through the enclosure while remaining watertight. For inspiration on how to create weather-proof enclosures that allow sensors and antennas outside access, see this project on friction fitting and use of rubber washers. The referenced project also proves that battery operated sensors can work with no noticeable degradation in winter conditions (to at least -15 degrees Celsius).
The project has no safeguard against false negatives. The device will not report if its view is blocked. This could be resolved by placing static markers on both sides of the area to monitor and including them in the synthetic training data. Absence of at least one marker could trigger a notification that the view is obscured.
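The marker check itself would be simple. A minimal sketch, assuming the model were retrained with two additional classes (the label names marker_left and marker_right are hypothetical):

```python
def view_obscured(detected_labels, required_markers=("marker_left", "marker_right")):
    """Return True if at least one of the static markers was not detected."""
    seen = set(detected_labels)
    return any(marker not in seen for marker in required_markers)
```

For example, view_obscured(["ice", "marker_left"]) is True because the right-hand marker is missing, which could then be reported as a separate uplink value.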
Due to optimization techniques in Faster Objects, More Objects (FOMO), determining relative sizes of the icicles is not feasible. As even icicles with small mass can be harmful at moderate elevation, this is not a crucial feature.
The object detection model has not been trained to give an exact number of icicles in view. This has no practical implication other than the model verification results appearing worse than practical performance.
Icicles can appear bent or angled either due to wind or more commonly due to ice and snow masses slowly dropping over roof edges. The dataset generated in this project does not cover this, but it would not take a lot of effort to extend the domain randomization to rotate or warp the icicles.
The training images could benefit from simulating snow with particle effects in Omniverse. The project could also be extended to detect build-up of snow on roofs. For inspiration check out this demo of simulated snow dynamic made in 2014 by Walt Disney Animation Studios for the movie Frozen:
To be able to compile a representation of our neural network and have it run on the severely limited amount of RAM available on the Arduino Portenta H7, pixel representation has been limited to a single channel - grayscale. Colors are not needed to detect icicles so this does not affect the results.
Insights into how icicles are formed.