Image classification vs object detection
From input image to heat map (cup in red, lamp in green)
Here's an earlier iteration of the FOMO approach used to count individual bees (heat map 2x smaller than the input size).
From heat map to bounding boxes
Training on the centroids of beer bottles. Top: the source labels; bottom: the inference result.
Selecting a FOMO model in Edge Impulse
Accessing expert mode
Object cells are weighted in the loss via `object_weight=100`, as a way of balancing what is usually a majority of background. This value was chosen as a sweet spot for a number of example use cases. In scenarios where the objects to detect are relatively rare, this value can be increased, e.g. to 1000, to have the model focus even more on object detection (at the expense of potentially more false detections).
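As a minimal sketch of how such a weighting can be expressed as a custom Keras loss (illustrative only, not Edge Impulse's exact implementation; it assumes the one-hot targets place the background class at channel 0):

```python
import tensorflow as tf

def weighted_xent(object_weight=100.0):
    """Per-cell cross entropy that up-weights object cells over background.

    Illustrative sketch: assumes y_true is a one-hot heat map of shape
    (batch, H, W, num_classes) with the background class at channel 0.
    """
    def loss(y_true, y_pred):
        # Standard categorical cross entropy computed per output cell.
        xent = tf.keras.losses.categorical_crossentropy(y_true, y_pred)
        # Weight 1.0 for background cells, object_weight for object cells.
        is_object = 1.0 - y_true[..., 0]
        weights = 1.0 + (object_weight - 1.0) * is_object
        return tf.reduce_mean(xent * weights)
    return loss

# e.g. model.compile(optimizer="adam", loss=weighted_xent(object_weight=100.0))
```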
By default FOMO does a spatial reduction of 1/8 (i.e. a 96x96 input results in a 12x12 output). This is implemented by cutting MobileNetV2 off at the intermediate layer `block_6_expand_relu`.
MobileNetV2 cut point
Changing the `cut_point` results in a different spatial reduction; e.g. if we cut higher, at `block_3_expand_relu`, FOMO will instead only do a spatial reduction of 1/4 (i.e. a 96x96 input results in a 24x24 output).
Note, though, that this means taking much less of the MobileNet backbone, resulting in a model with only half the parameters. Switching to a higher alpha may counteract this parameter reduction. Later FOMO releases will counter it with a U-Net-style architecture.
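As a rough sketch of how such a truncated backbone can be built in Keras (the function name, alpha value, and `weights=None` here are illustrative assumptions, not Edge Impulse's exact expert-mode code; the layer names are the standard Keras MobileNetV2 names mentioned above):

```python
import tensorflow as tf

def build_backbone(input_shape=(96, 96, 3), alpha=0.35,
                   cut_point="block_6_expand_relu"):
    """Truncate MobileNetV2 at a named intermediate layer (illustrative)."""
    base = tf.keras.applications.MobileNetV2(
        input_shape=input_shape, alpha=alpha,
        include_top=False, weights=None)
    return tf.keras.Model(inputs=base.input,
                          outputs=base.get_layer(cut_point).output)

# Default cut: 1/8 spatial reduction, a 96x96 input gives a 12x12 feature map.
backbone_1_8 = build_backbone()
# Higher cut: 1/4 spatial reduction (24x24 feature map), but a much smaller model.
backbone_1_4 = build_backbone(cut_point="block_3_expand_relu")
```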
The head of the network ends with `num_classes` outputs per cell. FOMO uses a convolutional classifier, so the head can be tweaked by 1) changing the number of filters in the `Conv2D` layer, 2) adding additional layers, or 3) doing both.
For example, we might change the number of filters from 32 to 16 and add another convolutional layer, as follows.
Adding an additional layer to the classifier of FOMO
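As a hedged sketch of what that change could look like in Keras (the function name, 1x1 kernel sizes, and softmax placement are assumptions for illustration, not the exact expert-mode code):

```python
import tensorflow as tf

def build_head(backbone_output, num_classes):
    """Convolutional classifier head with 16 filters (instead of 32) and one
    additional Conv2D layer, as in the example above (illustrative sketch)."""
    # 1x1 convolutions keep the head fully convolutional over the output grid.
    x = tf.keras.layers.Conv2D(16, kernel_size=1, activation="relu")(backbone_output)
    # The extra convolutional layer added in this example.
    x = tf.keras.layers.Conv2D(16, kernel_size=1, activation="relu")(x)
    logits = tf.keras.layers.Conv2D(num_classes, kernel_size=1)(x)
    # Per-cell class probabilities.
    return tf.keras.layers.Softmax()(logits)

# e.g. combined with a truncated backbone:
# model = tf.keras.Model(backbone.input, build_head(backbone.output, num_classes=2))
```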
| Requirement | Minimum | Recommended |
| --- | --- | --- |
| Memory footprint (RAM) | 256 KB: 64x64 pixels (B&W, buffer included) | ≥ 512 KB: 96x96 pixels (B&W, buffer included) |
| Latency (100% load) | 80 MHz: < 1 fps | > 80 MHz + acceleration: ~15 fps @ 480 MHz, 40-60 fps on a Raspberry Pi 4 |