- Single label: each image has a single label.
- Bounding boxes: used for object detection; images contain ‘objects’ to be detected, given as a list of labeled ‘bounding boxes’.
Edge Impulse object detection format
The Edge Impulse object detection acquisition format provides a simple and intuitive way to store images and associated bounding box labels. In folders containing data in this format, each subdirectory holds image files together with a bounding_boxes.labels file.
The bounding_boxes.labels file in each subdirectory provides detailed information about the labeled objects and their corresponding bounding boxes. The file follows a JSON format, with the following structure:
- version: Indicates the version of the label format.
- files: A list of objects, where each object represents an image and its associated labels.
  - path: The path or file name of the image.
  - category: Indicates whether the image belongs to the training or testing set.
  - (Optional) label: Provides information about the labeled objects.
    - type: Specifies the type of label (e.g., a single label).
    - label: The actual label or class name of the object.
  - (Optional) metadata: Additional metadata associated with the image, such as the site where it was collected, the timestamp, or any other useful information.
  - boundingBoxes: A list of objects, where each object represents a bounding box for an object within the image.
    - label: The label or class name of the object within the bounding box.
    - x, y: The coordinates of the top-left corner of the bounding box.
    - width, height: The width and height of the bounding box.
bounding_boxes.labels example:
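As a minimal sketch of the structure described above, the following Python snippet builds a bounding_boxes.labels payload for one image. The file name, label, coordinates, the version value 1, and the category string "training" are illustrative assumptions, and make_labels is a hypothetical helper, not part of any Edge Impulse tool.

```python
import json

def make_labels(entries):
    """Build the top-level structure of a bounding_boxes.labels file.

    `entries` is a list of (path, category, boxes) tuples. Each box is a
    dict with label, x, y, width and height (x, y = top-left corner).
    """
    return {
        "version": 1,  # assumed version number
        "files": [
            {"path": path, "category": category, "boundingBoxes": boxes}
            for path, category, boxes in entries
        ],
    }

# Illustrative values only: one image with one labeled box.
labels = make_labels([
    ("cubes.23im33f2.jpg", "training",
     [{"label": "green", "x": 105, "y": 201, "width": 91, "height": 90}]),
])

labels_json = json.dumps(labels, indent=4)
print(labels_json)
```

The optional label and metadata keys are left out here; they would sit next to path and category inside each entry of files.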
COCO JSON
The COCO JSON (Common Objects in Context JSON) format is a widely used standard for representing object detection datasets. It provides a structured way to store information about labeled objects, their bounding boxes, and additional metadata. In a COCO JSON dataset, the _annotations.coco.json file in each subdirectory provides detailed information about the labeled objects and their corresponding bounding boxes. The file follows a JSON format, with the following structure:
Categories
The “categories” component defines the labels or classes of objects present in the dataset. Each category is represented by a dictionary containing the following fields:
- id: A unique integer identifier for the category.
- name: The name or label of the category.
- (Optional) supercategory: A higher-level category that the current category belongs to, if applicable. The supercategory is not used or imported by the Uploader.
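Images
The “images” component lists the images in the dataset. Each image is represented by a dictionary containing the following fields:
- id: A unique integer identifier for the image.
- width: The width of the image in pixels.
- height: The height of the image in pixels.
- file_name: The file name or path of the image file.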
Annotations
The “annotations” component lists the labeled objects. Each annotation is represented by a dictionary containing the following fields:
- id: A unique integer identifier for the annotation.
- image_id: The identifier of the image to which the annotation belongs.
- category_id: The identifier of the category that the annotation represents.
- bbox: A list representing the bounding box coordinates in the format [x, y, width, height].
- (Optional) area: The area (in pixels) occupied by the annotated object.
- (Optional) segmentation: The segmentation mask of the object, represented as a list of polygons.
- (Optional) iscrowd: A flag indicating whether the annotated object is a crowd or group of objects.
The Uploader does not import the area, segmentation, and iscrowd fields.
_annotations.coco.json example:
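To show how the categories, images, and annotations components reference each other, here is a minimal Python sketch that builds a small COCO-style dictionary and resolves each annotation to its file name and class name. All ids, sizes, names, and coordinates are illustrative assumptions, and boxes_by_file is a hypothetical helper.

```python
import json

# Illustrative COCO-style content: two categories, one image, two annotations.
coco = {
    "categories": [{"id": 1, "name": "green"}, {"id": 2, "name": "blue"}],
    "images": [
        {"id": 1, "width": 320, "height": 320, "file_name": "cubes.23im33f2.jpg"}
    ],
    "annotations": [
        {"id": 1, "image_id": 1, "category_id": 1, "bbox": [105, 201, 91, 90]},
        {"id": 2, "image_id": 1, "category_id": 2, "bbox": [10, 20, 30, 40]},
    ],
}

def boxes_by_file(dataset):
    """Group annotations by image file name, resolving ids to names."""
    names = {c["id"]: c["name"] for c in dataset["categories"]}
    files = {img["id"]: img["file_name"] for img in dataset["images"]}
    result = {file_name: [] for file_name in files.values()}
    for ann in dataset["annotations"]:
        x, y, w, h = ann["bbox"]  # [x, y, width, height]
        result[files[ann["image_id"]]].append(
            {"label": names[ann["category_id"]],
             "x": x, "y": y, "width": w, "height": h}
        )
    return result

grouped = boxes_by_file(coco)
print(json.dumps(grouped, indent=2))
```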
Open Images CSV
The Open Images dataset provides object detection annotations in CSV format. The _annotations.csv file is located in the same directory as the images it references. A class-descriptions.csv mapping file can be used to map each MID LabelName to a short description or human-readable class name.
An Open Images CSV dataset usually has this directory structure:
- Each line in the CSV file represents an object annotation.
- The values in each line are separated by commas.
- The CSV file typically includes several columns, each representing different attributes of the object annotations.
- The common columns found in the Open Images CSV dataset include:
  - ImageID: An identifier or filename for the image to which the annotation belongs.
  - Source: The source or origin of the annotation, indicating whether it was manually annotated or obtained from other sources.
  - LabelName: The class label of the object.
  - Confidence: The confidence score or probability associated with the annotation.
  - XMin, YMin, XMax, YMax: The coordinates of the bounding box that encloses the object, usually represented as the top-left (XMin, YMin) and bottom-right (XMax, YMax) corners.
  - IsOccluded, IsTruncated, IsGroupOf, IsDepiction, IsInside: Binary flags indicating whether the object is occluded, truncated, a group of objects, a depiction, or inside another object.
- Each object in the dataset is associated with a class label.
- The class labels in the Open Images dataset are represented as LabelName in the CSV file.
- The LabelName values correspond to specific object categories defined in the Open Images dataset’s ontology (MID).
Use a class-descriptions.csv mapping file to see your classes in Edge Impulse Studio.
Bounding Box Coordinates:
- The bounding box coordinates define the normalized location and size of the object within the image.
- The coordinates are represented as the normalized X and Y values (relative to the image width and height) for the top-left corner (XMin, YMin) and the bottom-right corner (XMax, YMax) of the bounding box.
class-descriptions.csv mapping file:
- To be ingested in Edge Impulse, the mapping file name must end with *class-descriptions.csv
- Here is an example of the mapping file: https://github.com/openimages/dataset/blob/main/dict.csv
_annotations.csv example:
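As a hedged illustration of how the normalized coordinates and the mapping file fit together, the Python sketch below parses one annotation row, looks up the human-readable class, and converts the box to pixels. The CSV contents, the MID value /m/0green, the 320x320 image size, and the headerless two-column layout of the mapping file are assumptions made for this example; to_pixel_boxes is a hypothetical helper.

```python
import csv
import io

# Illustrative file contents (not taken from a real dataset).
ANNOTATIONS = """ImageID,Source,LabelName,Confidence,XMin,XMax,YMin,YMax
cubes.23im33f2.jpg,xclick,/m/0green,1,0.25,0.5,0.5,0.75
"""
CLASS_DESCRIPTIONS = "/m/0green,green\n"  # assumed layout: MID,name

def to_pixel_boxes(annotations_csv, descriptions_csv, img_w, img_h):
    """Convert normalized corner coordinates to pixel x, y, width, height."""
    names = dict(row[:2] for row in csv.reader(io.StringIO(descriptions_csv)) if row)
    boxes = []
    for row in csv.DictReader(io.StringIO(annotations_csv)):
        xmin, xmax = float(row["XMin"]) * img_w, float(row["XMax"]) * img_w
        ymin, ymax = float(row["YMin"]) * img_h, float(row["YMax"]) * img_h
        boxes.append({
            "image": row["ImageID"],
            # Fall back to the raw MID when no mapping entry exists.
            "label": names.get(row["LabelName"], row["LabelName"]),
            "x": round(xmin), "y": round(ymin),
            "width": round(xmax - xmin), "height": round(ymax - ymin),
        })
    return boxes

pixel_boxes = to_pixel_boxes(ANNOTATIONS, CLASS_DESCRIPTIONS, 320, 320)
print(pixel_boxes)
```

Reading columns by name with csv.DictReader keeps the sketch independent of the column order in the file.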
Pascal VOC XML
The Pascal VOC (Visual Object Classes) format is another widely used standard for object detection datasets. It provides a structured format for storing images and their associated annotations, including bounding box labels. A Pascal VOC dataset can follow this directory structure:
- Image files: The dataset includes a collection of image files, usually in JPEG or PNG format. Each image represents a sample in the dataset.
- Annotation files: The annotations for the images are stored in XML files. Each XML file corresponds to an image and contains the annotations for that image, including bounding box labels and class labels.
- Class labels: A predefined set of class labels is defined for the dataset. Each object in the image is assigned a class label, indicating the category or type of the object.
- Bounding box annotations: For each object instance in an image, a bounding box is defined. The bounding box represents the rectangular region enclosing the object. It is specified by the coordinates of its top-left (xmin, ymin) and bottom-right (xmax, ymax) corners.
- Additional metadata: The Pascal VOC format allows additional metadata to be included for each image or annotation, such as the source of the image, the author, or any other relevant details. The Edge Impulse uploader does not currently import this metadata.
cubes.23im33f2.xml:
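In the standard Pascal VOC layout, each object is an object element with a name and a bndbox holding xmin, ymin, xmax, and ymax. The Python sketch below parses such a file with the standard library. The XML content is an illustrative assumption rather than the original example, and read_voc is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Illustrative annotation file content (values are made up).
VOC_XML = """<annotation>
    <filename>cubes.23im33f2.jpg</filename>
    <size><width>320</width><height>320</height><depth>3</depth></size>
    <object>
        <name>green</name>
        <bndbox>
            <xmin>105</xmin><ymin>201</ymin><xmax>196</xmax><ymax>291</ymax>
        </bndbox>
    </object>
</annotation>"""

def read_voc(xml_text):
    """Return (file name, list of boxes as label/x/y/width/height dicts)."""
    root = ET.fromstring(xml_text)
    boxes = []
    for obj in root.iter("object"):
        bndbox = obj.find("bndbox")
        xmin, ymin, xmax, ymax = (
            int(bndbox.findtext(tag)) for tag in ("xmin", "ymin", "xmax", "ymax")
        )
        boxes.append({
            "label": obj.findtext("name"),
            "x": xmin, "y": ymin,
            "width": xmax - xmin, "height": ymax - ymin,
        })
    return root.findtext("filename"), boxes

voc_file, voc_boxes = read_voc(VOC_XML)
print(voc_file, voc_boxes)
```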
Plain CSV
The Plain CSV format is a very simple format: a CSV annotation file is stored in the same directory as the images. We support both “Single Label” and “Object Detection” labeling methods for this format. A Plain CSV dataset can follow this directory structure:
- Each line in the CSV file represents an object annotation.
- The values in each line are separated by commas.
Single label:
- file_name: The filename of the image.
- classes: The class label or category of the image.
_annotations_single_label.csv example:
Object detection:
- file_name: The filename of the image.
- classes: The class label or category of the object.
- xmin: The x-coordinate of the top-left corner of the bounding box.
- ymin: The y-coordinate of the top-left corner of the bounding box.
- xmax: The x-coordinate of the bottom-right corner of the bounding box.
- ymax: The y-coordinate of the bottom-right corner of the bounding box.
_annotations_bounding_boxes.csv example:
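A minimal Python sketch of both Plain CSV variants, using the column names listed above, could look like this. The presence of a header row, the file name, the label, and the coordinates are assumptions for illustration; write_csv is a hypothetical helper that returns the CSV text instead of writing to disk.

```python
import csv
import io

def write_csv(header, rows):
    """Serialize a header and rows to comma-separated text."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buf.getvalue()

# Single label: one class per image.
single_label_csv = write_csv(
    ["file_name", "classes"],
    [["cubes.23im33f2.jpg", "green"]],
)

# Object detection: one row per bounding box, given as two corners.
bounding_boxes_csv = write_csv(
    ["file_name", "classes", "xmin", "ymin", "xmax", "ymax"],
    [["cubes.23im33f2.jpg", "green", 105, 201, 196, 291]],
)

print(single_label_csv)
print(bounding_boxes_csv)
```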
YOLO TXT
The YOLO TXT format is a text-based annotation format mostly used in conjunction with the YOLO object detection algorithm. This format represents object annotations for an image in a plain text file.
- File Structure:
  - The annotations for each image are stored in a separate text file.
  - The text file has the same base name as the corresponding image file.
  - The file extension is .txt.
- Annotation Format:
  - Each line in the TXT file represents an object annotation.
  - Each annotation line contains space-separated values representing different attributes.
  - The attributes in each line are ordered as follows: class_label, normalized bounding box coordinates (center_x, center_y, width, height).
- Class label:
  - The class label represents the object category or class.
  - The class labels are usually represented as integers, starting from 0 or 1.
  - Each class label corresponds to a specific object class defined in the dataset.
- Normalized Bounding Box Coordinates:
  - The bounding box coordinates represent the location and size of the object in the image.
  - The coordinates are normalized to the range [0, 1], where (0, 0) represents the top-left corner of the image, and (1, 1) represents the bottom-right corner.
  - The normalized bounding box coordinates include the center coordinates (center_x, center_y) of the bounding box and its width and height.
  - The center coordinates (center_x, center_y) are relative to the width and height of the image, where (0, 0) represents the top-left corner, and (1, 1) represents the bottom-right corner.
  - The width and height are also relative to the image size.
cubes-23im33f2.txt example for the cubes-23im33f2.jpg image.
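The relationship between the normalized center-based values and pixel coordinates can be sketched in Python as follows. The annotation line and the 320x320 image size are illustrative assumptions, and yolo_to_pixels is a hypothetical helper.

```python
def yolo_to_pixels(line, img_w, img_h):
    """Convert one YOLO TXT line to a pixel box with a top-left origin.

    Expected line layout: class_label center_x center_y width height,
    with the four box values normalized to [0, 1].
    """
    class_id, cx, cy, w, h = line.split()
    cx, cy, w, h = (float(v) for v in (cx, cy, w, h))
    width, height = w * img_w, h * img_h
    return {
        "class_id": int(class_id),
        # Shift from the box center to its top-left corner.
        "x": round(cx * img_w - width / 2),
        "y": round(cy * img_h - height / 2),
        "width": round(width),
        "height": round(height),
    }

# Illustrative line: class 0, centered box covering 25% x 50% of the image.
box = yolo_to_pixels("0 0.5 0.5 0.25 0.5", 320, 320)
print(box)
```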
- Mapping the Class Label:
  - The classes.txt, classes.names or data.yaml (used by Roboflow YOLOv5 PyTorch export format) files contain configuration values used by the model to locate images and map class names to class_ids.
  - The classes.txt file: