In this tutorial, we will explore how to label image data using GPT-4o, a powerful language model developed by OpenAI. GPT-4o is capable of generating accurate and meaningful labels for images, making it a valuable tool for image classification tasks. By leveraging the capabilities of GPT-4o, we can automate the process of labeling image data, saving time and effort in data preprocessing.
We packaged in a "pre-built Transformation block" (available for all Enterprise plans), an innovative method to distill LLM knowledge.
This pre-built transformation block can be found under the Data sources tab in the Data acquisition view.
The block takes all your unlabeled image files and asks GPT-4o to label them based on your prompt - and we automatically add the reasoning as metadata to your items!
Your prompt should return a single label, e.g.
The GPT-4o model processes images and assigns labels based on the content, filtering out any images that do not meet the quality criteria.
Navigate to the Data acquisition page and add images to your project's dataset. In the video tutorial above, we show how to collect a video recorded directly from a phone, upload it to Edge Impulse and split the video into individual frames.
In the Data sources tab, add the "Label image data using GPT-4o" block:
OpenAI API key: Add your OpenAI API key. This value will be stored as a secret, and won't be shown again.
Prompt: Your prompt should return a single label. For example:
Disable samples w/ label: If a certain label is output, disable the data item - these are excluded from training. Multiple labels are accepted, separate them with a coma.
Max. no. of samples to label: Number of samples to label.
Concurrency: Number of samples to label in parallel.
Auto-convert videos: If set, all videos are automatically split into individual images before labeling.
To edit your configuration, you need to update the json-like steps of your block:
Then, run the block to automatically label the frames.
And here is an example of the returned logs:
Use the labeled data to train a machine learning model. See the end-to-end tutorial Adding sight to your sensors.
In the video tutorial, we deployed the trained model to an MCU-based edge device - the Arduino Nicla Vision.
The small model we tested this on performed exceptionally well, identifying toys in various scenes quickly and accurately. By distilling knowledge from the large LLM, we created a specialized, efficient model suitable for edge deployment.
The latest multimodal LLMs are incredibly powerful but too large for many practical applications. At Edge Impulse, we enable the transfer of knowledge from these large models to smaller, specialized models that run efficiently on edge devices.
Our "Label image data using GPT-4o" block is available for enterprise customers, allowing you to experiment with this technology.
For further assistance, visit our forum.
Blog post: Label image data using GPT-4o blog post