In this tutorial, we will explore how to label image data using GPT-4o, a powerful language model developed by OpenAI. GPT-4o is capable of generating accurate and meaningful labels for images, making it a valuable tool for image classification tasks. By leveraging the capabilities of GPT-4o, we can automate the process of labeling image data, saving time and effort in data preprocessing.We packaged in a “pre-built Transformation block” (available for all Enterprise plans), an innovative method to distill LLM knowledge.This pre-built transformation block can be found under the Data sources tab in the Data acquisition view.The block takes all your unlabeled image files and asks GPT-4o to label them based on your prompt - and we automatically add the reasoning as metadata to your items!Your prompt should return a single label, e.g.
Copy
Is there a person in this picture? Answer with just 'yes' or 'no'.
Navigate to the Data acquisition page and add images to your project’s dataset.
In the video tutorial above, we show how to collect a video recorded directly from a phone, upload it to Edge Impulse and split the video into individual frames.
OpenAI API key: Add your OpenAI API key. This value will be stored as a secret, and won’t be shown again.
Prompt: Your prompt should return a single label. For example:
Copy
Is there a person in this picture? Respond only with "yes", "no" or "unsure" if you're not sure.
Disable samples w/ label: If a certain label is output, disable the data item - these are excluded from training. Multiple labels are accepted, separate them with a coma.
Max. no. of samples to label: Number of samples to label.
Concurrency: Number of samples to label in parallel.
Auto-convert videos: If set, all videos are automatically split into individual images before labeling.
The small model we tested this on performed exceptionally well, identifying toys in various scenes quickly and accurately. By distilling knowledge from the large LLM, we created a specialized, efficient model suitable for edge deployment.
The latest multimodal LLMs are incredibly powerful but too large for many practical applications. At Edge Impulse, we enable the transfer of knowledge from these large models to smaller, specialized models that run efficiently on edge devices.Our “Label image data using GPT-4o” block is available for enterprise customers, allowing you to experiment with this technology.For further assistance, visit our forum.