Synthetic Data Integration

The Synthetic Data Integration allows you to easily create and manage synthetic data, enhancing your datasets and improving model performance. Whether you need images, speech, or audio data, our new integrations make it simple and efficient.

There is also a video version demonstrating the Synthetic Data Integration workflow and features:

Supported Blocks

  • DALL-E Image Generation Block: Generate image datasets using the DALL-E model.

  • Whisper Keyword Spotting Generation Block: Generate keyword-spotting datasets using the Whisper model. Ideal for keyword spotting and speech recognition applications.

  • Eleven Labs Sound Generation Block: Generate sound datasets using the Eleven Labs model. Ideal for generating realistic sound effects for various applications.

  • Nvidia Omniverse Block: Generate synthetic data using Nvidia Omniverse. Ideal for creating realistic 3D scenes and objects for computer vision applications.

  • Custom Transformation Blocks: Create custom transformation blocks to generate synthetic data using your own models or APIs. See the Custom Transformation Blocks section below for more information.

To use these features, navigate to Data Sources, add new data source transformation blocks, set up actions, and run a pipeline. Then go to Data Acquisition to view the output. To change or refine your prompts, you have to delete the pipeline and start over.

Benefits of the Synthetic Data Integration

  • Enhance Your Datasets: Easily augment your datasets with high-quality synthetic data.

  • Improve Model Accuracy: Synthetic data can help fill gaps in your dataset, leading to better model performance.

  • Save Time and Resources: Quickly generate the data you need without the hassle of manual data collection.

Data ingestion also accepts an optional x-synthetic-data-job-id header, which flags the uploaded data as synthetic. See the Custom Transformation Blocks section below for more details.

Accessing the Synthetic Data Integration

To access the Synthetic Data Integration, follow these steps:

  1. Navigate to Your Project: Open your project in Edge Impulse Studio.

  2. Open Synthetic Data Integration Tab: Click on the "Synthetic Data" tab in the left-hand menu.

Generating Synthetic Images with GPT-4 (DALL-E)

  • Create Realistic Images: Use DALL-E to generate realistic images for your datasets.

  • Customize Prompts: Tailor the prompts to generate specific types of images suited to your project needs.

  1. Select Image Generation: Choose the GPT-4 (DALL-E) option.

  2. Enter a Prompt: Describe the type of images you need, e.g., "A photo of a factory worker wearing a hard hat". To generate background data for an object detection project (of cars), you could use "aerial view images of deserted streets".

  3. Generate and Save: Click "Generate" to create the images. Review and save the generated images to your dataset.

Generating Human Speech with Whisper

  • Human-like Speech Data: Utilize Whisper to generate human-like speech data.

  • Versatile Applications: Ideal for voice recognition, command-and-control systems, or any application requiring natural language processing.

  1. Select Speech Generation: Choose the Whisper option.

  2. Enter Text: Provide the text you want to be converted into speech (e.g., "Hello Edge!").

  3. Generate and Save: Click "Generate" to create the speech data. Review and save the generated audio files.

Generating Sound Effects with Eleven Labs

  • Realistic Sound Effects: Use Eleven Labs to generate realistic sound effects for your projects.

  • Customize Sound Prompts: Define the type of sound you need (e.g., "Glass breaking" or "Car engine revving").

Only available with Edge Impulse Enterprise Plan

Try our FREE Enterprise Trial today.

Transformation blocks can be complex to set up and are one of the most advanced features Edge Impulse provides. Feel free to ask your customer solution engineer for help and examples. Our engineers have set up complex pipelines for our customers and have acquired a lot of expertise with transformation blocks.

Custom Transformation Blocks

You can also create custom transformation blocks to generate synthetic data using your own models or APIs. This feature allows you to integrate your custom generative models into Edge Impulse Studio for data augmentation.

Follow our Custom Transformation Blocks guide to learn how to create and use custom transformation blocks in Edge Impulse Studio.

x-synthetic-data-job-id header

To handle the new synthetic data ingestion flag, your block needs to parse an extra argument, as in this example from the DALL-E block:

parser.add_argument('--synthetic-data-job-id', type=int, required=False, help="If specified, sets the synthetic_data_job_id metadata key")
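
As a minimal, self-contained sketch of how this argument behaves: argparse exposes `--synthetic-data-job-id` as the attribute `synthetic_data_job_id` (dashes become underscores), and the value is `None` when the flag is omitted. The `build_parser` function and the parser description are hypothetical names used only for illustration; only the `add_argument` call comes from the example above.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical wrapper; a real block would define its other arguments here too.
    parser = argparse.ArgumentParser(description="Illustrative synthetic data block")
    parser.add_argument('--synthetic-data-job-id', type=int, required=False,
                        help="If specified, sets the synthetic_data_job_id metadata key")
    return parser


# Flag supplied: the string '42' is converted to an int by type=int.
args_with_id = build_parser().parse_args(['--synthetic-data-job-id', '42'])

# Flag omitted: the attribute still exists and defaults to None.
args_without_id = build_parser().parse_args([])

print(args_with_id.synthetic_data_job_id)
print(args_without_id.synthetic_data_job_id)
```

Because the argument is optional, any code that uses it should check for `None` before building the header value.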

Then, pass the argument to the ingestion API via the x-synthetic-data-job-id header field:

# Pass the argument as a header to ingestion
res = requests.post(url=INGESTION_URL + '/api/' + upload_category + '/files',
    headers={
        'x-label': label,
        'x-api-key': API_KEY,
        'x-metadata': json.dumps({
            'generated_by': 'dall-e-3',
            'prompt': prompt,
        }),
        'x-synthetic-data-job-id': str(args.synthetic_data_job_id) if args.synthetic_data_job_id is not None else None,
    },
    files={ 'data': (os.path.basename(fullpath), png, 'image/png') }
)
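
The header handling in the call above could be factored into a small helper, sketched here under some assumptions. `build_ingestion_headers` is a hypothetical function, not part of the DALL-E block or of any Edge Impulse library. Unlike the original snippet, which sets the header to `None` when no job ID is given, this sketch leaves the key out entirely so that it does not depend on how the HTTP client treats a `None` header value.

```python
import json
from typing import Dict, Optional


def build_ingestion_headers(label: str, api_key: str, prompt: str,
                            job_id: Optional[int] = None) -> Dict[str, str]:
    """Build the ingestion request headers (hypothetical helper for illustration)."""
    headers = {
        'x-label': label,
        'x-api-key': api_key,
        # Metadata travels as a JSON-encoded string, as in the example above.
        'x-metadata': json.dumps({
            'generated_by': 'dall-e-3',
            'prompt': prompt,
        }),
    }
    # Only flag the sample as synthetic when a job ID was actually provided.
    if job_id is not None:
        headers['x-synthetic-data-job-id'] = str(job_id)
    return headers


# Placeholder values for illustration only.
with_job = build_ingestion_headers('car', 'ei_placeholder_key', 'a red car', job_id=7)
without_job = build_ingestion_headers('car', 'ei_placeholder_key', 'a red car')
```

The resulting dictionary can be passed as the `headers` argument of `requests.post` in place of the inline dictionary.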

Read more in our DALL-E 3 Image Generation Block guide and repository.

Summary

To start using the Synthetic Data tab, log in to your Edge Impulse Enterprise account and open a project. Navigate to the "Synthetic Data" tab and explore the new features. If you don't have an account yet, sign up for free at Edge Impulse.

For further assistance, visit our forum or check out our Introduction to Edge AI Course.

Stay tuned for more updates on what we're doing with generative AI. Exciting times ahead!
