1 of 6

Generate synthetic datasets

Synthetic datasets are managed through our Synthetic Data tab. This tab provides a user-friendly interface to generate synthetic data for your projects. You can create synthetic datasets using a variety of tools and models, such as DALL-E for image generation or Whisper for human speech synthesis.

Synthetic Data Tab Tutorial - Generate Synthetic Datasets.

We have put together the following tutorials to help you get started with synthetic datasets generation:

Generate image datasets using Dall·E.
Generate keyword-spotting datasets.
Generate physics simulation datasets.
Generate audio event datasets.

Synthetic datasets are a collection of data artificially generated rather than being collected from real-world observations or measurements. They are created using algorithms, simulations, or mathematical models to mimic the characteristics and patterns of real data. Synthetic datasets are a valuable tool to generate data for experimentation, testing, and development when obtaining real data is challenging, costly, or undesirable.

You might want to generate synthetic datasets for several reasons:

Cost Efficiency: Creating synthetic data can be more cost-effective and efficient than collecting large volumes of real data, especially in resource-constrained environments.

Data Augmentation: Synthetic datasets allow users to augment their real-world data with variations, which can improve model robustness and performance.

Data Diversity: Synthetic datasets enable the inclusion of uncommon or rare scenarios, enriching model training with a wider range of potential inputs.

Privacy and Security: When dealing with sensitive data, synthetic datasets provide a way to train models without exposing real information, enhancing privacy and security.

Synthetic data tab

The Synthetic Data tab allows you to easily create and manage synthetic data, enhancing your datasets and improving model performance. Whether you need images, speech, or audio data, our new integrations make it simple and efficient.

There is also a video version of this guide Placeholder video TBD:

Supported Blocks

DALL-E Image Generation Block: Generate image datasets using Dall·E using the DALL-E model.
Whisper Keyword Spotting Generation Block: Generate keyword-spotting datasets using the Whisper model. Ideal for keyword spotting and speech recognition applications.

To use these features, navigate to Data Sources, add new data source transformation blocks, set up actions, run a pipeline, and then go to Data Acquisition to view the output. If you want to make changes or refine your prompts, you have to delete the pipeline and start over.

Benefits of Synthetic Data Tab Management

Enhance Your Datasets: Easily augment your datasets with high-quality synthetic data.
Improve Model Accuracy: Synthetic data can help fill gaps in your dataset, leading to better model performance.
Save Time and Resources: Quickly generate the data you need without the hassle of manual data collection.

Data ingestion will also include a flag in the header "synthetic": true, allowing users to pass an optional new header to indicate this is synthetic data. We've updated the Whisper and DALL-E blocks to accommodate this update.

Accessing the Synthetic Data Tab

To access the Synthetic Data tab, follow these steps:

Navigate to Your Project: Open your project in Edge Impulse Studio.
Open Synthetic Data Tab: Click on the "Synthetic Data" tab in the left-hand menu.

Generating Synthetic Images with GPT-4 (DALL-E)

Create Realistic Images: Use DALL-E to generate realistic images for your datasets.
Customize Prompts: Tailor the prompts to generate specific types of images suited to your project needs.

Select Image Generation: Choose the GPT-4 (DALL-E) option.
Enter a Prompt: Describe the type of images you need (e.g., "A photo of a factory worker wearing a hard hat").
Generate and Save: Click "Generate" to create the images. Review and save the generated images to your dataset.

Generating Human Speech with Whisper

Human-like Speech Data: Utilize Whisper to generate human-like speech data.
Versatile Applications: Ideal for voice recognition, command-and-control systems, or any application requiring natural language processing.

Select Speech Generation: Choose the Whisper option.
Enter Text: Provide the text you want to be converted into speech (e.g., "Hello Edge!").
Generate and Save: Click "Generate" to create the speech data. Review and save the generated audio files.

Only available with Edge Impulse Enterprise Plan

Try our FREE Enterprise Trial today.

Transformation blocks can be complex to set up and are one of the most advanced features Edge Impulse provides. Feel free to ask your customer solution engineer for some help and some examples, we have been setting up complex pipelines for our customers and our engineers have acquired a lot of expertise with transformation blocks.

Summary

To start using the Synthetic Data tab, log in to your Edge Impulse Enterprise account and open a project. Navigate to the "Synthetic Data" tab and explore the new features. If you don't have an account yet, sign up for free at Edge Impulse.

For further assistance, visit our forum or check out our Introduction to Edge AI Course.

Stay tuned for more updates on what we're doing with generative AI. Exciting times ahead!

Generate image datasets using Dall·E

This notebook explores how we can use generative AI to create datasets which don't exist yet. This can be a good starting point for your project if you have not collected or cannot collect the data required. It is important to note the limitations of generative AI still apply here, biases can be introduced through your prompts, results can include "hallucinations" and quality control is important.

This example uses the OpenAI API to call the Dall-E image generation tool, it explores both generation and variation but there are other tools such as editing which could also be useful for augmenting an existing dataset.

There is also a video version of this tutorial:

We have wrapped this example into a Transformation Block (Enterprise Feature) to make it even easier to generate images and upload them to your organization. See: https://github.com/edgeimpulse/example-transform-Dall-E-images

Local Software Requirements

Python 3
Pip package manager
Jupyter Notebook: https://jupyter.org/install
pip packages (install with pip installpackagename):
- openai https://pypi.org/project/openai/

! pip install openai

# Imports
import openai
import os
import requests

# Notebook Imports
from IPython.display import Image
from IPython.display import display
import matplotlib.pyplot as plt
import matplotlib.image as mpimg

Set up OpenAI API

First off you will need to set up and Edge Impulse account and create your first project.

You will also need to create an API Key for OpenAI: https://platform.openai.com/docs/api-reference/authentication


# You can set your API key and org as environment variables in your system like this:
# os.environ['OPENAI_API_KEY'] = 'api string'

# Set up OpenAI API key and organization
openai.api_key = os.getenv("OPENAI_API_KEY")

Generate your first image

The API takes in a prompt, number of images and a size

image_prompt = "A webcam image of a human 1m from the camera sitting at a desk showing that they are wearing gloves with their hands up to the camera."
# image_prompt = "A webcam image of a person 1m from the camera sitting at a desk with their bare hands up to the camera."
# image_prompt = "A webcam image of a human 1m from the camera sitting at a desk showing that they are wearing wool gloves with their hands up to the camera."

response = openai.Image.create(
    prompt=image_prompt,
    n=1,
    size="256x256",
)
Image(url=response["data"][0]["url"])

Generate some variations of this image

The API also has a variations call which takes in an existing images and creates variations of it. This could also be used to modify existing images.

response2 = openai.Image.create_variation(
  image=requests.get(response['data'][0]['url']).content,
  n=3,
  size="256x256"
)
imgs = []
for img in response2['data']:
  imgs.append(Image(url=img['url']))

display(*imgs)

Generate a dataset:

Here we are iterate through a number of images and variations to generate a dataset based on the prompts/labels given.

labels = [{"prompt": "A webcam image of a human 1m from the camera sitting at a desk showing that they are wearing wool gloves with their hands up to the camera.",
          "label": "gloves"},
          {"prompt": "A webcam image of a person 1m from the camera sitting at a desk with their bare hands up to the camera.",
          "label": "no-gloves"}
        ]
output_folder = "output"
base_images_number = 10
variation_per_image = 3
# Check if output directory for noisey files exists and create it if it doesn't
if not os.path.exists(output_folder):
    os.makedirs(output_folder)

for option in labels:
    for i in range(base_images_number):
        response = openai.Image.create(
            prompt=option["prompt"],
            n=1,
            size="256x256",
        )
        try:
            img = response["data"][0]["url"]
            with open(f'{output_folder}/{option["label"]}.{img.split("/")[-1]}.png', 'wb+') as f:
                f.write(requests.get(img).content)
            response2 = openai.Image.create_variation(
                image=requests.get(img).content,
                n=variation_per_image,
                size="256x256"
            )
        except Exception as e:
            print(e)
        for img in response2['data']:
            try:
                with open(f'{output_folder}/{option["label"]}.{img["url"].split("/")[-1]}.png', 'wb') as f:
                    f.write(requests.get(img["url"]).content)
            except Exception as e:
                print(e)

Plot all the output images:

import os


# Define the folder containing the images
folder_path = './output'

# Get a list of all the image files in the folder
image_files = [f for f in os.listdir(folder_path) if os.path.isfile(os.path.join(folder_path, f)) and f.endswith('.png')]

# Set up the plot
fig, axs = plt.subplots(nrows=20, ncols=20, figsize=(10, 10))

# Loop through each image and plot it in a grid cell
for i in range(20):
    for j in range(20):
        img = mpimg.imread(os.path.join(folder_path, image_files[i*10+j]))
        axs[i,j].imshow(img)
        axs[i,j].axis('off')

# Make the plot look clean
fig.subplots_adjust(hspace=0, wspace=0)
plt.tight_layout()
plt.show()

These files can then be uploaded to a project with these commands (run in a separate terminal window):

! cd output
! edge-impulse-uploader .

(run edge-impulse-uploader --clean if you have used the CLI before to reset the target project)

What next?

Now you can use your images to create an image classification model on Edge Impulse.

Why not try some other OpenAI calls, 'edit' could be used to take an existing image and translate it into different environments or add different humans to increase the variety of your dataset. https://platform.openai.com/docs/guides/images/usage

Generate keyword spotting datasets

Local Software Requirements

Python 3
Pip package manager
Jupyter Notebook: https://jupyter.org/install
pip packages (install with pip installpackagename):
- pydub https://pypi.org/project/pydub/
- google-cloud-texttospeech https://cloud.google.com/python/docs/reference/texttospeech/latest
- requests https://pypi.org/project/requests/

# Imports
import os
import json
import time
import io
import random
import requests
from pydub import AudioSegment
from google.cloud import texttospeech

Set up Google TTS API

First off you will need to set up and Edge Impulse account and create your first project. You will also need a Google Cloud account with the Text to Speech API enabled: https://cloud.google.com/text-to-speech, the first million characters generated each month are free (WaveNet voices), this should be plenty for most cases as you'll only need to generate your dataset once. From google you will need to download a credentials JSON file and set it to the correct environment variable on your system to allow the python API to work: (https://developers.google.com/workspace/guides/create-credentials#service-account)


# Insert the path to your service account API key json file here for google cloud
os.environ['GOOGLE_APPLICATION_CREDENTIALS'] = '../path-to-google-credentials-file.json'

Generate the desired samples

First off we need to set our desired keywords and labels:


# Keyword or short sentence and label (e.g. 'hello world')
keyword = [
    {'string':'edge','label':'edge'},
    {'string':'impulse','label':'impulse'},
           ]

Then we need to set up the parameters for our speech dataset, all possible combinations will be iterated through:

languages - Choose the text to speech voice languages to use (https://cloud.google.com/text-to-speech/docs/voices)
pitches - Which voice pitches to apply
genders - Which SSML genders to apply
speakingRates - Which speaking speeds to apply



# Languages, remove as appropriate
# languages = [
#     'ar-XA', 'bn-IN',  'en-GB',  'fr-CA',
#     'en-US', 'es-ES',  'fi-FI',  'gu-IN',
#     'ja-JP', 'kn-IN',  'ml-IN',  'sv-SE',
#     'ta-IN', 'tr-TR',  'cs-CZ',  'de-DE',
#     'en-AU', 'en-IN',  'fr-FR',  'hi-IN',
#     'id-ID', 'it-IT',  'ko-KR',  'ru-RU',
#     'uk-UA', 'cmn-CN', 'cmn-TW', 'da-DK',
#     'el-GR', 'fil-PH', 'hu-HU',  'nb-NO',
#     'nl-NL', 'pt-PT',  'sk-SK',  'vi-VN',
#     'pl-PL', 'pt-BR',  'ca-ES',  'yue-HK',
#     'af-ZA', 'bg-BG',  'lv-LV',  'ro-RO',
#     'sr-RS', 'th-TH',  'te-IN',  'is-IS'
# ]
languages = [
    'en-GB',
    'en-US',
]
# Pitches to generate (in semitones) range: [-20.0, 20.0]
pitches = [-2, 0, 2]
# Voice genders to use
genders = ["NEUTRAL", "FEMALE", "MALE"]
# Speaking rates to use range: [0.25, 4.0]
speakingRates = [0.9, 1, 1.1]

Then provide some other key parameters:

out_length - How long each output sample should be
count - Maximum number of samples to output (if all combinations of languages, pitches etc are higher then this restricts output)
voice-dir - Where to store the clean samples before noise is added
noise-url - Which noise file to download and apply to your samples
output-folder - The final output location of the noised samples
num-copies - How many different noisy versions of each sample to create
max-noise-level - in Db,

# Out length minimum (default: 1s)
out_length = 1
# Maximum number of keywords to generate
count = 30
# Raw sample output directory
voice_dir = 'out-wav'
# Creative commons background noise from freesound.org:https://freesound.org/people/Astounded/sounds/483561/
noise_url = 'https://cdn.freesound.org/previews/483/483561_10201334-lq.ogg'
output_folder = 'out-noisy'
num_copies = 2  # Number of noisy copies to create for each input sample
max_noise_level = -5  # Maximum noise level to add in dBFS (negative value)

Then we need to check all the output folders are ready


# Check if output directory for noisey files exists and create it if it doesn't
if not os.path.exists(output_folder):
    os.makedirs(output_folder)
# Check if output directory for raw voices exists and create it if it doesn't
if not os.path.exists(voice_dir):
    os.makedirs(voice_dir)

And download the background noise file


# Download background noise file
response = requests.get(noise_url)
response.raise_for_status()
noise_audio = AudioSegment.from_file(io.BytesIO(response.content), format='ogg')

Then we can generate a list of all possible parameter combinations based on the input earlier. If you have set num_copies to be smaller than the number of combinations then these options will be reduced:


# Generate all combinations of parameters
all_opts = []
for p in pitches:
    for g in genders:
        for l in languages:
            for s in speakingRates:
                for kw in keyword:
                    all_opts.append({
                            "pitch": p,
                            "gender": g,
                            "language": l,
                            "speakingRate": s,
                            "text": kw['string'],
                            "label": kw['label']
                        })
if len(all_opts)*num_copies > count:
    selectEvery = len(all_opts)*num_copies // count
    selectNext = 0
    all_opts = all_opts[::selectEvery]
print(f'Generating {len(all_opts)*num_copies} samples')

Finally we iterate though all the options generated, call the Google TTS API to generate the desired sample, and apply noise to it, saving locally with metadata:


# Instantiate list for file label information
downloaded_files = []

# Instantiates a client
client = texttospeech.TextToSpeechClient()

ix = 0
for o in all_opts:
    ix += 1
    # Set the text input to be synthesized
    synthesis_input = texttospeech.SynthesisInput(text=o['text'])
    # Build the voice request
    voice = texttospeech.VoiceSelectionParams(
        language_code=o['language'],
        ssml_gender=o['gender']
    )
    # Select the type of audio file you want returned
    audio_config = texttospeech.AudioConfig(
        audio_encoding=texttospeech.AudioEncoding.LINEAR16,
        pitch=o['pitch'],
        speaking_rate=o['speakingRate'],
        sample_rate_hertz=16000
    )
    # Perform the text-to-speech request on the text input with the selected
    # voice parameters and audio file type

    wav_file_name = f"{voice_dir}/{o['label']}.{o['language']}-{o['gender']}-{o['pitch']}-{o['speakingRate']}.tts.wav"

    if not os.path.exists(wav_file_name):
        print(f"[{ix}/{len(all_opts)}] Text-to-speeching...")
        response = client.synthesize_speech(
            input=synthesis_input, voice=voice, audio_config=audio_config
        )
        with open(wav_file_name, "wb") as f:
            f.write(response.audio_content)
        has_hit_api = True
    else:
        print(f'skipping {wav_file_name}')
        has_hit_api = False

    # Load voice sample
    voice_audio = AudioSegment.from_file(wav_file_name)
    # Add silence to match output length with random padding
    difference = (out_length * 1000) - len(voice_audio)
    if difference > 0:
        padding_before = random.randint(0, difference)
        padding_after = difference - padding_before
        voice_audio = AudioSegment.silent(duration=padding_before) +  voice_audio + AudioSegment.silent(duration=padding_after)

    for i in range(num_copies):
        # Save noisy sample to output folder
        output_filename = f"{o['label']}.{o['language']}-{o['gender']}-{o['pitch']}-{o['speakingRate']}_noisy_{i+1}.wav"
        output_path = os.path.join(output_folder, output_filename)
        if not os.path.exists(output_path):
            # Select random section of noise and random noise level
            start_time = random.randint(0, len(noise_audio) - len(voice_audio))
            end_time = start_time +len(voice_audio)
            noise_level = random.uniform(max_noise_level, 0)

            # Extract selected section of noise and adjust volume
            noise_segment = noise_audio[start_time:end_time]
            noise_segment = noise_segment - abs(noise_level)

            # Mix voice sample with noise segment
            mixed_audio = voice_audio.overlay(noise_segment)
            # Save mixed audio to file
            mixed_audio.export(output_path, format='wav')

            print(f'Saved mixed audio to {output_path}')
        else:
            print(f'skipping {output_path}')
        # Save metadata for file
        downloaded_files.append({
            "path": str(output_filename),
            "label": o['label'],
            "category": "split",
            "metadata": {
                "pitch": str(['pitch']),
                "gender": str(o['gender']),
                "language": str(o['language']),
                "speakingRate": str(o['speakingRate']),
                "text": o['text'],
                "imported_from": "Google Cloud TTS"
            }
        })

    if has_hit_api:
        time.sleep(0.5)

print("Done text-to-speeching")
print("")

input_file = os.path.join(output_folder, 'input.json')
info_file = {
    "version": 1,
    "files": downloaded_files
}
# Output the metadata file
with open(input_file, "w") as f:
    json.dump(info_file, f)

# Move to the out-noisy folder
! cd out-noisy
# Upload all files in the out-noisy folder with metadata attached in the input.json file
! edge-impulse-uploader --info-file input.json *

What next?

Now you can use your keywords to create a robust keyword detection model in Edge Impulse Studio!

Try out both classification models and the transfer learning keyword spotting model to see which works best for your case

Generate physics simulation datasets

This notebook takes you through a basic example of using the physics simulation tool PyBullet to generate an accelerometer dataset representing dropping the Nordic Thingy:53 devkit from different heights. This dataset can be used to train a regression model to predict drop height.

This idea could be used for a wide range of simulatable environments- for example generating accelerometer datasets for pose estimation or fall detection. The same concept could be applied in an FMEA application for generating strain datasets for structural monitoring.

There is also a video version of this tutorial:

Local Software Requirements

Python 3
Pip package manager
Jupyter Notebook: https://jupyter.org/install
Bullet3: https://github.com/bulletphysics/bullet3

The dependencies can be installed with:

! pip install pybullet numpy

# Imports
import pybullet as p
import pybullet_data
import os
import shutil
import csv
import random
import numpy as np
import json

Create object to simulate

We need to load in a Universal Robotics Description Format file describing an object with the dimensions and weight of a Nordic Thingy:53. In this case, measuring our device it is 64x60x23.5mm and its weight 60g. The shape is given by a .obj 3D model file.

	<visual>
		<origin xyz="0.02977180615936878 -0.01182944632717566 0.03176079914341195" rpy="1.57079632679 0.0 0.0" />
		<geometry>
			<mesh filename="thingy53/thingy53 v2.obj" scale="1 1 1" />
		</geometry>
		<material name="texture">
			<color rgba="1.0 1.0 1.0 1.0" />
		</material>
	</visual>
	<collision>
		<origin xyz="0.02977180615936878 -0.01182944632717566 0.03176079914341195" rpy="1.57079632679 0.0 0.0" />
		<geometry>
			<mesh filename="thingy53/thingy53 v2.obj" scale="1 1 1" />
		</geometry>
	</collision>
	<inertial>
		<mass value="0.06" />
  		<inertia ixx="0.00002076125" ixy="0" ixz="0" iyy="0.00002324125" iyz="0" izz="0.00003848"/>
	</inertial>
</link>

Visualizing the problem

To generate the required data we will be running PyBullet in headless "DIRECT" mode so we can iterate quickly over the parameter field. If you run the python file below you can see how pybullet simulates the object dropping onto a plane

! python ../.assets/pybullet/single_simulation.py

Setting up the simulation environment

First off we need to set up a pybullet physics simulation environment. We load in our object file and a plane for it to drop onto. The plane's dynamics can be adjusted to better represent the real world (in this case we're dropping onto carpet)

# Set up PyBullet physics simulation (change from p.GUI to p.DIRECT for headless simulation)
physicsClient = p.connect(p.DIRECT)
p.setAdditionalSearchPath(pybullet_data.getDataPath())
p.setGravity(0, 0, -9.81)

# Load object URDF file
obj_file = "../.assets/pybullet/thingy53/thingy53.urdf"
obj_id = p.loadURDF(obj_file, flags=p.URDF_USE_INERTIA_FROM_FILE)

# Add a solid plane for the object to collide with
plane_id = p.loadURDF("plane.urdf")

# Set length of simulation and sampling frequency
sample_length = 2 # Seconds
sample_freq = 100 # Hz

We also need to define the output folder for our simulated accelerometer files

output_folder = 'output/'
# Check if output directory for noisey files exists and create it if it doesn't
if not os.path.exists(output_folder):
    os.makedirs(output_folder)
else:
    shutil.rmtree(output_folder)
    os.makedirs(output_folder)

And define the drop parameters

# Simulate dropping object from range of heights
heights = 100
sims_per_height = 20
min_height = 0.1 # Metres
max_height = 0.8 # Metres

We also need to define the characteristics of the IMU on the real device we are trying to simulate. In this case the Nordic Thingy:53 has a Bosch BMI270 IMU (https://www.bosch-sensortec.com/products/motion-sensors/imus/bmi270/) which is set to a range of +-2g with a resolution of 0.06g. These parameters will be used to restrict the raw acceleration output:


range_g = 2
range_acc = range_g * 9.81
resolution_mg = 0.06
resolution_acc = resolution_mg / 1000.0 * 9.81

Finally we are going to give the object and plane restitution properties to allow for some bounce. In this case I dropped the real Thingy:53 onto a hardwood table. You can use p.changeDynamics to introduce other factors such as damping and friction.

p.changeDynamics(obj_id, -1, restitution=0.3)
p.changeDynamics(plane_id, -1, restitution=0.4)

Drop simulation

Here we iterate over a range of heights, randomly changing its start orientation for i number of simulations per height. The acceleration is calculated relative to the orientation of the Thingy:53 object to represent its onboard accelerometer.

metadata = []
for height in np.linspace(max_height, min_height, num=heights):
    print(f"Simulating {sims_per_height} drops from {height}m")
    for i in range(sims_per_height):
        # Set initial position and orientation of object
        x = 0
        y = 0
        z = height
        orientation = p.getQuaternionFromEuler((random.uniform(0, 2 * np.pi), random.uniform(0, 2 * np.pi), random.uniform(0, 2 * np.pi)))
        p.resetBasePositionAndOrientation(obj_id, [x, y, z], orientation)
        
        prev_linear_vel = np.zeros(3)

        # Initialize the object position and velocity
        pos_prev, orn_prev = p.getBasePositionAndOrientation(obj_id)
        vel_prev, ang_vel_prev = p.getBaseVelocity(obj_id)
        timestamp=0
        dt=1/sample_freq
        p.setTimeStep(dt)
        filename=f"drop_{height}m_{i}.csv"
        with open(f"output/{filename}", mode="w") as csv_file:
                writer = csv.writer(csv_file)
                writer.writerow(['timestamp','accX','accY','accZ'])
        while timestamp < sample_length:
            p.stepSimulation()
            linear_vel, angular_vel = p.getBaseVelocity(obj_id)
            lin_acc = [(v - prev_v)/dt for v, prev_v in zip(linear_vel, prev_linear_vel)]
            prev_linear_vel = linear_vel
            timestamp += dt
            # Get the current position and orientation of the object
            pos, orn = p.getBasePositionAndOrientation(obj_id)

            # Get the linear and angular velocity of the object in world coordinates
            vel, ang_vel = p.getBaseVelocity(obj_id)

             # Calculate the change in position and velocity between steps
            pos_diff = np.array(pos) - np.array(pos_prev)
            vel_diff = np.array(vel) - np.array(vel_prev)

            # Convert the orientation quaternion to a rotation matrix
            rot_matrix = np.array(p.getMatrixFromQuaternion(orn)).reshape(3, 3)

            # Calculate the local linear acceleration of the object, subtracting gravity
            local_acc = np.dot(rot_matrix.T, vel_diff / dt) - np.array([0, 0, -9.81])
            # Restrict the acceleration to the range of the accelerometer
            imu_rel_lin_acc_scaled = np.clip(local_acc, -range_acc, range_acc)
            # Round the acceleration to the nearest resolution of the accelerometer
            imu_rel_lin_acc_rounded = np.round(imu_rel_lin_acc_scaled/resolution_acc) * resolution_acc
            # Update the previous position and velocity
            pos_prev, orn_prev = pos, orn
            vel_prev, ang_vel_prev = vel, ang_vel

            # Save acceleration data to CSV file
            with open(f"{output_folder}{filename}", mode="a") as csv_file:
                writer = csv.writer(csv_file)
                writer.writerow([timestamp*1000] + imu_rel_lin_acc_rounded.tolist())

        nearestheight = round(height, 2)
        metadata.append({
            "path": filename,
            "category": "training",
            "label": { "type": "label", "label": str(nearestheight)}
        })

Finally we save the metadata file to the output folder. This can be used to tell the edge-impulse-uploader CLI tool the floating point labels for each file.

jsonout = {"version": 1, "files": metadata}

with open(f"{output_folder}/files.json", "w") as f:
    json.dump(jsonout, f)

# Disconnect from PyBullet physics simulation
p.disconnect()

These files can then be uploaded to a project with these commands (run in a separate terminal window):

! cd output
! edge-impulse-uploader --info-file files.json

(run edge-impulse-uploader --clean if you have used the CLI before to reset the target project)

What next?

Now you can use your dataset a drop height detection regression model in Edge Impulse Studio!

See if you can edit this project to simulate throwing the object up in the air to predict the maximum height, or add in your own custom object. You could also try to better model the real environment you're dropping the object in- adding air resistance, friction, damping and material properties for your surface.

Generate audio event datasets

Generate audio data using the Eleven Labs block, specifically for non-human voice audio. In this guide and video we'll be exploring an innovative approach leveraging generative AI to train small edge AI models. This is particularly useful in scenarios where high-quality, diverse training data is scarce or expensive to collect.

There is also a video version of this guide:

Prerequisites

Eleven Labs: account and API Key

Only available with Edge Impulse Enterprise Plan

Try our FREE Enterprise Trial today.

Introduction

Edge AI enables smart devices to perform machine learning tasks right at the source of the data collection. These small models are very good and optimized for performance scope tasks, as long as they are trained on quality data. We usually say "garbage in, garbage out."

Data Quality and Diversity

To help us improve the quality of our sound datasets, we've been working on an integration between Edge Impulse and ElevenLabs.io. Edge Impulse excels in the creation and optimization of Edge AI models, while Eleven Labs offers advanced capabilities to create realistic sound effects. This integration allows us to expand our datasets with sounds that may be difficult or expensive to record naturally. This approach not only saves time and money but also enhances the accuracy and reliability of the models we deploy on edge devices.

Practical Application: Glass Breaking Sound

In the above demonstration, we focus on a practical application that can be used in a smart security system, or in a factory to detect incidents, such as detecting the sounds of glass breaking.

Getting Started

Steps to Generate Sound Samples with ElevenLabs.io

Navigate to Data Acquisition: Once you're in your project, navigate to the Data Acquisition section and go to Data Sources. Under Data Sources, you can add a new one and select the transformation block for sound generation using generative AI.

Step 1: Generate a Sound Sample with ElevenLabs.io

First, get your Eleven Labs API Key. Navigate to the Eleven Labs web interface to get your key and test your prompt.

Step 2: Define the Prompt

Here we will be trying to collect a glass breaking sound or impact. Navigate to the settings to define the prompt "glass breaking" and define the length (e.g., 2 seconds) and prompt influence

Step 3: Run the Pipeline:

Once you've set up your prompt, and api key, run the pipeline to generate the sound samples. You can then view the output in the Data Acquisition section.

Benefits of Using Generative AI for Sound Generation

Enhance Data Quality: Generative AI can create high-quality sound samples that are difficult to record naturally.
Increase Dataset Diversity: Access a wide range of sounds to enrich your training dataset and improve model performance.
Save Time and Resources: Quickly generate the sound samples you need without the hassle of manual recording.
Improve Model Accuracy: High-quality, diverse sound samples can help fill gaps in your dataset and enhance model performance.

Conclusion

By leveraging generative AI for sound generation, you can enhance the quality and diversity of your training datasets, leading to more accurate and reliable edge AI models. This innovative approach saves time and resources while improving the performance of your models in real-world applications. Try out the Eleven Labs block in Edge Impulse today and start creating high-quality sound datasets for your projects.

Generate physics simulation datasets

There is also a video version of this tutorial:

Local Software Requirements

Python 3
Pip package manager
Jupyter Notebook: https://jupyter.org/install
Bullet3: https://github.com/bulletphysics/bullet3

The dependencies can be installed with:

! pip install pybullet numpy

# Imports
import pybullet as p
import pybullet_data
import os
import shutil
import csv
import random
import numpy as np
import json

Create object to simulate

	<visual>
		<origin xyz="0.02977180615936878 -0.01182944632717566 0.03176079914341195" rpy="1.57079632679 0.0 0.0" />
		<geometry>
			<mesh filename="thingy53/thingy53 v2.obj" scale="1 1 1" />
		</geometry>
		<material name="texture">
			<color rgba="1.0 1.0 1.0 1.0" />
		</material>
	</visual>
	<collision>
		<origin xyz="0.02977180615936878 -0.01182944632717566 0.03176079914341195" rpy="1.57079632679 0.0 0.0" />
		<geometry>
			<mesh filename="thingy53/thingy53 v2.obj" scale="1 1 1" />
		</geometry>
	</collision>
	<inertial>
		<mass value="0.06" />
  		<inertia ixx="0.00002076125" ixy="0" ixz="0" iyy="0.00002324125" iyz="0" izz="0.00003848"/>
	</inertial>
</link>

Visualizing the problem

! python ../.assets/pybullet/single_simulation.py

Setting up the simulation environment

# Set up PyBullet physics simulation (change from p.GUI to p.DIRECT for headless simulation)
physicsClient = p.connect(p.DIRECT)
p.setAdditionalSearchPath(pybullet_data.getDataPath())
p.setGravity(0, 0, -9.81)

# Load object URDF file
obj_file = "../.assets/pybullet/thingy53/thingy53.urdf"
obj_id = p.loadURDF(obj_file, flags=p.URDF_USE_INERTIA_FROM_FILE)

# Add a solid plane for the object to collide with
plane_id = p.loadURDF("plane.urdf")

# Set length of simulation and sampling frequency
sample_length = 2 # Seconds
sample_freq = 100 # Hz

We also need to define the output folder for our simulated accelerometer files

output_folder = 'output/'
# Check if output directory for noisey files exists and create it if it doesn't
if not os.path.exists(output_folder):
    os.makedirs(output_folder)
else:
    shutil.rmtree(output_folder)
    os.makedirs(output_folder)

And define the drop parameters

# Simulate dropping object from range of heights
heights = 100
sims_per_height = 20
min_height = 0.1 # Metres
max_height = 0.8 # Metres


range_g = 2
range_acc = range_g * 9.81
resolution_mg = 0.06
resolution_acc = resolution_mg / 1000.0 * 9.81

p.changeDynamics(obj_id, -1, restitution=0.3)
p.changeDynamics(plane_id, -1, restitution=0.4)

Drop simulation

metadata = []
for height in np.linspace(max_height, min_height, num=heights):
    print(f"Simulating {sims_per_height} drops from {height}m")
    for i in range(sims_per_height):
        # Set initial position and orientation of object
        x = 0
        y = 0
        z = height
        orientation = p.getQuaternionFromEuler((random.uniform(0, 2 * np.pi), random.uniform(0, 2 * np.pi), random.uniform(0, 2 * np.pi)))
        p.resetBasePositionAndOrientation(obj_id, [x, y, z], orientation)
        
        prev_linear_vel = np.zeros(3)

        # Initialize the object position and velocity
        pos_prev, orn_prev = p.getBasePositionAndOrientation(obj_id)
        vel_prev, ang_vel_prev = p.getBaseVelocity(obj_id)
        timestamp=0
        dt=1/sample_freq
        p.setTimeStep(dt)
        filename=f"drop_{height}m_{i}.csv"
        with open(f"output/{filename}", mode="w") as csv_file:
                writer = csv.writer(csv_file)
                writer.writerow(['timestamp','accX','accY','accZ'])
        while timestamp < sample_length:
            p.stepSimulation()
            linear_vel, angular_vel = p.getBaseVelocity(obj_id)
            lin_acc = [(v - prev_v)/dt for v, prev_v in zip(linear_vel, prev_linear_vel)]
            prev_linear_vel = linear_vel
            timestamp += dt
            # Get the current position and orientation of the object
            pos, orn = p.getBasePositionAndOrientation(obj_id)

            # Get the linear and angular velocity of the object in world coordinates
            vel, ang_vel = p.getBaseVelocity(obj_id)

             # Calculate the change in position and velocity between steps
            pos_diff = np.array(pos) - np.array(pos_prev)
            vel_diff = np.array(vel) - np.array(vel_prev)

            # Convert the orientation quaternion to a rotation matrix
            rot_matrix = np.array(p.getMatrixFromQuaternion(orn)).reshape(3, 3)

            # Calculate the local linear acceleration of the object, subtracting gravity
            local_acc = np.dot(rot_matrix.T, vel_diff / dt) - np.array([0, 0, -9.81])
            # Restrict the acceleration to the range of the accelerometer
            imu_rel_lin_acc_scaled = np.clip(local_acc, -range_acc, range_acc)
            # Round the acceleration to the nearest resolution of the accelerometer
            imu_rel_lin_acc_rounded = np.round(imu_rel_lin_acc_scaled/resolution_acc) * resolution_acc
            # Update the previous position and velocity
            pos_prev, orn_prev = pos, orn
            vel_prev, ang_vel_prev = vel, ang_vel

            # Save acceleration data to CSV file
            with open(f"{output_folder}{filename}", mode="a") as csv_file:
                writer = csv.writer(csv_file)
                writer.writerow([timestamp*1000] + imu_rel_lin_acc_rounded.tolist())

        nearestheight = round(height, 2)
        metadata.append({
            "path": filename,
            "category": "training",
            "label": { "type": "label", "label": str(nearestheight)}
        })

Finally we save the metadata file to the output folder. This can be used to tell the edge-impulse-uploader CLI tool the floating point labels for each file.

jsonout = {"version": 1, "files": metadata}

with open(f"{output_folder}/files.json", "w") as f:
    json.dump(jsonout, f)

# Disconnect from PyBullet physics simulation
p.disconnect()

These files can then be uploaded to a project with these commands (run in a separate terminal window):

! cd output
! edge-impulse-uploader --info-file files.json

(run edge-impulse-uploader --clean if you have used the CLI before to reset the target project)

What next?

Now you can use your dataset a drop height detection regression model in Edge Impulse Studio!

Generate keyword spotting datasets

Local Software Requirements

Python 3
Pip package manager
Jupyter Notebook: https://jupyter.org/install
pip packages (install with pip installpackagename):
- pydub https://pypi.org/project/pydub/
- google-cloud-texttospeech https://cloud.google.com/python/docs/reference/texttospeech/latest
- requests https://pypi.org/project/requests/

# Imports
import os
import json
import time
import io
import random
import requests
from pydub import AudioSegment
from google.cloud import texttospeech

Set up Google TTS API


# Insert the path to your service account API key json file here for google cloud
os.environ['GOOGLE_APPLICATION_CREDENTIALS'] = '../path-to-google-credentials-file.json'

Generate the desired samples

First off we need to set our desired keywords and labels:


# Keyword or short sentence and label (e.g. 'hello world')
keyword = [
    {'string':'edge','label':'edge'},
    {'string':'impulse','label':'impulse'},
           ]

Then we need to set up the parameters for our speech dataset, all possible combinations will be iterated through:

languages - Choose the text to speech voice languages to use (https://cloud.google.com/text-to-speech/docs/voices)
pitches - Which voice pitches to apply
genders - Which SSML genders to apply
speakingRates - Which speaking speeds to apply



# Languages, remove as appropriate
# languages = [
#     'ar-XA', 'bn-IN',  'en-GB',  'fr-CA',
#     'en-US', 'es-ES',  'fi-FI',  'gu-IN',
#     'ja-JP', 'kn-IN',  'ml-IN',  'sv-SE',
#     'ta-IN', 'tr-TR',  'cs-CZ',  'de-DE',
#     'en-AU', 'en-IN',  'fr-FR',  'hi-IN',
#     'id-ID', 'it-IT',  'ko-KR',  'ru-RU',
#     'uk-UA', 'cmn-CN', 'cmn-TW', 'da-DK',
#     'el-GR', 'fil-PH', 'hu-HU',  'nb-NO',
#     'nl-NL', 'pt-PT',  'sk-SK',  'vi-VN',
#     'pl-PL', 'pt-BR',  'ca-ES',  'yue-HK',
#     'af-ZA', 'bg-BG',  'lv-LV',  'ro-RO',
#     'sr-RS', 'th-TH',  'te-IN',  'is-IS'
# ]
languages = [
    'en-GB',
    'en-US',
]
# Pitches to generate (in semitones) range: [-20.0, 20.0]
pitches = [-2, 0, 2]
# Voice genders to use
genders = ["NEUTRAL", "FEMALE", "MALE"]
# Speaking rates to use range: [0.25, 4.0]
speakingRates = [0.9, 1, 1.1]

Then provide some other key parameters:

out_length - How long each output sample should be
count - Maximum number of samples to output (if all combinations of languages, pitches etc are higher then this restricts output)
voice-dir - Where to store the clean samples before noise is added
noise-url - Which noise file to download and apply to your samples
output-folder - The final output location of the noised samples
num-copies - How many different noisy versions of each sample to create
max-noise-level - in Db,

# Out length minimum (default: 1s)
out_length = 1
# Maximum number of keywords to generate
count = 30
# Raw sample output directory
voice_dir = 'out-wav'
# Creative commons background noise from freesound.org:https://freesound.org/people/Astounded/sounds/483561/
noise_url = 'https://cdn.freesound.org/previews/483/483561_10201334-lq.ogg'
output_folder = 'out-noisy'
num_copies = 2  # Number of noisy copies to create for each input sample
max_noise_level = -5  # Maximum noise level to add in dBFS (negative value)

Then we need to check all the output folders are ready


# Check if output directory for noisey files exists and create it if it doesn't
if not os.path.exists(output_folder):
    os.makedirs(output_folder)
# Check if output directory for raw voices exists and create it if it doesn't
if not os.path.exists(voice_dir):
    os.makedirs(voice_dir)

And download the background noise file


# Download background noise file
response = requests.get(noise_url)
response.raise_for_status()
noise_audio = AudioSegment.from_file(io.BytesIO(response.content), format='ogg')


# Generate all combinations of parameters
all_opts = []
for p in pitches:
    for g in genders:
        for l in languages:
            for s in speakingRates:
                for kw in keyword:
                    all_opts.append({
                            "pitch": p,
                            "gender": g,
                            "language": l,
                            "speakingRate": s,
                            "text": kw['string'],
                            "label": kw['label']
                        })
if len(all_opts)*num_copies > count:
    selectEvery = len(all_opts)*num_copies // count
    selectNext = 0
    all_opts = all_opts[::selectEvery]
print(f'Generating {len(all_opts)*num_copies} samples')

Finally we iterate though all the options generated, call the Google TTS API to generate the desired sample, and apply noise to it, saving locally with metadata:


# Instantiate list for file label information
downloaded_files = []

# Instantiates a client
client = texttospeech.TextToSpeechClient()

ix = 0
for o in all_opts:
    ix += 1
    # Set the text input to be synthesized
    synthesis_input = texttospeech.SynthesisInput(text=o['text'])
    # Build the voice request
    voice = texttospeech.VoiceSelectionParams(
        language_code=o['language'],
        ssml_gender=o['gender']
    )
    # Select the type of audio file you want returned
    audio_config = texttospeech.AudioConfig(
        audio_encoding=texttospeech.AudioEncoding.LINEAR16,
        pitch=o['pitch'],
        speaking_rate=o['speakingRate'],
        sample_rate_hertz=16000
    )
    # Perform the text-to-speech request on the text input with the selected
    # voice parameters and audio file type

    wav_file_name = f"{voice_dir}/{o['label']}.{o['language']}-{o['gender']}-{o['pitch']}-{o['speakingRate']}.tts.wav"

    if not os.path.exists(wav_file_name):
        print(f"[{ix}/{len(all_opts)}] Text-to-speeching...")
        response = client.synthesize_speech(
            input=synthesis_input, voice=voice, audio_config=audio_config
        )
        with open(wav_file_name, "wb") as f:
            f.write(response.audio_content)
        has_hit_api = True
    else:
        print(f'skipping {wav_file_name}')
        has_hit_api = False

    # Load voice sample
    voice_audio = AudioSegment.from_file(wav_file_name)
    # Add silence to match output length with random padding
    difference = (out_length * 1000) - len(voice_audio)
    if difference > 0:
        padding_before = random.randint(0, difference)
        padding_after = difference - padding_before
        voice_audio = AudioSegment.silent(duration=padding_before) +  voice_audio + AudioSegment.silent(duration=padding_after)

    for i in range(num_copies):
        # Save noisy sample to output folder
        output_filename = f"{o['label']}.{o['language']}-{o['gender']}-{o['pitch']}-{o['speakingRate']}_noisy_{i+1}.wav"
        output_path = os.path.join(output_folder, output_filename)
        if not os.path.exists(output_path):
            # Select random section of noise and random noise level
            start_time = random.randint(0, len(noise_audio) - len(voice_audio))
            end_time = start_time +len(voice_audio)
            noise_level = random.uniform(max_noise_level, 0)

            # Extract selected section of noise and adjust volume
            noise_segment = noise_audio[start_time:end_time]
            noise_segment = noise_segment - abs(noise_level)

            # Mix voice sample with noise segment
            mixed_audio = voice_audio.overlay(noise_segment)
            # Save mixed audio to file
            mixed_audio.export(output_path, format='wav')

            print(f'Saved mixed audio to {output_path}')
        else:
            print(f'skipping {output_path}')
        # Save metadata for file
        downloaded_files.append({
            "path": str(output_filename),
            "label": o['label'],
            "category": "split",
            "metadata": {
                "pitch": str(['pitch']),
                "gender": str(o['gender']),
                "language": str(o['language']),
                "speakingRate": str(o['speakingRate']),
                "text": o['text'],
                "imported_from": "Google Cloud TTS"
            }
        })

    if has_hit_api:
        time.sleep(0.5)

print("Done text-to-speeching")
print("")

input_file = os.path.join(output_folder, 'input.json')
info_file = {
    "version": 1,
    "files": downloaded_files
}
# Output the metadata file
with open(input_file, "w") as f:
    json.dump(info_file, f)

The files in ./out-noisy can be uploaded easily using the :

# Move to the out-noisy folder
! cd out-noisy
# Upload all files in the out-noisy folder with metadata attached in the input.json file
! edge-impulse-uploader --info-file input.json *

What next?

Now you can use your keywords to create a robust keyword detection model in Edge Impulse Studio!

Make use of our pre-built keyword dataset to add noise and 'unknown' words to your model:

Try out both classification models and the transfer learning keyword spotting model to see which works best for your case