1 of 15

ML & data engineering

In this ML & data engineering section, you will discover useful techniques to train your models, generate synthetic datasets or to perform advanced feature extraction:

The following tutorials detail how to work with synthetic datasets in Edge Impulse:

Learn about how to integrate synthetic data models into your Edge Impulse project with the following guide:

Synthetic data integration.

EI Python SDK

While the Edge Impulse Studio is a great interface for guiding you through the process of collecting data and training a model, the edgeimpulse Python SDK allows you to programmatically Bring Your Own Model (BYOM), developed and trained on any platform. See here for the Python SDK API reference documentation.

With the following tutorials, you will learn how to use the Edge Impulse Python SDK with a number of other machine-learning frameworks and platforms:

Using the Edge Impulse Python SDK with TensorFlow and Keras
Using the Edge Impulse Python SDK with Hugging Face
Using the Edge Impulse Python SDK with Weights & Biases
Using the Edge Impulse Python SDK with SageMaker Studio
Using the Edge Impulse Python SDK to upload and download data

Using the Edge Impulse Python SDK with TensorFlow and Keras

TensorFlow is an open source library for training machine learning models. Keras is an open source Python library that makes creating neural networks in TensorFlow much easier. We use these two libraries together to very quickly train a model to identify handwritten digits. From there, we use the Edge Impulse Python SDK library to profile the model to see how inference will perform on a target edge device. Then, we use the SDK again to convert our trained model to a C++ library that can be deployed to an edge hardware platform, such as a microcontroller.

Follow the code below to see how to train a simple machine learning model and deploy it to a C++ library using Edge Impulse.

To learn more about using the Python SDK, please see: Edge Impulse Python SDK Overview.

# If you have not done so already, install the following dependencies
!python -m pip install tensorflow==2.12.0 edgeimpulse

from tensorflow import keras
import edgeimpulse as ei

You will need to obtain an API key from an Edge Impulse project. Log into edgeimpulse.com and create a new project. Open the project, navigate to Dashboard and click on the Keys tab to view your API keys. Double-click on the API key to highlight it, right-click, and select Copy.

Note that you do not actually need to use the project in the Edge Impulse Studio. We just need the API Key.

Paste that API key string in the ei.API_KEY value in the following cell:

# Settings
ei.API_KEY = "ei_dae2..." # Change this to your Edge Impulse API key
labels = ["0", "1", "2", "3", "4", "5", "6", "7", "8", "9"]
num_classes = len(labels)
deploy_filename = "my_model_cpp.zip"

Train a machine learning model

We want to create a classifier that can uniquely identify handwritten digits. To start, we will use TensorFlow and Keras to train a very simple convolutional neural network (CNN) on the classic MNIST dataset, which consists of handwritten digits from 0 to 9.

# Load MNIST data
(x_train, y_train), (x_test, y_test) = keras.datasets.mnist.load_data()
x_train = keras.utils.normalize(x_train, axis=1)
x_test = keras.utils.normalize(x_test, axis=1)
y_train = keras.utils.to_categorical(y_train, num_classes)
y_test = keras.utils.to_categorical(y_test, num_classes)
input_shape = x_train[0].shape

# Build the model 
model = keras.Sequential([
    keras.layers.Flatten(),
    keras.layers.Dense(32, activation='relu', input_shape=input_shape),
    keras.layers.Dense(num_classes, activation='softmax')
])

# Compile the model
model.compile(optimizer='adam',
              loss='categorical_crossentropy',
              metrics=['accuracy'])

# Train the model
model.fit(x_train, 
          y_train, 
          epochs=5)

# Evaluate model on test set
score = model.evaluate(x_test, y_test, verbose=0)
print(f"Test loss: {score[0]}")
print(f"Test accuracy: {score[1]}")

Profile your model

To start, we need to list the possible target devices we can use for profiling. We need to pick from this list.

# List the available profile target devices
ei.model.list_profile_devices()

You should see a list printed such as:

['alif-he',
 'alif-hp',
 'arduino-nano-33-ble',
 'arduino-nicla-vision',
 'portenta-h7',
 'brainchip-akd1000',
 'cortex-m4f-80mhz',
 'cortex-m7-216mhz',
 ...
 'ti-tda4vm']

A common option is the cortex-m4f-80mhz, as this is a relatively low-power microcontroller family. From there, we can use the Edge Impulse Python SDK to generate a profile for your model to ensure it fits on your target hardware and meets your timing requirements.

# Estimate the RAM, ROM, and inference time for our model on the target hardware family
try:
    profile = ei.model.profile(model=model,
                               device='cortex-m4f-80mhz')
    print(profile.summary())
except Exception as e:
    print(f"Could not profile: {e}")

Deploy your model

Once you are happy with the performance of the model, you can deploy it to a number of possible hardware targets. To see the available hardware targets, run the following:

# List the available profile target devices
ei.model.list_deployment_targets()

You should see a list printed such as:

['zip',
 'arduino',
 'tinkergen',
 'cubemx',
 'wasm',
 ...
 'runner-linux-aarch64-tda4vm']

The most generic target is to download a .zip file that holds a C++ library containing the inference runtime and your trained model, so we choose 'zip' from the above list. To do that, we first need to create a Classification object which contains our label strings (and other optional information about the model). These strings will be added to the C++ library metadata so you can access them in your edge application.

Note that instead of writing the raw bytes to a file, you can also specify an output_directory argument in the .deploy() function. Your deployment file(s) will be downloaded to that directory.

Important! The deployment targets list will change depending on the values provided for model, model_output_type, and model_input_type in the next part. For example, you will not see openmv listed once you upload a model (e.g. using .profile() or .deploy()) if model_input_type is not set to ei.model.input_type.ImageInput(). If you attempt to deploy to an unavailable target, you will receive the error Could not deploy: deploy_target: .... If model_input_type is not provided, it will default to OtherInput. See this page for more information about input types.

# Set model information, such as your list of labels
model_output_type = ei.model.output_type.Classification(labels=labels)

# Set model input type
model_input_type = ei.model.input_type.OtherInput()

# Create C++ library with trained model
deploy_bytes = None
try:
    
    deploy_bytes = ei.model.deploy(model=model,
                                   model_output_type=model_output_type,
                                   model_input_type=model_input_type,
                                   deploy_target='zip')
except Exception as e:
    print(f"Could not deploy: {e}")
    
# Write the downloaded raw bytes to a file
if deploy_bytes:
    with open(deploy_filename, 'wb') as f:
        f.write(deploy_bytes.getvalue())

Your model C++ library should be downloaded as the file my_model_cpp.zip in the same directory as this notebook. You are now ready to use your C++ model in your embedded and edge device application! To use the C++ model for local inference, see our documentation here.

Using the Edge Impulse Python SDK to run EON Tuner

The is Edge Impulse's automated machine learning (AutoML) tool to help you find the best combination of blocks and hyperparameters for your model and within your hardware constraints. This example will walk you through uploading data, running the EON Tuner, and interpreting the results.

WARNING: This notebook will add and delete data in your Edge Impulse project, so be careful! We recommend creating a throwaway project when testing this notebook.

To start, create a new project in Edge Impulse. Do not add any data to it.

# If you have not done so already, install the following dependencies
# !python -m pip install matplotlib pandas edgeimpulse

import edgeimpulse as ei
from edgeimpulse.experimental.data import (
    upload_directory
)
from edgeimpulse.experimental.tuner import (
    check_tuner,
    set_impulse_from_trial,
    start_tuner,
    start_custom_tuner,
    tuner_report_as_df,
)
from edgeimpulse.experimental.impulse import (
    build,
)

Note that you do not actually need to use the project in the Edge Impulse Studio. We just need the API Key.

Paste that API key string in the ei.API_KEY value in the following cell:

# Settings
ei.API_KEY = "ei_dae2..." # Change this to your Edge Impulse API key
deploy_filename = "my_model_cpp.zip"

# Get the project ID
api = ei.experimental.api.EdgeImpulseApi()
project_id = api.default_project_id()

Upload dataset

# Download and unzip gesture dataset
!mkdir -p dataset/
!wget -P dataset -q https://cdn.edgeimpulse.com/datasets/gestures.zip
!unzip -q dataset/gestures.zip -d dataset/gestures/

# Upload training dataset
resp = upload_directory(
    directory="dataset/gestures/training",
    category="training",
)
print(f"Uploaded {len(resp.successes)} training samples")

# Upload test dataset
resp = upload_directory(
    directory="dataset/gestures/testing",
    category="testing",
)
print(f"Uploaded {len(resp.successes)} testing samples")

# Uncomment the following if you want to delete the temporary dataset folder
#!rm -rf dataset/

Run the Tuner

To start, we need to list the possible target devices we can use for profiling. We need to pick from this list.

# List the available profile targets
ei.model.list_profile_devices()

You should see a list printed such as:

['alif-he',
 'alif-hp',
 'arduino-nano-33-ble',
 'arduino-nicla-vision',
 'portenta-h7',
 'brainchip-akd1000',
 'cortex-m4f-80mhz',
 'cortex-m7-216mhz',
 ...
 'ti-tda4vm']

From there, we start the tuner with start_tuner() and wait for completion via check_tuner(). In this example, we configure the tuner to target for the cortex-m4f-80mhz device. Since we want to classify the motion, we choose classification for our classifcation_type and our dataset as motion continuous. We constrain our model to a latency of 100ms for running the impulse.

NOTE: We set the max trials to 3 here. In a real life situation, you will omit this so the tuner decides the best number of trials.

Once the tuner is done, you can print out the results to determine the best combination of blocks and hyperparameters.

# Choose a device from the list
target_device = "cortex-m4f-80mhz"

# Start tuner. This will take 15+ minutes.
start_tuner(
    target_device=target_device,
    classification_type="classification",
    dataset_category="motion_continuous",
    target_latency=100,
    tuning_max_trials=3,
)

# Wait while checking the tuner's progress.
state = check_tuner(
    wait_for_completion=True
)

Print EON Tuner results

To visualize the results of the tuner trials, you can head to the project page on Edge Impulse Studio.

Alternatively, you can access the results programmatically: the configuration settings and output of the EON Tuner is stored in the variable state. You can access the results of the various trials with state.trials. Note that some trials can fail, so it's a good idea to test the status of each trial.

From there, you will want to sort the results based on some metric. In this example, we will sort based on int8 test set accuracy from highest to lowest.

Note: Edge Impulse supports only one learning block per project at this time (excluding anomaly detection blocks). As a result, we will use the first learning block (e.g. learning_blocks[0]) in the list to extract metrics.

import json

# The easiest way to view the results is to look at the EON Tuner page on your project
print(f"Navigate to https://studio.edgeimpulse.com/studio/{project_id}/tuner to see the results")

# Set quantization ("float32" or "int8")
qtzn = "int8"

# Filter out all failed trials
results = [r for r in state.trials if r.status == "completed"]

# Extract int8 accuracies from the trial results
accuracies = []
for result in results:
    accuracy = result.impulse.learn_blocks[0]["metrics"]["test"][qtzn]["accuracy"]
    accuracies.append(accuracy)

# Sort the results based on int8 accuracies
acc_results = zip(accuracies, results)
sorted_results = sorted(acc_results, reverse=True, key=lambda x: list(x)[0])
sorted_results = [result for _, result in sorted_results]

Now that we have the sorted results, we can extract the values we care about. We will print out the following metrics along with the impulse configuration (processing/learning block configuration and hyperparameters) of the top-performing trial.

This will help you determine if the impulse can fit on your target hardware and run fast enough for your needs. The impulse configuration can be used to recreate the processing and learning blocks on Edge Impulse. Later, we will set the project impulse based on the trial ID to simply deploy (rather than re-train).

Note: we assume the first learning block has the metrics we care about.

def get_metrics(results, qtzn, idx):
    """Calculate metrics for a given trial index"""

    metrics = {}

    # Get model accuracy results
    result_metrics = results[idx].impulse.learn_blocks[0]["metrics"]
    metrics["val-acc"] = result_metrics['validation'][qtzn]['accuracy']
    metrics["test-acc"] = result_metrics['test'][qtzn]['accuracy']
    
    # Calculate processing block RAM
    metrics["processing-block-ram"] = 0
    for i, dsp_block in enumerate(results[idx].impulse.dsp_blocks):
        metrics["processing-block-ram"] += dsp_block["performance"]["ram"]

    # Get latency, RAM, and ROM usage
    device_performance = results[idx].device_performance[qtzn]
    metrics["learning-block-latency-ms"] = device_performance['latency']
    metrics["learning-block-tflite-ram"] = device_performance['tflite']['ramRequired']
    metrics["learning-block-tflite-rom"] = device_performance['tflite']['romRequired']
    metrics["learning-block-eon-ram"] = device_performance['eon']['ramRequired']
    metrics["learning-block-eon-rom"] = device_performance['eon']['romRequired']

    return metrics

# The top performing impulse is the first element (sorted by highest int8 accuracy on test set)
trial_idx = 0

# Print info about the processing (DSP) blocks and store RAM usage
print("Processing blocks")
print("===")
for i, dsp_block in enumerate(sorted_results[trial_idx].impulse.dsp_blocks):
    print(f"Processing block {i}")
    print("---")
    print("Block:")
    print(json.dumps(dsp_block["block"], indent=2))
    print("Config:")
    print(json.dumps(dsp_block["config"], indent=2))
print()

# Print info about the learning blocks
print("Learning blocks")
print("===")
for i, learn_block in enumerate(sorted_results[trial_idx].impulse.learn_blocks):
    print(f"Learn block {i}")
    print("---")
    print("Block:")
    print(json.dumps(learn_block["block"], indent=2))
    print("Config:")
    print(json.dumps(learn_block["config"], indent=2))
    metadata = learn_block["metadata"]
    qtzn_metadata = [m for m in metadata["modelValidationMetrics"] if m.get("type") == qtzn]
print()

# Print metrics
metrics = get_metrics(sorted_results, qtzn, trial_idx)
print(f"Metrics ({qtzn}) for best trial")
print("===")
print(f"Validation accuracy: {metrics['val-acc']}")
print(f"Test accuracy: {metrics['test-acc']}")
print(f"Estimated processing blocks RAM (bytes): {metrics['processing-block-ram']}")
print(f"Estimated learning blocks latency (ms): {metrics['learning-block-latency-ms']}")
print(f"Estimated learning blocks RAM (bytes): {metrics['learning-block-tflite-ram']}")
print(f"Estimated learning blocks ROM (bytes): {metrics['learning-block-tflite-rom']}")
print(f"Estimated learning blocks RAM with EON Compiler (bytes): {metrics['learning-block-eon-ram']}")
print(f"Estimated learning blocks ROM with EON Compiler (bytes): {metrics['learning-block-eon-rom']}")

Graph results

You can optionally use a plotting package like matplotlib to graph the results from the top results to compare the metrics.

import matplotlib.pyplot as plt

# Get metrics for the top 3 trials (sorted by int8 test set accuracy)
num_trials = 3
top_metrics = [get_metrics(sorted_results, qtzn, idx) for idx in range(num_trials)]

# Construct metrics for plotting
test_accs = [top_metrics[x]['test-acc'] for x in range(num_trials)]
proc_rams = [top_metrics[x]['processing-block-ram'] for x in range(num_trials)]
learn_latencies = [top_metrics[x]['learning-block-latency-ms'] for x in range(num_trials)]
learn_tflite_rams = [top_metrics[x]['learning-block-tflite-ram'] for x in range(num_trials)]
learn_tflite_roms = [top_metrics[x]['learning-block-tflite-rom'] for x in range(num_trials)]
learn_eon_rams = [top_metrics[x]['learning-block-eon-ram'] for x in range(num_trials)]
learn_eon_roms = [top_metrics[x]['learning-block-eon-rom'] for x in range(num_trials)]

# Create plots
fig, axs = plt.subplots(7, 1, figsize=(8, 15))
indices = range(num_trials)

# Plot test accuracies
axs[0].barh(indices, test_accs)
axs[0].set_title("Test set accuracy")
axs[0].set_xlabel("Accuracy")
axs[0].set_ylabel("Trial")

# Plot processing block RAM
axs[1].barh(indices, proc_rams)
axs[1].set_title("Processing block RAM")
axs[1].set_xlabel("RAM (bytes)")
axs[1].set_ylabel("Trial")

# Plot learning block latency
axs[2].barh(indices, learn_latencies)
axs[2].set_title("Learning block latency")
axs[2].set_xlabel("Latency (ms)")
axs[2].set_ylabel("Trial")

# Plot learning block RAM (TFLite)
axs[3].barh(indices, learn_tflite_rams)
axs[3].set_title("Learning block RAM (TFLite)")
axs[3].set_xlabel("RAM (bytes)")
axs[3].set_ylabel("Trial")

# Plot learning block ROM (TFLite)
axs[4].barh(indices, learn_tflite_roms)
axs[4].set_title("Learning block ROM (TFLite)")
axs[4].set_xlabel("ROM (bytes)")
axs[4].set_ylabel("Trial")

# Plot learning block RAM (EON)
axs[5].barh(indices, learn_eon_rams)
axs[5].set_title("Learning block RAM (EON)")
axs[5].set_xlabel("RAM (bytes)")
axs[5].set_ylabel("Trial")

# Plot learning block ROM (EON)
axs[6].barh(indices, learn_eon_roms)
axs[6].set_title("Learning block ROM (EON)")
axs[6].set_xlabel("ROM (bytes)")
axs[6].set_ylabel("Trial")

# Prevent overlap
plt.tight_layout()

Results as a DataFrame

import pandas as pd

# Convert the state metrics into a DataFrame
df = tuner_report_as_df(state)
df.head()

# Print column names
for col in df.columns:
    print(col)

# Sort the DataFrame by validation (int8) accuracy
df = df.sort_values(by="test_int8_accuracy", ascending=False)

# Print out best trial metrics
print(f"Trial ID: {df.iloc[0]['id']}")
print(f"Test accuracy (int8): {df.iloc[0]['test_int8_accuracy']}")
print(f"Estimated learning blocks latency (ms): {df.iloc[0]['device_performance_int8_latency']}")
print(f"Estimated learning blocks RAM (bytes): {df.iloc[0]['device_performance_int8_tflite_ram_required']}")
print(f"Estimated learning blocks ROM (bytes): {df.iloc[0]['device_performance_int8_tflite_rom_required']}")
print(f"Estimated learning blocks RAM with EON Compiler (bytes): {df.iloc[0]['device_performance_int8_eon_ram_required']}")
print(f"Estimated learning blocks ROM with EON Compiler (bytes): {df.iloc[0]['device_performance_int8_eon_rom_required']}")

Set trial as impulse and deploy

We can replace the current impulse with the top performing trial from the EON Tuner. From there, we can deploy it, just like we would any impulse.

# Get the ID for the top-performing trial and set that to our impulse. This will take about a minute.
trial_id = df.iloc[0].trial_id
response = set_impulse_from_trial(trial_id)
job_id = response.id

# Make sure the impulse update was successful
if not hasattr(response, "success") or getattr(response, "success") == False:
    raise RuntimeError("Could not set project impulse to trial impulse")

# List the available profile target devices
ei.model.list_deployment_targets()

You should see a list printed such as:

['zip',
 'arduino',
 'cubemx',
 'wasm',
 ...
 'runner-linux-aarch64-jetson-orin-6-0']

Note that instead of writing the raw bytes to a file, you can also specify an output_directory argument in the .deploy() function. Your deployment file(s) will be downloaded to that directory.

# Build and download C++ library with the trained model
deploy_bytes = None
try:
    deploy_bytes = build(
        deploy_model_type=qtzn,
        engine="tflite",
        deploy_target="zip"
    )
except Exception as e:
    print(f"Could not deploy: {e}")
    
# Write the downloaded raw bytes to a file
if deploy_bytes:
    with open(deploy_filename, 'wb') as f:
        f.write(deploy_bytes.getvalue())

Configure custom search space

By default, the EON Tuner will make a guess at a search space based on the type of data you uploaded (e.g. using spectral-analysis blocks for feature extraction). As a result, you can run the tuner without needing to construct a search space. However, you may want to define your own search space.

The best way to define a search space is to open your project (after uploading data), head to the EON Tuner page, click Run EON Tuner, and select the Space tab.

The search space is defined in JSON format, so we can just copy that to create a dictionary. This is a good place to start for tuning blocks and hyperparameters.

Note: Functions to get available blocks and search space parameters coming soon

from edgeimpulse_api import (
    OptimizeConfig,
    TunerSpaceImpulse,
)

# Configure the search space
space = {
    "inputBlocks": [
      {
        "type": "time-series",
        "window": [
          {"windowSizeMs": 9000, "windowIncreaseMs": 9000},
          {"windowSizeMs": 10000, "windowIncreaseMs": 10000}
        ],
        "frequencyHz": [62.5],
        "padZeros": [True]
      }
    ],
    "dspBlocks": [
      {
        "type": "spectral-analysis",
        "analysis-type": ["FFT"],
        "fft-length": [16, 64],
        "scale-axes": [1],
        "filter-type": ["none"],
        "filter-cutoff": [3],
        "filter-order": [6],
        "do-log": [True],
        "do-fft-overlap": [True]
      },
      {
        "type": "spectral-analysis",
        "analysis-type": ["Wavelet"],
        "wavelet": ["haar", "bior1.3"],
        "wavelet-level": [1, 2]
      },
      {"type": "raw", "scale-axes": [1]}
    ],
    "learnBlocks": [
      {
        "id": 4,
        "type": "keras",
        "dimension": ["dense"],
        "denseBaseNeurons": [40, 20],
        "denseLayers": [2, 3],
        "dropout": [0.25, 0.5],
        "learningRate": [0.0005],
        "trainingCycles": [30]
      }
    ]
  }

# Wrap the search space
ts = TunerSpaceImpulse.from_dict(space)

# Create a custom configuration
config = OptimizeConfig(
    name=None,
    target_device={"name": "cortex-m4f-80mhz"},
    classification_type="classification",
    dataset_category="motion_continuous",
    target_latency=100,
    tuning_max_trials=2,
    space=[ts]
)

# Start tuner and wait for it to complete
start_custom_tuner(
    config=config
)
state = check_tuner(
    wait_for_completion=True
)

# The easiest way to view the results is to look at the EON Tuner page on your project
print(f"Navigate to https://studio.edgeimpulse.com/studio/{project_id}/tuner to see the results")

# Set quantization ("float32" or "int8")
qtzn = "int8"

# Filter out all failed trials
results = [r for r in state.trials if r.status == "completed"]

# Extract float32 accuracies from the trial results
accuracies = []
for result in results:
    accuracy = result.impulse.learn_blocks[0]["metrics"]["test"][qtzn]["accuracy"]
    accuracies.append(accuracy)

# Sort the results based on int8 accuracies
acc_results = zip(accuracies, results)
sorted_results = sorted(acc_results, reverse=True, key=lambda x: list(x)[0])
sorted_results = [result for _, result in sorted_results]

# The top performing impulse is the first element (sorted by highest int8 accuracy on test set)
trial_idx = 0

# Print info about the processing (DSP) blocks and store RAM usage
print("Processing blocks")
print("===")
for i, dsp_block in enumerate(sorted_results[trial_idx].impulse.dsp_blocks):
    print(f"Processing block {i}")
    print("---")
    print("Block:")
    print(json.dumps(dsp_block["block"], indent=2))
    print("Config:")
    print(json.dumps(dsp_block["config"], indent=2))
print()

# Print info about the learning blocks
print("Learning blocks")
print("===")
for i, learn_block in enumerate(sorted_results[trial_idx].impulse.learn_blocks):
    print(f"Learn block {i}")
    print("---")
    print("Block:")
    print(json.dumps(learn_block["block"], indent=2))
    print("Config:")
    print(json.dumps(learn_block["config"], indent=2))
    metadata = learn_block["metadata"]
    qtzn_metadata = [m for m in metadata["modelValidationMetrics"] if m.get("type") == qtzn]
print()

# Print metrics
metrics = get_metrics(sorted_results, qtzn, trial_idx)
print(f"Metrics ({qtzn}) for best trial")
print("===")
print(f"Validation accuracy: {metrics['val-acc']}")
print(f"Test accuracy: {metrics['test-acc']}")
print(f"Estimated processing blocks RAM (bytes): {metrics['processing-block-ram']}")
print(f"Estimated learning blocks latency (ms): {metrics['learning-block-latency-ms']}")
print(f"Estimated learning blocks RAM (bytes): {metrics['learning-block-tflite-ram']}")
print(f"Estimated learning blocks ROM (bytes): {metrics['learning-block-tflite-rom']}")
print(f"Estimated learning blocks RAM with EON Compiler (bytes): {metrics['learning-block-eon-ram']}")
print(f"Estimated learning blocks ROM with EON Compiler (bytes): {metrics['learning-block-eon-rom']}")

Using the Edge Impulse Python SDK with Hugging Face

🤗 offers a suite of tools that assist with various AI applications. Most notably, they provide a for people to share their pre-trained models. In this tutorial, we will demonstrate how to download a simple model from the Hugging Face hub, profile it, and convert it to a C++ library for use in your edge application. This particular model was trained to identify species of bean plants using the .

To learn more about using the Python SDK, please see:

# If you have not done so already, install the following dependencies
!python -m pip install huggingface_hub edgeimpulse

import json
from huggingface_hub import hf_hub_download
import edgeimpulse as ei

You will need to obtain an API key from an Edge Impulse project. Log into and create a new project. Open the project, navigate to Dashboard and click on the Keys tab to view your API keys. Double-click on the API key to highlight it, right-click, and select Copy.

Note that you do not actually need to use the project in the Edge Impulse Studio. We just need the API Key.

Paste that API key string in the ei.API_KEY value in the following cell:

# Edge Impulse Settings
ei.API_KEY = "ei_dae2..."
target_device = 'cortex-m4f-80mhz'
deploy_filename = "my_model_cpp.zip"

Set the name of the repo (username/repo-name) and the file we want to download.

# Define file location for our model
repo_name = "fxmarty/resnet-tiny-beans"
download_dir = "./"
model_filename = "model.onnx"

# Download pre-trained model
hf_hub_download(repo_id=repo_name,
                filename=model_filename,
                local_dir=download_dir)

Profile your model

To start, we need to list the possible target devices we can use for profiling. We need to pick from this list.

# List the available profile target devices
ei.model.list_profile_devices()

You should see a list printed such as:

['alif-he',
 'alif-hp',
 'arduino-nano-33-ble',
 'arduino-nicla-vision',
 'portenta-h7',
 'brainchip-akd1000',
 'cortex-m4f-80mhz',
 'cortex-m7-216mhz',
 ...
 'ti-tda4vm']

# Estimate the RAM, ROM, and inference time for our model on the target hardware family
try:
    profile = ei.model.profile(model=model_filename,
                               device='cortex-m4f-80mhz')
    print(profile.summary())
except Exception as e:
    print(f"Could not profile: {e}")

Deploy your model

Once you are happy with the performance of the model, you can deploy it to a number of possible hardware targets. To see the available hardware targets, run the following:

# List the available profile target devices
ei.model.list_deployment_targets()

You should see a list printed such as:

['zip',
 'arduino',
 'tinkergen',
 'cubemx',
 'wasm',
 ...
 'runner-linux-aarch64-tda4vm']

The most generic target is to download a .zip file containing a C++ library containing the inference runtime and your trained model, so we choose 'zip' from the above list. We also need to tell Edge Impulse how we are planning to use the model. In this case, we want to perform classification, so we set the output type to Classification.

Note that instead of writing the raw bytes to a file, you can also specify an output_directory argument in the .deploy() function. Your deployment file(s) will be downloaded to that directory.

# Create C++ library with trained model
deploy_bytes = None
try:
    deploy_bytes = ei.model.deploy(model=model_filename,
                                   model_output_type=ei.model.output_type.Classification(),
                                   deploy_target='zip')
except Exception as e:
    print(f"Could not deploy: {e}")
    
# Write the downloaded raw bytes to a file
if deploy_bytes:
    with open(deploy_filename, 'wb') as f:
        f.write(deploy_bytes.getvalue())

Using the Edge Impulse Python SDK with Weights & Biases

is an online framework for helping manage machine learning training, data versioning, and experiments. When running experiments for edge-focused ML projects, it can be helpful to see the required memory (RAM and ROM) along with estimated inference times of your model for your target hardware. By viewing these metrics, you can quickly gauge if your model will fit onto your target device!

Follow the code below to see how to train a simple machine learning model with different hyperparameters and log those values to the Weights & Biases dashboard.

To learn more about using the Python SDK, please see:

# If you have not done so already, install the following dependencies
!python -m pip install tensorflow==2.12.0 wandb edgeimpulse

from tensorflow import keras
import wandb
import edgeimpulse as ei

Note that you do not actually need to use the project in the Edge Impulse Studio. We just need the API Key.

Paste that API key string in the ei.API_KEY value in the following cell:

# Settings
ei.API_KEY = "ei_dae2..." # Change this to your Edge Impulse API key
labels = ["0", "1", "2", "3", "4", "5", "6", "7", "8", "9"]
num_classes = len(labels)
num_epochs = 5
profile_device = 'cortex-m4f-80mhz' # Run ei.model.list_profile_devices() to see available devices
deploy_filename = "my_model_cpp.zip"

# Define experiment hyperparameters - sweep across number of nodes
project_name = "nodes-sweep"
num_nodes_sweep = [8, 16, 32, 64, 128]

# Log in to Weights and Biases (will open a prompt)
wandb.login()

Gather a dataset

# Load MNIST data
(x_train, y_train), (x_test, y_test) = keras.datasets.mnist.load_data()
x_train = keras.utils.normalize(x_train, axis=1)
x_test = keras.utils.normalize(x_test, axis=1)
y_train = keras.utils.to_categorical(y_train, num_classes)
y_test = keras.utils.to_categorical(y_test, num_classes)
input_shape = x_train[0].shape

Create an experiment

We want to vary the hyperparameters in our model and see how it affects the accuracy and predicted RAM, ROM, and inference time on our target platform. To do that, we construct a function that builds a simple model using Keras, trains the model, and computes the accuracy and loss from our holdout test set. We then use the Edge Impulse Python SDK to generate a profile of our model for our target hardware. We log the hyperparameter (number of nodes in the hidden layer), test loss, test accuracy, estimated RAM, estimated ROM, and estimated inference time (ms) to our Weights and Biases console.

# Define experiment - Train and test model, log metrics
def do_experiment(num_nodes):

    # Create W&B project
    run = wandb.init(project=project_name,
                     name=f"{num_nodes}-nodes")

    # Build the model (vary number of nodes in the hidden layer)
    model = keras.Sequential([
        keras.layers.Flatten(),
        keras.layers.Dense(num_nodes, activation='relu', input_shape=input_shape),
        keras.layers.Dense(num_classes, activation='softmax')
    ])

    # Compile the model
    model.compile(optimizer='adam',
                  loss='categorical_crossentropy',
                  metrics=['accuracy'])

    # Train the model
    model.fit(x_train, 
              y_train, 
              epochs=num_epochs)
  
    # Evaluate model
    test_loss, test_accuracy = model.evaluate(x_test, y_test)
    
    # Profile model on target device
    try:
        profile = ei.model.profile(model=model,
                                   device=profile_device)
    except Exception as e:
        print(f"Could not profile: {e}")

    # Log metrics
    if profile.success:
        print("Profiling successful. Logging.")
        wandb.log({
            'num_nodes': num_nodes,
            'test_loss': test_loss,
            'test_accuracy': test_accuracy,
            'profile_ram': profile.model.profile_info.float32.memory.tflite.ram,
            'profile_rom': profile.model.profile_info.float32.memory.tflite.rom,
            'inference_time_ms': profile.model.profile_info.float32.time_per_inference_ms
        })
    else:
        print(f"Profiling unsuccessful. Error: {job_resp.error}")

    # Close run
    wandb.finish()

Run the experiment

Now, it's time to run the experiment and log the results in Weights and Biases. Simply call our function and provide a new hyperparameter value for the number of nodes.

# Perform the experiments - check your dashboard in WandB!
for num_nodes in num_nodes_sweep:
    do_experiment(num_nodes)

Deploy Your Model

Once you are happy with the performance of your model, you can then deploy it to your target hardware. We will assume that 32 nodes in our hidden layer provided the best trade-off of RAM, flash, inference time, and accuracy for our needs. To start, we will retrain the model:

# Build the model 
model = keras.Sequential([
    keras.layers.Flatten(),
    keras.layers.Dense(32, activation='relu', input_shape=input_shape),
    keras.layers.Dense(num_classes, activation='softmax')
])


# Compile the model
model.compile(optimizer='adam',
              loss='categorical_crossentropy',
              metrics=['accuracy'])


# Train the model
model.fit(x_train, 
          y_train, 
          epochs=5)

Next, we should evaluate the model on our holdout test set.

# Evaluate model on test set
score = model.evaluate(x_test, y_test, verbose=0)
print(f"Test loss: {score[0]}")
print(f"Test accuracy: {score[1]}")

From there, we can see the available hardware targets for deployment:

# List the available profile target devices
ei.model.list_deployment_targets()

You should see a list printed such as:

['zip',
 'arduino',
 'tinkergen',
 'cubemx',
 'wasm',
 ...
 'runner-linux-aarch64-tda4vm']

The most generic target is the .zip file that holds a C++ library containing our trained model and inference runtime. To pass our labels to the C++ library, we create a Classification object, which contains our label strings.

Note that instead of writing the raw bytes to a file, you can also specify an output_directory argument in the .deploy() function. Your deployment file(s) will be downloaded to that directory.

# Set model information, such as your list of labels
model_output_type = ei.model.output_type.Classification(labels=labels)

# Create C++ library with trained model
deploy_bytes = None
try:
    
    deploy_bytes = ei.model.deploy(model=model,
                                   model_output_type=model_output_type,
                                   deploy_target='zip')
except Exception as e:
    print(f"Could not deploy: {e}")
    
# Write the downloaded raw bytes to a file
if deploy_bytes:
    with open(deploy_filename, 'wb') as f:
        f.write(deploy_bytes.getvalue())

Using the Edge Impulse Python SDK with SageMaker Studio

Amazon SageMaker Studio is an integrated development environment (IDE) that provides a single web-based visual interface where you can access purpose-built tools to perform all machine learning (ML) development steps, from preparing data to building, training, and deploying your ML models, improving data science team productivity by up to 10x. You can quickly upload data, create new notebooks, train and tune models, move back and forth between steps to adjust experiments, collaborate seamlessly within your organization, and deploy models to production without leaving SageMaker Studio.

To learn more about using the Python SDK, please see: Edge Impulse Python SDK Overview.

This guide has been built from AWS reference project Introduction to SageMaker TensorFlow - Image Classification, please have a look at this AWS documentation page.

Below are the changes made to the original training script and configuration:

The Python 3 (Data Science 3.0) kernel was used.
The dataset has been changed to classify images as car vs unknown. You can download the dataset from this Edge Impulse public project and store it in your S3 bucket.
The dataset has been imported in the Edge Impulse S3 bucket configured when creating the SageMaker Studio domain. Make sure to adapt to your path or use the AWS reference project.
The training instance used is ml.m5.large.

</div> Install dependencies

# If you have not done so already, install the following dependencies
!python -m pip install tensorflow==2.12.0 edgeimpulse

Transfer Learning

Dataset

Below is the structure of our dataset in our S3 bucket

car-vs-unknown
    |--training
        |--car
            |--abc.jpg
            |--def.jpg
        |--unknown
            |--ghi.jpg
            |--jkl.jpg
    |--testing
            |--car
                |--mno.jpg
                |--prs.jpg
            |--unknown
                |--tuv.jpg
                |--wxy.jpg

We have used the default bucket created when configuring SageMaker Studio domain:

import sagemaker, boto3, json
from sagemaker.session import Session

sagemaker_session = Session()
aws_role = sagemaker_session.get_caller_identity_arn()
aws_region = boto3.Session().region_name
print(aws_region)
sess = sagemaker.Session()

bucket = sess.default_bucket()
subfolder = 'car-vs-unknown/training/'

s3 = boto3.client('s3')
files = s3.list_objects(Bucket=bucket, Prefix=subfolder)['Contents']
print(f"Number of images: {len(files)}")
# or print the files
# for f in files:
#     print(f['Key'])

Train

You can continue with the default model, or can choose a different model from the list. Note that this tutorial has been tested with MobileNetv2 based models. A complete list of SageMaker pre-trained models can also be accessed at Sagemaker pre-trained Models.

from sagemaker.jumpstart.notebook_utils import list_jumpstart_models

# Retrieves all image classification models available by SageMaker Built-In Algorithms.
filter_value = "task in ['ic']"
ic_models = list_jumpstart_models(filter=filter_value)
# od_models = list_jumpstart_models()

print(f"Number of models available for inference: {len(ic_models)}")

# display the model-ids.
for model in ic_models:
    print(model)

from sagemaker import image_uris, model_uris

model_id, model_version = "tensorflow-ic-imagenet-mobilenet-v3-small-075-224", "*" # You can change the based model with one from the list generated above

# Retrieve the base model uri
base_model_uri = model_uris.retrieve( model_id=model_id, model_version=model_version, model_scope="inference")

print(base_model_uri)

Optional, ship this next cell if you don't want to retrain the model. And uncomment the last line of the cell after

from sagemaker import image_uris, model_uris, script_uris, hyperparameters
from sagemaker.estimator import Estimator

training_instance_type = "ml.m5.large"

# Retrieve the Docker image
train_image_uri = image_uris.retrieve(model_id=model_id,model_version=model_version,image_scope="training",instance_type=training_instance_type,region=None,framework=None)

# Retrieve the training script
train_source_uri = script_uris.retrieve(model_id=model_id, model_version=model_version, script_scope="training")

# Retrieve the pretrained model tarball for transfer learning
train_model_uri = model_uris.retrieve(model_id=model_id, model_version=model_version, model_scope="training")

# Retrieve the default hyper-parameters for fine-tuning the model
hyperparameters = hyperparameters.retrieve_default(model_id=model_id, model_version=model_version)

# [Optional] Override default hyperparameters with custom values
hyperparameters["epochs"] = "5"

# The sample training data is available in the following S3 bucket
training_data_bucket = f"{bucket}"
training_data_prefix = f"{subfolder}"
# training_data_bucket = f"jumpstart-cache-prod-{aws_region}"
# training_data_prefix = "training-datasets/tf_flowers/"

training_dataset_s3_path = f"s3://{training_data_bucket}/{training_data_prefix}"

output_bucket = sess.default_bucket()
output_prefix = "ic-car-vs-unknown"
s3_output_location = f"s3://{output_bucket}/{output_prefix}/output"

# Create SageMaker Estimator instance
tf_ic_estimator = Estimator(
    role=aws_role,
    image_uri=train_image_uri,
    source_dir=train_source_uri,
    model_uri=train_model_uri,
    entry_point="transfer_learning.py",
    instance_count=1,
    instance_type=training_instance_type,
    max_run=360000,
    hyperparameters=hyperparameters,
    output_path=s3_output_location
)

# Use S3 path of the training data to launch SageMaker TrainingJob
tf_ic_estimator.fit({"training": training_dataset_s3_path}, logs=True)

Retrieve and prepare the newly trained model

def download_from_s3(url):
    # Remove 's3://' prefix from URL
    url = url[5:]
    # Split URL by '/' to extract bucket name and key
    parts = url.split('/')
    bucket_name = parts[0]
    s3_key = '/'.join(parts[1:])
    # Download the file from S3
    s3.download_file(bucket_name, s3_key, 'model.tar.gz')

# Download
trained_model_s3_path = f"{s3_output_location}/{tf_ic_estimator._current_job_name}/output/model.tar.gz"
print(trained_model_s3_path)
download_from_s3(trained_model_s3_path)
# or if you just want to use the based model
#download_from_s3(base_model_uri)

import shutil, os

# Extract the .tar.gz file to a temporary directory
temp_directory = 'tmp'  # Replace with your actual temporary directory
tar_gz_file = 'model.tar.gz'  # Replace with the path to your .tar.gz file

# Create directory if does not exist
if not os.path.exists(temp_directory):
    os.makedirs(temp_directory)

shutil.unpack_archive(tar_gz_file, temp_directory)

import tensorflow as tf

print(tf.__version__)

model = tf.keras.models.load_model('tmp/1/')

converter = tf.lite.TFLiteConverter.from_keras_model(model)
tflite_model = converter.convert()

# Save the model.
with open('model.tflite', 'wb') as f:
  f.write(tflite_model)

Edge Impulse

Note that you do not actually need to use the project in the Edge Impulse Studio. We just need the API Key.

Paste that API key string in the ei.API_KEY value in the following cell:

import edgeimpulse as ei
ei.API_KEY = "ei_0a85c3a5ca92a35ee6f61aab18aadb9d9e167bd152f947f2056a4fb6a60977d8" # Change to your key

ei.model.list_profile_devices()

# Estimate the RAM, ROM, and inference time for our model on the target hardware family
try:
    profile = ei.model.profile(model='model.tflite',
                               device='raspberry-pi-4')
    print(profile.summary())
except Exception as e:
    print(f"Could not profile: {e}")

# List the available profile target devices
ei.model.list_deployment_targets()

# Get the labels from the label_info.json
import json

labels_info = open('tmp/labels_info.json')
labels_obj = json.load(labels_info)
labels = labels_obj['labels']
print(labels)

# Set model information, such as your list of labels
model_output_type = ei.model.output_type.Classification(labels=labels)
deploy_filename = "my_model_cpp.zip"

# Create C++ library with trained model
deploy_bytes = None
try:
    
    deploy_bytes = ei.model.deploy(model=model,
                                   model_output_type=model_output_type,
                                   engine='tflite',
                                   deploy_target='zip"')
except Exception as e:
    print(f"Could not deploy: {e}")
    
# Write the downloaded raw bytes to a file
if deploy_bytes:
    with open(deploy_filename, 'wb') as f:
        f.write(deploy_bytes.getvalue())

Voila! You now have a C++ library ready to be compiled and integrated in your embedded targets. Feel free to have a look at Edge Impulse deployment options on the documentation to understand how you can integrate that to your embedded systems.

You can also have a look at the deployment page of your project to test your model on a web browser or test

Using the Edge Impulse Python SDK to upload and download data

If you want to upload files directly to an Edge Impulse project, we recommend using the . However, sometimes you cannot upload your samples directly, as you might need to convert the files to one of the accepted formats or modify the data prior to model training. Edge Impulse offers for some types of projects, but you might want to create your own custom augmentation scheme. Or perhaps you want to and script the upload process.

The Python SDK offers a set of functions to help you move data into and out of your project. This can be extremely helpful when generating or augmenting your dataset. The following cells demonstrate some of these upload and download functions.

You can find the API documentation for the functions found in this tutorial .

WARNING: This notebook will add and delete data in your Edge Impulse project, so be careful! We recommend creating a throwaway project when testing this notebook.

Note that you might need to refresh the page with your Edge Impulse project to see the samples appear.

# If you have not done so already, install the following dependencies
!python -m pip install edgeimpulse

import edgeimpulse as ei

Note that you do not actually need to use the project in the Edge Impulse Studio. We just need the API Key.

Paste that API key string in the ei.API_KEY value in the following cell:

# Settings
ei.API_KEY = "ei_dae2..." # Change this to your Edge Impulse API key

Upload directory

The following file formats are allowed: .cbor, .json, .csv, .wav, .jpg, .png, .mp4, .avi.

from datetime import datetime

# Download image files to use as an example dataset
!mkdir -p dataset
!wget -P dataset -q \
  https://raw.githubusercontent.com/edgeimpulse/notebooks/main/.assets/images/capacitor.01.png \
  https://raw.githubusercontent.com/edgeimpulse/notebooks/main/.assets/images/capacitor.02.png

# Upload the entire directory
response = ei.experimental.data.upload_directory(
    directory="dataset",
    category="training",
    label=None, # Will use the prefix before the '.' on each filename for the label
    metadata={
        "date": datetime.now().strftime('%Y-%m-%d %H:%M:%S'),
        "source": "camera",
    }
)

# Check to make sure there were no failures
assert len(response.fails) == 0, "Could not upload some files"

# Save the sample IDs, as we will need these to retrieve file information and delete samples
ids = []
for sample in response.successes:
    ids.append(sample.sample_id)

# Review the sample IDs and get the associated server-side filename
# Note the lack of extension! Multiple samples on the server can have the same filename.
for id in ids:
    filename = ei.experimental.data.get_filename_by_id(id)
    print(f"Sample ID: {id}, filename: {filename}")

If you head to the Data acquisition page on your project, you should see images in your dataset.

Download files

You can download samples from your Edge Impulse project if you know the sample IDs. You can get sample IDs by calling the ei.data.get_sample_ids() function, which allows you to filter IDs based on filename, category, and label.

# Get sample IDs for everything in the "training" category
infos = ei.experimental.data.get_sample_ids(category="training")

# The SampleInfo should match what we uploaded earlier
ids = []
for info in infos:
    print(info)
    ids.append(info.sample_id)

# Download samples
samples = ei.experimental.data.download_samples_by_ids(ids)

# Save the downloaded files
for sample in samples:
    with open(sample.filename, "wb") as file:
        file.write(sample.data.read())

# View sample information
for sample in samples:
    print(
        f"filename: {sample.filename}\r\n"
        f"  sample ID: {sample.sample_id}\r\n"
        f"  category: {sample.category}\r\n"
        f"  label: {sample.label}\r\n"
        f"  bounding boxes: {sample.bounding_boxes}\r\n"
        f"  metadata: {sample.metadata}"
    )

Take a look at the files in this directory. You should see the downloaded images. They should match the images in the dataset/ directory, which were the original images that we uploaded.

Delete files

If you know the ID of the sample you would like to delete, you can call the delete_sample_by_id() function. You can also delete all the samples in your project by calling delete_all_samples().

# Delete the samples from the Edge Impulse project that we uploaded earlier
for id in ids:
    ei.experimental.data.delete_sample_by_id(id)

Take a look at the data in your project. The samples that we uploaded should be gone.

Upload folder for object detection

Important! The annotations file must be named exactly info.labels

# Download images and bounding box annotations to use as an example dataset
!mkdir -p dataset
!rm dataset/capacitor.01.png dataset/capacitor.02.png
!wget -P dataset -q \
  https://raw.githubusercontent.com/edgeimpulse/notebooks/main/.assets/images/dog-ball-toy.01.png \
  https://raw.githubusercontent.com/edgeimpulse/notebooks/main/.assets/images/dog-ball-toy.02.png \
  https://raw.githubusercontent.com/edgeimpulse/notebooks/main/.assets/annotations/info.labels

# Upload the entire directory (including the info.labels file)
response = ei.experimental.data.upload_exported_dataset(
    directory="dataset",
)

# Check to make sure there were no failures
assert len(response.fails) == 0, "Could not upload some files"

# Save the sample IDs, as we will need these to retrieve file information and delete samples
ids = []
for sample in response.successes:
    ids.append(sample.sample_id)

If you head to the Data acquisition page on your project, you should see images in your dataset along with the bounding box information.

# Delete the samples from the Edge Impulse project that we uploaded
for id in ids:
    ei.experimental.data.delete_sample_by_id(id)

Upload individual CSV files

import csv
import io
import os

# Create example CSV data
sample_data = [
    [
        ["timestamp", "accX", "accY", "accZ"],
        [0, -9.81, 0.03, 0.21],
        [10, -9.83, 0.04, 0.27],
        [20, -9.12, 0.03, 0.23],
        [30, -9.14, 0.01, 0.25],
    ],
    [
        ["timestamp", "accX", "accY", "accZ"],
        [0, -9.56, 5.34, 1.21],
        [10, -9.43, 1.37, 1.27],
        [20, -9.22, -4.03, 1.23],
        [30, -9.50, -0.98, 1.25],
    ],
]

# Write to CSV files
filenames = [
    "001.csv",
    "002.csv"
]
for i, filename in enumerate(filenames):
    file_path = os.path.join("dataset", filename)
    with open(file_path, "w", newline="") as file:
        writer = csv.writer(file)
        writer.writerows(sample_data[i])

# Add metadata to the CSV data
my_samples = [
    {
        "filename": filenames[0],
        "data": open(os.path.join("dataset", filenames[0]), "rb"),
        "category": "training",
        "label": "idle",
        "metadata": {
            "source": "accelerometer",
            "collection site": "desk",
        },
    },
    {
        "filename": filenames[1],
        "data": open(os.path.join("dataset", filenames[1]), "rb"),
        "category": "training",
        "label": "wave",
        "metadata": {
            "source": "accelerometer",
            "collection site": "desk",
        },
    },
]

# Wrap the samples in instances of the Sample class
samples = [ei.experimental.data.Sample(**i) for i in my_samples]

# Upload samples to your project
response = ei.experimental.data.upload_samples(samples)

# Check to make sure there were no failures
assert len(response.fails) == 0, "Could not upload some files"

# Save the sample IDs, as we will need these to retrieve file information and delete samples
ids = []
for sample in response.successes:
    ids.append(sample.sample_id)

If you head to the Data acquisition page on your project, you should see your time series data.

# Delete the samples from the Edge Impulse project
for id in ids:
    ei.experimental.data.delete_sample_by_id(id)

Upload JSON data directly

The raw data must be encoded in an IO object. We convert the dictionary objects to a BytesIO object, but you can also read in data from .json files.

import io
import json

# Create two different example data samples
sample_data_1 = {
    "protected": {
        "ver": "v1",
        "alg": "none",
    },
    "signature": 0,
    "payload": {
        "device_name": "ac:87:a3:0a:2d:1b",
        "device_type": "DISCO-L475VG-IOT01A",
        "interval_ms": 10,
        "sensors": [
            { "name": "accX", "units": "m/s2" },
            { "name": "accY", "units": "m/s2" },
            { "name": "accZ", "units": "m/s2" }
        ],
        "values": [
            [ -9.81, 0.03, 0.21 ],
            [ -9.83, 0.04, 0.27 ],
            [ -9.12, 0.03, 0.23 ],
            [ -9.14, 0.01, 0.25 ]
        ]
    }
}
sample_data_2 = {
    "protected": {
        "ver": "v1",
        "alg": "none",
    },
    "signature": 0,
    "payload": {
        "device_name": "ac:87:a3:0a:2d:1b",
        "device_type": "DISCO-L475VG-IOT01A",
        "interval_ms": 10,
        "sensors": [
            { "name": "accX", "units": "m/s2" },
            { "name": "accY", "units": "m/s2" },
            { "name": "accZ", "units": "m/s2" }
        ],
        "values": [
            [ -9.56, 5.34, 1.21 ],
            [ -9.43, 1.37, 1.27 ],
            [ -9.22, -4.03, 1.23 ],
            [ -9.50, -0.98, 1.25 ]
        ]
    }
}

# Provide a filename, category, label, and optional metadata for each sample
my_samples = [
    {
        "filename": "001.json",
        "data": io.BytesIO(json.dumps(sample_data_1).encode('utf-8')),
        "category": "training",
        "label": "idle",
        "metadata": {
            "source": "accelerometer",
            "collection site": "desk",
        },
    },
    {
        "filename": "002.json",
        "data": io.BytesIO(json.dumps(sample_data_2).encode('utf-8')),
        "category": "training",
        "label": "wave",
        "metadata": {
            "source": "accelerometer",
            "collection site": "desk",
        },
    },
]

# Wrap the samples in instances of the Sample class
samples = [ei.data.sample_type.Sample(**i) for i in my_samples]

# Upload samples to your project
response = ei.experimental.data.upload_samples(samples)

# Check to make sure there were no failures
assert len(response.fails) == 0, "Could not upload some files"

# Save the sample IDs, as we will need these to retrieve file information and delete samples
ids = []
for sample in response.successes:
    ids.append(sample.sample_id)

If you head to the Data acquisition page on your project, you should see your time series data.

# Delete the samples from the Edge Impulse project
for id in ids:
    ei.experimental.data.delete_sample_by_id(id)

Upload NumPy arrays

Important! NumPy arrays must be in the shape (Number of samples, number of data points, number of sensors)

If you are working with image data in NumPy, we recommend saving those images as .png or .jpg files and using upload_directory().

import numpy as np

# Create example NumPy array with 2 time series samples
sample_data = np.array(
    [
        [ # Sample 1 ("idle")
            [-9.81, 0.03, 0.21],
            [-9.83, 0.04, 0.27],
            [-9.12, 0.03, 0.23],
            [-9.14, 0.01, 0.25],
        ],
        [ # Sample 2 ("wave")
            [-9.56, 5.34, 1.21],
            [-9.43, 1.37, 1.27],
            [-9.22, -4.03, 1.23],
            [-9.50, -0.98, 1.25],
        ],
    ]
)

# Labels for each sample
labels = ["idle", "wave"]

# Names of the sensors and units for the 3 axes
sensors = [
    { "name": "accX", "units": "m/s2" },
    { "name": "accY", "units": "m/s2" },
    { "name": "accZ", "units": "m/s2" },
]

# Optional metadata for all samples being uploaded
metadata = {
    "source": "accelerometer",
    "collection site": "desk",
}

# Upload samples to your project
response = ei.experimental.data.upload_numpy(
    data=sample_data,
    labels=labels,
    sensors=sensors,
    sample_rate_ms=10,
    metadata=metadata,
    category="training",
)

# Check to make sure there were no failures
assert len(response.fails) == 0, "Could not upload some files"

# Save the sample IDs, as we will need these to retrieve file information and delete samples
ids = []
for sample in response.successes:
    ids.append(sample.sample_id)

If you head to the Data acquisition page on your project, you should see your time series data. Note that the sample names are randomly assigned, so we recommend recording the sample IDs when you upload.

# Delete the samples from the Edge Impulse project
for id in ids:
    ei.experimental.data.delete_sample_by_id(id)

Upload pandas (and pandas-like) dataframes

Note that several other packages exist that work as drop-in replacements for pandas. You can use these replacements so long as you import that with the name pd. For example, one of:

import pandas as pd
import modin.pandas as pd
import dask.dataframe as pd
import polars as pd

import pandas as pd

The first option is to upload one dataframe for each sample (non-time series)

# Construct one dataframe for each sample (multidimensional, non-time series)
df_1 = pd.DataFrame([[-9.81, 0.03, 0.21]], columns=["accX", "accY", "accZ"])
df_2 = pd.DataFrame([[-9.56, 5.34, 1.21]], columns=["accX", "accY", "accZ"])

# Optional metadata for all samples being uploaded
metadata = {
    "source": "accelerometer",
    "collection site": "desk",
}

# Upload the first sample
ids = []
response = ei.experimental.data.upload_pandas_sample(
    df_1,
    label="One",
    filename="001",
    metadata=metadata,
    category="training",
)
assert len(response.fails) == 0, "Could not upload some files"
for sample in response.successes:
    ids.append(sample.sample_id)

# Upload the second sample
response = ei.experimental.data.upload_pandas_sample(
    df_2,
    label="Two",
    filename="002",
    metadata=metadata,
    category="training",
)
assert len(response.fails) == 0, "Could not upload some files"
for sample in response.successes:
    ids.append(sample.sample_id)

# Delete the samples from the Edge Impulse project
for id in ids:
    ei.experimental.data.delete_sample_by_id(id)

You can also upload one dataframe for each sample (time series). As with previous examples, we'll assume that the sample rate is 10 ms.

# Create samples (multidimensional, time series)
sample_data_1 = [ # Sample 1 ("idle")
    [-9.81, 0.03, 0.21],
    [-9.83, 0.04, 0.27],
    [-9.12, 0.03, 0.23],
    [-9.14, 0.01, 0.25],
]
sample_data_2 = [ # Sample 1 ("wave")
    [-9.56, 5.34, 1.21],
    [-9.43, 1.37, 1.27],
    [-9.22, -4.03, 1.23],
    [-9.50, -0.98, 1.25],
]

# Construct one dataframe for each sample
df_1 = pd.DataFrame(sample_data_1, columns=["accX", "accY", "accZ"])
df_2 = pd.DataFrame(sample_data_2, columns=["accX", "accY", "accZ"])

# Optional metadata for all samples being uploaded
metadata = {
    "source": "accelerometer",
    "collection site": "desk",
}

# Upload the first sample
ids = []
response = ei.experimental.data.upload_pandas_sample(
    df_1,
    label="Idle",
    filename="001",
    sample_rate_ms=10,
    metadata=metadata,
    category="training",
)
assert len(response.fails) == 0, "Could not upload some files"
for sample in response.successes:
    ids.append(sample.sample_id)

# Upload the second sample
response = ei.experimental.data.upload_pandas_sample(
    df_2,
    label="Wave",
    filename="002",
    sample_rate_ms=10,
    metadata=metadata,
    category="training",
)
assert len(response.fails) == 0, "Could not upload some files"
for sample in response.successes:
    ids.append(sample.sample_id)

# Delete the samples from the Edge Impulse project
for id in ids:
    ei.experimental.data.delete_sample_by_id(id)

You can upload non-time series data where each sample is a row in the dataframe. Note that you need to provide labels in the rows.

# Construct non-time series data, where each row is a different sample
data = [
    ["desk", "training", "One", -9.81, 0.03, 0.21],
    ["field", "training", "Two", -9.56, 5.34, 1.21],
]
columns = ["loc", "category", "label", "accX", "accY", "accZ"]

# Wrap the data in a DataFrame
df = pd.DataFrame(data, columns=columns)

# Upload non-time series DataFrame (with multiple samples) to the project
ids = []
response = ei.experimental.data.upload_pandas_dataframe(
    df,
    feature_cols=["accX", "accY", "accZ"],
    label_col="label",
    category_col="category",
    metadata_cols=["loc"],
)
assert len(response.fails) == 0, "Could not upload some files"
for sample in response.successes:
    ids.append(sample.sample_id)

# Delete the samples from the Edge Impulse project
for id in ids:
    ei.experimental.data.delete_sample_by_id(id)

A "wide" dataframe is one where each column represents a value in the time series data, and the rows become individual samples. Note that you need to provide labels in the rows.

# Construct time series data, where each row is a different sample
data = [
    ["desk", "training", "idle", 0.8, 0.7, 0.8, 0.9, 0.8, 0.8, 0.7, 0.8],
    ["field", "training", "motion", 0.3, 0.9, 0.4, 0.6, 0.8, 0.9, 0.5, 0.4],
]
columns = ["loc", "category", "label", "0", "1", "2", "3", "4", "5", "6", "7"]

# Wrap the data in a DataFrame
df = pd.DataFrame(data, columns=columns)

# Upload time series DataFrame (with multiple samples) to the project
ids = []
response = ei.experimental.data.upload_pandas_dataframe_wide(
    df,
    label_col="label",
    category_col="category",
    metadata_cols=["loc"],
    data_col_start=3,
    sample_rate_ms=100,
)
assert len(response.fails) == 0, "Could not upload some files"
for sample in response.successes:
    ids.append(sample.sample_id)

# Delete the samples from the Edge Impulse project
for id in ids:
    ei.experimental.data.delete_sample_by_id(id)

A DataFrame can also be divided into "groups" so you can upload multidimensional time series data.

# Create samples
sample_data = [
    ["desk", "sample 1", "training", "idle", 0, -9.81, 0.03, 0.21],
    ["desk", "sample 1", "training", "idle", 0.01, -9.83, 0.04, 0.27],
    ["desk", "sample 1", "training", "idle", 0.02, -9.12, 0.03, 0.23],
    ["desk", "sample 1", "training", "idle", 0.03, -9.14, 0.01, 0.25],
    ["field", "sample 2", "training", "wave", 0, -9.56, 5.34, 1.21],
    ["field", "sample 2", "training", "wave", 0.01, -9.43, 1.37, 1.27],
    ["field", "sample 2", "training", "wave", 0.02, -9.22, -4.03, 1.23],
    ["field", "sample 2", "training", "wave", 0.03, -9.50, -0.98, 1.25],
]
columns = ["loc", "sample_name", "category", "label", "timestamp", "accX", "accY", "accZ"]

# Wrap the data in a DataFrame
df = pd.DataFrame(sample_data, columns=columns)

# Upload time series DataFrame (with multiple samples and multiple dimensions) to the project
ids = []
response = ei.experimental.data.upload_pandas_dataframe_with_group(
    df,
    group_by="sample_name",
    timestamp_col="timestamp",
    feature_cols=["accX", "accY", "accZ"],
    label_col="label",
    category_col="category",
    metadata_cols=["loc"]
)
assert len(response.fails) == 0, "Could not upload some files"
for sample in response.successes:
    ids.append(sample.sample_id)

# Delete the samples from the Edge Impulse project
for id in ids:
    ei.experimental.data.delete_sample_by_id(id)

Label image data using GPT-4o

In this tutorial, we will explore how to label image data using GPT-4o, a powerful language model developed by OpenAI. GPT-4o is capable of generating accurate and meaningful labels for images, making it a valuable tool for image classification tasks. By leveraging the capabilities of GPT-4o, we can automate the process of labeling image data, saving time and effort in data preprocessing.

We packaged in a "pre-built Transformation block" (available for all Enterprise plans), an innovative method to distill LLM knowledge.

This pre-built transformation block can be found under the Data sources tab in the Data acquisition view.

The block takes all your unlabeled image files and asks GPT-4o to label them based on your prompt - and we automatically add the reasoning as metadata to your items!

Your prompt should return a single label, e.g.

Is there a person in this picture? Answer with just 'yes' or 'no'.

How to use it

The GPT-4o model processes images and assigns labels based on the content, filtering out any images that do not meet the quality criteria.

Step 1: Data Collection

Navigate to the Data acquisition page and add images to your project's dataset. In the video tutorial above, we show how to collect a video recorded directly from a phone, upload it to Edge Impulse and split the video into individual frames.

Step 2: Add the labeling block

In the Data sources tab, add the "Label image data using GPT-4o" block:

Step 4: Configure the labeling block

OpenAI API key: Add your OpenAI API key. This value will be stored as a secret, and won't be shown again.
Prompt: Your prompt should return a single label. For example:

Is there a person in this picture? Respond only with "yes", "no" or "unsure" if you're not sure.

Disable samples w/ label: If a certain label is output, disable the data item - these are excluded from training. Multiple labels are accepted, separate them with a coma.
Max. no. of samples to label: Number of samples to label.
Concurrency: Number of samples to label in parallel.
Auto-convert videos: If set, all videos are automatically split into individual images before labeling.

Optional: Editing your labeling block

To edit your configuration, you need to update the json-like steps of your block:

Step 5: Execute

Then, run the block to automatically label the frames.

And here is an example of the returned logs:

Step 6: Train your model

Use the labeled data to train a machine learning model. See the end-to-end tutorial Adding sight to your sensors.

Step 7: Deployment

In the video tutorial, we deployed the trained model to an MCU-based edge device - the Arduino Nicla Vision.

Results

The small model we tested this on performed exceptionally well, identifying toys in various scenes quickly and accurately. By distilling knowledge from the large LLM, we created a specialized, efficient model suitable for edge deployment.

Conclusion

The latest multimodal LLMs are incredibly powerful but too large for many practical applications. At Edge Impulse, we enable the transfer of knowledge from these large models to smaller, specialized models that run efficiently on edge devices.

Our "Label image data using GPT-4o" block is available for enterprise customers, allowing you to experiment with this technology.

For further assistance, visit our forum.

Examples & Resources

Blog post: Label image data using GPT-4o blog post

Label audio data using your existing models

This example comes from the Edge Impulse Linux Inferencing Python SDK that has been slightly modify to upload the raw data back to Edge Impulse based on the inference results.

To run the example:

Clone this repository:

git clone git@github.com:edgeimpulse/example-active-learning-linux-python-sdk.git

Install the dependencies:

pip3 install -r requirements.txt

Grab your the API key of the project you want to upload the inferred results raw data:

Past the new key in the EI_API_KEY variable in the audio-classify-export.py file. Alternatively, load it from your environment variable:

export EI_API_KEY='ei_xxxx'

Download your modelfile.eim:

edge-impulse-linux-runner --download modelfile.eim

Run the script:

python3 audio-classify-export.py modelfile.eim yes,no 0.6 0.8

Here are the arguments you can set:

modelfile.eim, path the model.eim
yes,no, labels to upload, separated by comas, no space
0.6, low confidence threshold
0.8, high confidence threshold
<audio_device_ID, optional>

In a keyword spotting model, it can give the following results:

python3 audio-classify-export.py modelfile.eim yes,no 0.6 0.8
['modelfile.eim', 'yes,no', '0.6', '0.8']
{'model_parameters': {'axis_count': 1, 'frequency': 16000, 'has_anomaly': 0, 'image_channel_count': 0, 'image_input_frames': 0, 'image_input_height': 0, 'image_input_width': 0, 'input_features_count': 16000, 'interval_ms': 0.0625, 'label_count': 4, 'labels': ['no', 'noise', 'unknown', 'yes'], 'model_type': 'classification', 'sensor': 1, 'slice_size': 4000, 'use_continuous_mode': True}, 'project': {'deploy_version': 29, 'id': 10487, 'name': 'Keywords Detection', 'owner': 'Demo Team'}}
Loaded runner for "Demo Team / Keywords Detection"
0 --> MacBook Pro Microphone
2 --> Microsoft Teams Audio
3 --> Descript Loopback Recorder
4 --> ZoomAudioDevice
Type the id of the audio device you want to use: 
0
selected Audio device: 0

Result (0 ms.) no: 0.18 noise: 0.16     unknown: 0.20   yes: 0.46
Result (0 ms.) no: 0.13 noise: 0.58     unknown: 0.22   yes: 0.07
Result (0 ms.) no: 0.00 noise: 0.89     unknown: 0.10   yes: 0.01
Result (0 ms.) no: 0.00 noise: 0.01     unknown: 0.04   yes: 0.95
Result (0 ms.) no: 0.00 noise: 0.82     unknown: 0.10   yes: 0.07
Result (0 ms.) no: 0.02 noise: 0.77     unknown: 0.13   yes: 0.08
Result (0 ms.) no: 0.01 noise: 0.14     unknown: 0.26   yes: 0.59
Result (0 ms.) no: 0.07 noise: 0.76     unknown: 0.15   yes: 0.01
Result (0 ms.) no: 0.04 noise: 0.24     unknown: 0.11   yes: 0.61       Uploading sample to Edge Impulse...
Successfully uploaded audio to Edge Impulse.

Result (0 ms.) no: 0.02 noise: 0.93     unknown: 0.04   yes: 0.00
Result (0 ms.) no: 0.01 noise: 0.67     unknown: 0.32   yes: 0.01
Result (0 ms.) no: 0.02 noise: 0.18     unknown: 0.23   yes: 0.57
Result (0 ms.) no: 0.07 noise: 0.70     unknown: 0.22   yes: 0.01
Result (0 ms.) no: 0.03 noise: 0.83     unknown: 0.12   yes: 0.02
Result (0 ms.) no: 0.24 noise: 0.44     unknown: 0.21   yes: 0.11
Result (0 ms.) no: 0.23 noise: 0.25     unknown: 0.42   yes: 0.10
Result (0 ms.) no: 0.04 noise: 0.76     unknown: 0.18   yes: 0.02
Result (0 ms.) no: 0.16 noise: 0.67     unknown: 0.12   yes: 0.05
Result (0 ms.) no: 0.12 noise: 0.81     unknown: 0.06   yes: 0.01
Result (0 ms.) no: 0.54 noise: 0.24     unknown: 0.12   yes: 0.10
Result (0 ms.) no: 0.01 noise: 0.91     unknown: 0.05   yes: 0.03
Result (0 ms.) no: 0.65 Uploading sample to Edge Impulse...
Successfully uploaded audio to Edge Impulse.
noise: 0.08     unknown: 0.17   yes: 0.10
Result (0 ms.) no: 0.00 noise: 0.96     unknown: 0.03   yes: 0.00
Result (0 ms.) no: 0.04 noise: 0.80     unknown: 0.13   yes: 0.03
Result (0 ms.) no: 0.03 noise: 0.27     unknown: 0.16   yes: 0.54
Result (0 ms.) no: 0.05 noise: 0.66     unknown: 0.15   yes: 0.14
Result (0 ms.) no: 0.08 noise: 0.74     unknown: 0.14   yes: 0.04
Result (0 ms.) no: 0.01 noise: 0.87     unknown: 0.11   yes: 0.02
Result (0 ms.) no: 0.01 noise: 0.87     unknown: 0.06   yes: 0.06
...

Generate synthetic datasets

Synthetic datasets are a collection of data artificially generated rather than being collected from real-world observations or measurements. They are created using algorithms, simulations, or mathematical models to mimic the characteristics and patterns of real data. Synthetic datasets are a valuable tool to generate data for experimentation, testing, and development when obtaining real data is challenging, costly, or undesirable.

You might want to generate synthetic datasets for several reasons:

Cost Efficiency: Creating synthetic data can be more cost-effective and efficient than collecting large volumes of real data, especially in resource-constrained environments.

Data Augmentation: Synthetic datasets allow users to augment their real-world data with variations, which can improve model robustness and performance.

Data Diversity: Synthetic datasets enable the inclusion of uncommon or rare scenarios, enriching model training with a wider range of potential inputs.

Privacy and Security: When dealing with sensitive data, synthetic datasets provide a way to train models without exposing real information, enhancing privacy and security.

You can generate synthetic data directly from Edge Impulse using the Synthetic Data tab in the Data acquisition view. This tab provides a user-friendly interface to generate synthetic data for your projects. You can create synthetic datasets using a variety of tools and models.

We have put together the following tutorials to help you get started with synthetic datasets generation:

Using integrated models directly available inside Edge Impulse Studio

DALL-E Image Generation Block: Generate image datasets using Dall·E using the DALL-E model.
Whisper Keyword Spotting Generation Block: Generate keyword-spotting datasets using the Whisper model. Ideal for keyword spotting and speech recognition applications.
Eleven Labs Sound Generation Block: Generate sound datasets using the Eleven Labs model. Ideal for generating realistic sound effects for various applications.

Note that you will need an API Key/Access Token from the different providers to run the model used to generate the synthetic data.

If you want to create your own synthetic data block, see Add custom models to the Synthetic Data Tab.

Generate image datasets using Dall·E

This notebook explores how we can use generative AI to create datasets which don't exist yet. This can be a good starting point for your project if you have not collected or cannot collect the data required. It is important to note the limitations of generative AI still apply here, biases can be introduced through your prompts, results can include "hallucinations" and quality control is important.

This example uses the OpenAI API to call the Dall-E image generation tool, it explores both generation and variation but there are other tools such as editing which could also be useful for augmenting an existing dataset.

There is also a video version of this tutorial:

We have wrapped this example into a Transformation Block (Enterprise Feature) to make it even easier to generate images and upload them to your organization. See: https://github.com/edgeimpulse/example-transform-Dall-E-images

Local Software Requirements

Python 3
Pip package manager
Jupyter Notebook: https://jupyter.org/install
pip packages (install with pip installpackagename):
- openai https://pypi.org/project/openai/

! pip install openai

# Imports
import openai
import os
import requests

# Notebook Imports
from IPython.display import Image
from IPython.display import display
import matplotlib.pyplot as plt
import matplotlib.image as mpimg

Set up OpenAI API

First off you will need to set up and Edge Impulse account and create your first project.

You will also need to create an API Key for OpenAI: https://platform.openai.com/docs/api-reference/authentication


# You can set your API key and org as environment variables in your system like this:
# os.environ['OPENAI_API_KEY'] = 'api string'

# Set up OpenAI API key and organization
openai.api_key = os.getenv("OPENAI_API_KEY")

Generate your first image

The API takes in a prompt, number of images and a size

image_prompt = "A webcam image of a human 1m from the camera sitting at a desk showing that they are wearing gloves with their hands up to the camera."
# image_prompt = "A webcam image of a person 1m from the camera sitting at a desk with their bare hands up to the camera."
# image_prompt = "A webcam image of a human 1m from the camera sitting at a desk showing that they are wearing wool gloves with their hands up to the camera."

response = openai.Image.create(
    prompt=image_prompt,
    n=1,
    size="256x256",
)
Image(url=response["data"][0]["url"])

Generate some variations of this image

The API also has a variations call which takes in an existing images and creates variations of it. This could also be used to modify existing images.

response2 = openai.Image.create_variation(
  image=requests.get(response['data'][0]['url']).content,
  n=3,
  size="256x256"
)
imgs = []
for img in response2['data']:
  imgs.append(Image(url=img['url']))

display(*imgs)

Generate a dataset:

Here we are iterate through a number of images and variations to generate a dataset based on the prompts/labels given.

labels = [{"prompt": "A webcam image of a human 1m from the camera sitting at a desk showing that they are wearing wool gloves with their hands up to the camera.",
          "label": "gloves"},
          {"prompt": "A webcam image of a person 1m from the camera sitting at a desk with their bare hands up to the camera.",
          "label": "no-gloves"}
        ]
output_folder = "output"
base_images_number = 10
variation_per_image = 3
# Check if output directory for noisey files exists and create it if it doesn't
if not os.path.exists(output_folder):
    os.makedirs(output_folder)

for option in labels:
    for i in range(base_images_number):
        response = openai.Image.create(
            prompt=option["prompt"],
            n=1,
            size="256x256",
        )
        try:
            img = response["data"][0]["url"]
            with open(f'{output_folder}/{option["label"]}.{img.split("/")[-1]}.png', 'wb+') as f:
                f.write(requests.get(img).content)
            response2 = openai.Image.create_variation(
                image=requests.get(img).content,
                n=variation_per_image,
                size="256x256"
            )
        except Exception as e:
            print(e)
        for img in response2['data']:
            try:
                with open(f'{output_folder}/{option["label"]}.{img["url"].split("/")[-1]}.png', 'wb') as f:
                    f.write(requests.get(img["url"]).content)
            except Exception as e:
                print(e)

Plot all the output images:

import os


# Define the folder containing the images
folder_path = './output'

# Get a list of all the image files in the folder
image_files = [f for f in os.listdir(folder_path) if os.path.isfile(os.path.join(folder_path, f)) and f.endswith('.png')]

# Set up the plot
fig, axs = plt.subplots(nrows=20, ncols=20, figsize=(10, 10))

# Loop through each image and plot it in a grid cell
for i in range(20):
    for j in range(20):
        img = mpimg.imread(os.path.join(folder_path, image_files[i*10+j]))
        axs[i,j].imshow(img)
        axs[i,j].axis('off')

# Make the plot look clean
fig.subplots_adjust(hspace=0, wspace=0)
plt.tight_layout()
plt.show()

These files can then be uploaded to a project with these commands (run in a separate terminal window):

! cd output
! edge-impulse-uploader .

(run edge-impulse-uploader --clean if you have used the CLI before to reset the target project)

What next?

Now you can use your images to create an image classification model on Edge Impulse.

Why not try some other OpenAI calls, 'edit' could be used to take an existing image and translate it into different environments or add different humans to increase the variety of your dataset. https://platform.openai.com/docs/guides/images/usage

Generate keyword spotting datasets

Local Software Requirements

Python 3
Pip package manager
Jupyter Notebook: https://jupyter.org/install
pip packages (install with pip installpackagename):
- pydub https://pypi.org/project/pydub/
- google-cloud-texttospeech https://cloud.google.com/python/docs/reference/texttospeech/latest
- requests https://pypi.org/project/requests/

# Imports
import os
import json
import time
import io
import random
import requests
from pydub import AudioSegment
from google.cloud import texttospeech

Set up Google TTS API

First off you will need to set up and Edge Impulse account and create your first project. You will also need a Google Cloud account with the Text to Speech API enabled: https://cloud.google.com/text-to-speech, the first million characters generated each month are free (WaveNet voices), this should be plenty for most cases as you'll only need to generate your dataset once. From google you will need to download a credentials JSON file and set it to the correct environment variable on your system to allow the python API to work: (https://developers.google.com/workspace/guides/create-credentials#service-account)


# Insert the path to your service account API key json file here for google cloud
os.environ['GOOGLE_APPLICATION_CREDENTIALS'] = '../path-to-google-credentials-file.json'

Generate the desired samples

First off we need to set our desired keywords and labels:


# Keyword or short sentence and label (e.g. 'hello world')
keyword = [
    {'string':'edge','label':'edge'},
    {'string':'impulse','label':'impulse'},
           ]

Then we need to set up the parameters for our speech dataset, all possible combinations will be iterated through:

languages - Choose the text to speech voice languages to use (https://cloud.google.com/text-to-speech/docs/voices)
pitches - Which voice pitches to apply
genders - Which SSML genders to apply
speakingRates - Which speaking speeds to apply



# Languages, remove as appropriate
# languages = [
#     'ar-XA', 'bn-IN',  'en-GB',  'fr-CA',
#     'en-US', 'es-ES',  'fi-FI',  'gu-IN',
#     'ja-JP', 'kn-IN',  'ml-IN',  'sv-SE',
#     'ta-IN', 'tr-TR',  'cs-CZ',  'de-DE',
#     'en-AU', 'en-IN',  'fr-FR',  'hi-IN',
#     'id-ID', 'it-IT',  'ko-KR',  'ru-RU',
#     'uk-UA', 'cmn-CN', 'cmn-TW', 'da-DK',
#     'el-GR', 'fil-PH', 'hu-HU',  'nb-NO',
#     'nl-NL', 'pt-PT',  'sk-SK',  'vi-VN',
#     'pl-PL', 'pt-BR',  'ca-ES',  'yue-HK',
#     'af-ZA', 'bg-BG',  'lv-LV',  'ro-RO',
#     'sr-RS', 'th-TH',  'te-IN',  'is-IS'
# ]
languages = [
    'en-GB',
    'en-US',
]
# Pitches to generate (in semitones) range: [-20.0, 20.0]
pitches = [-2, 0, 2]
# Voice genders to use
genders = ["NEUTRAL", "FEMALE", "MALE"]
# Speaking rates to use range: [0.25, 4.0]
speakingRates = [0.9, 1, 1.1]

Then provide some other key parameters:

out_length - How long each output sample should be
count - Maximum number of samples to output (if all combinations of languages, pitches etc are higher then this restricts output)
voice-dir - Where to store the clean samples before noise is added
noise-url - Which noise file to download and apply to your samples
output-folder - The final output location of the noised samples
num-copies - How many different noisy versions of each sample to create
max-noise-level - in Db,

# Out length minimum (default: 1s)
out_length = 1
# Maximum number of keywords to generate
count = 30
# Raw sample output directory
voice_dir = 'out-wav'
# Creative commons background noise from freesound.org:https://freesound.org/people/Astounded/sounds/483561/
noise_url = 'https://cdn.freesound.org/previews/483/483561_10201334-lq.ogg'
output_folder = 'out-noisy'
num_copies = 2  # Number of noisy copies to create for each input sample
max_noise_level = -5  # Maximum noise level to add in dBFS (negative value)

Then we need to check all the output folders are ready


# Check if output directory for noisey files exists and create it if it doesn't
if not os.path.exists(output_folder):
    os.makedirs(output_folder)
# Check if output directory for raw voices exists and create it if it doesn't
if not os.path.exists(voice_dir):
    os.makedirs(voice_dir)

And download the background noise file


# Download background noise file
response = requests.get(noise_url)
response.raise_for_status()
noise_audio = AudioSegment.from_file(io.BytesIO(response.content), format='ogg')

Then we can generate a list of all possible parameter combinations based on the input earlier. If you have set num_copies to be smaller than the number of combinations then these options will be reduced:


# Generate all combinations of parameters
all_opts = []
for p in pitches:
    for g in genders:
        for l in languages:
            for s in speakingRates:
                for kw in keyword:
                    all_opts.append({
                            "pitch": p,
                            "gender": g,
                            "language": l,
                            "speakingRate": s,
                            "text": kw['string'],
                            "label": kw['label']
                        })
if len(all_opts)*num_copies > count:
    selectEvery = len(all_opts)*num_copies // count
    selectNext = 0
    all_opts = all_opts[::selectEvery]
print(f'Generating {len(all_opts)*num_copies} samples')

Finally we iterate though all the options generated, call the Google TTS API to generate the desired sample, and apply noise to it, saving locally with metadata:


# Instantiate list for file label information
downloaded_files = []

# Instantiates a client
client = texttospeech.TextToSpeechClient()

ix = 0
for o in all_opts:
    ix += 1
    # Set the text input to be synthesized
    synthesis_input = texttospeech.SynthesisInput(text=o['text'])
    # Build the voice request
    voice = texttospeech.VoiceSelectionParams(
        language_code=o['language'],
        ssml_gender=o['gender']
    )
    # Select the type of audio file you want returned
    audio_config = texttospeech.AudioConfig(
        audio_encoding=texttospeech.AudioEncoding.LINEAR16,
        pitch=o['pitch'],
        speaking_rate=o['speakingRate'],
        sample_rate_hertz=16000
    )
    # Perform the text-to-speech request on the text input with the selected
    # voice parameters and audio file type

    wav_file_name = f"{voice_dir}/{o['label']}.{o['language']}-{o['gender']}-{o['pitch']}-{o['speakingRate']}.tts.wav"

    if not os.path.exists(wav_file_name):
        print(f"[{ix}/{len(all_opts)}] Text-to-speeching...")
        response = client.synthesize_speech(
            input=synthesis_input, voice=voice, audio_config=audio_config
        )
        with open(wav_file_name, "wb") as f:
            f.write(response.audio_content)
        has_hit_api = True
    else:
        print(f'skipping {wav_file_name}')
        has_hit_api = False

    # Load voice sample
    voice_audio = AudioSegment.from_file(wav_file_name)
    # Add silence to match output length with random padding
    difference = (out_length * 1000) - len(voice_audio)
    if difference > 0:
        padding_before = random.randint(0, difference)
        padding_after = difference - padding_before
        voice_audio = AudioSegment.silent(duration=padding_before) +  voice_audio + AudioSegment.silent(duration=padding_after)

    for i in range(num_copies):
        # Save noisy sample to output folder
        output_filename = f"{o['label']}.{o['language']}-{o['gender']}-{o['pitch']}-{o['speakingRate']}_noisy_{i+1}.wav"
        output_path = os.path.join(output_folder, output_filename)
        if not os.path.exists(output_path):
            # Select random section of noise and random noise level
            start_time = random.randint(0, len(noise_audio) - len(voice_audio))
            end_time = start_time +len(voice_audio)
            noise_level = random.uniform(max_noise_level, 0)

            # Extract selected section of noise and adjust volume
            noise_segment = noise_audio[start_time:end_time]
            noise_segment = noise_segment - abs(noise_level)

            # Mix voice sample with noise segment
            mixed_audio = voice_audio.overlay(noise_segment)
            # Save mixed audio to file
            mixed_audio.export(output_path, format='wav')

            print(f'Saved mixed audio to {output_path}')
        else:
            print(f'skipping {output_path}')
        # Save metadata for file
        downloaded_files.append({
            "path": str(output_filename),
            "label": o['label'],
            "category": "split",
            "metadata": {
                "pitch": str(['pitch']),
                "gender": str(o['gender']),
                "language": str(o['language']),
                "speakingRate": str(o['speakingRate']),
                "text": o['text'],
                "imported_from": "Google Cloud TTS"
            }
        })

    if has_hit_api:
        time.sleep(0.5)

print("Done text-to-speeching")
print("")

input_file = os.path.join(output_folder, 'input.json')
info_file = {
    "version": 1,
    "files": downloaded_files
}
# Output the metadata file
with open(input_file, "w") as f:
    json.dump(info_file, f)

# Move to the out-noisy folder
! cd out-noisy
# Upload all files in the out-noisy folder with metadata attached in the input.json file
! edge-impulse-uploader --info-file input.json *

What next?

Now you can use your keywords to create a robust keyword detection model in Edge Impulse Studio!

Try out both classification models and the transfer learning keyword spotting model to see which works best for your case

Generate physics simulation datasets

This notebook takes you through a basic example of using the physics simulation tool PyBullet to generate an accelerometer dataset representing dropping the Nordic Thingy:53 devkit from different heights. This dataset can be used to train a regression model to predict drop height.

This idea could be used for a wide range of simulatable environments- for example generating accelerometer datasets for pose estimation or fall detection. The same concept could be applied in an FMEA application for generating strain datasets for structural monitoring.

There is also a video version of this tutorial:

Local Software Requirements

Python 3
Pip package manager
Jupyter Notebook: https://jupyter.org/install
Bullet3: https://github.com/bulletphysics/bullet3

The dependencies can be installed with:

! pip install pybullet numpy

# Imports
import pybullet as p
import pybullet_data
import os
import shutil
import csv
import random
import numpy as np
import json

Create object to simulate

We need to load in a Universal Robotics Description Format file describing an object with the dimensions and weight of a Nordic Thingy:53. In this case, measuring our device it is 64x60x23.5mm and its weight 60g. The shape is given by a .obj 3D model file.

	<visual>
		<origin xyz="0.02977180615936878 -0.01182944632717566 0.03176079914341195" rpy="1.57079632679 0.0 0.0" />
		<geometry>
			<mesh filename="thingy53/thingy53 v2.obj" scale="1 1 1" />
		</geometry>
		<material name="texture">
			<color rgba="1.0 1.0 1.0 1.0" />
		</material>
	</visual>
	<collision>
		<origin xyz="0.02977180615936878 -0.01182944632717566 0.03176079914341195" rpy="1.57079632679 0.0 0.0" />
		<geometry>
			<mesh filename="thingy53/thingy53 v2.obj" scale="1 1 1" />
		</geometry>
	</collision>
	<inertial>
		<mass value="0.06" />
  		<inertia ixx="0.00002076125" ixy="0" ixz="0" iyy="0.00002324125" iyz="0" izz="0.00003848"/>
	</inertial>
</link>

Visualizing the problem

To generate the required data we will be running PyBullet in headless "DIRECT" mode so we can iterate quickly over the parameter field. If you run the python file below you can see how pybullet simulates the object dropping onto a plane

! python ../.assets/pybullet/single_simulation.py

Setting up the simulation environment

First off we need to set up a pybullet physics simulation environment. We load in our object file and a plane for it to drop onto. The plane's dynamics can be adjusted to better represent the real world (in this case we're dropping onto carpet)

# Set up PyBullet physics simulation (change from p.GUI to p.DIRECT for headless simulation)
physicsClient = p.connect(p.DIRECT)
p.setAdditionalSearchPath(pybullet_data.getDataPath())
p.setGravity(0, 0, -9.81)

# Load object URDF file
obj_file = "../.assets/pybullet/thingy53/thingy53.urdf"
obj_id = p.loadURDF(obj_file, flags=p.URDF_USE_INERTIA_FROM_FILE)

# Add a solid plane for the object to collide with
plane_id = p.loadURDF("plane.urdf")

# Set length of simulation and sampling frequency
sample_length = 2 # Seconds
sample_freq = 100 # Hz

We also need to define the output folder for our simulated accelerometer files

output_folder = 'output/'
# Check if output directory for noisey files exists and create it if it doesn't
if not os.path.exists(output_folder):
    os.makedirs(output_folder)
else:
    shutil.rmtree(output_folder)
    os.makedirs(output_folder)

And define the drop parameters

# Simulate dropping object from range of heights
heights = 100
sims_per_height = 20
min_height = 0.1 # Metres
max_height = 0.8 # Metres

We also need to define the characteristics of the IMU on the real device we are trying to simulate. In this case the Nordic Thingy:53 has a Bosch BMI270 IMU (https://www.bosch-sensortec.com/products/motion-sensors/imus/bmi270/) which is set to a range of +-2g with a resolution of 0.06g. These parameters will be used to restrict the raw acceleration output:


range_g = 2
range_acc = range_g * 9.81
resolution_mg = 0.06
resolution_acc = resolution_mg / 1000.0 * 9.81

Finally we are going to give the object and plane restitution properties to allow for some bounce. In this case I dropped the real Thingy:53 onto a hardwood table. You can use p.changeDynamics to introduce other factors such as damping and friction.

p.changeDynamics(obj_id, -1, restitution=0.3)
p.changeDynamics(plane_id, -1, restitution=0.4)

Drop simulation

Here we iterate over a range of heights, randomly changing its start orientation for i number of simulations per height. The acceleration is calculated relative to the orientation of the Thingy:53 object to represent its onboard accelerometer.

metadata = []
for height in np.linspace(max_height, min_height, num=heights):
    print(f"Simulating {sims_per_height} drops from {height}m")
    for i in range(sims_per_height):
        # Set initial position and orientation of object
        x = 0
        y = 0
        z = height
        orientation = p.getQuaternionFromEuler((random.uniform(0, 2 * np.pi), random.uniform(0, 2 * np.pi), random.uniform(0, 2 * np.pi)))
        p.resetBasePositionAndOrientation(obj_id, [x, y, z], orientation)
        
        prev_linear_vel = np.zeros(3)

        # Initialize the object position and velocity
        pos_prev, orn_prev = p.getBasePositionAndOrientation(obj_id)
        vel_prev, ang_vel_prev = p.getBaseVelocity(obj_id)
        timestamp=0
        dt=1/sample_freq
        p.setTimeStep(dt)
        filename=f"drop_{height}m_{i}.csv"
        with open(f"output/{filename}", mode="w") as csv_file:
                writer = csv.writer(csv_file)
                writer.writerow(['timestamp','accX','accY','accZ'])
        while timestamp < sample_length:
            p.stepSimulation()
            linear_vel, angular_vel = p.getBaseVelocity(obj_id)
            lin_acc = [(v - prev_v)/dt for v, prev_v in zip(linear_vel, prev_linear_vel)]
            prev_linear_vel = linear_vel
            timestamp += dt
            # Get the current position and orientation of the object
            pos, orn = p.getBasePositionAndOrientation(obj_id)

            # Get the linear and angular velocity of the object in world coordinates
            vel, ang_vel = p.getBaseVelocity(obj_id)

             # Calculate the change in position and velocity between steps
            pos_diff = np.array(pos) - np.array(pos_prev)
            vel_diff = np.array(vel) - np.array(vel_prev)

            # Convert the orientation quaternion to a rotation matrix
            rot_matrix = np.array(p.getMatrixFromQuaternion(orn)).reshape(3, 3)

            # Calculate the local linear acceleration of the object, subtracting gravity
            local_acc = np.dot(rot_matrix.T, vel_diff / dt) - np.array([0, 0, -9.81])
            # Restrict the acceleration to the range of the accelerometer
            imu_rel_lin_acc_scaled = np.clip(local_acc, -range_acc, range_acc)
            # Round the acceleration to the nearest resolution of the accelerometer
            imu_rel_lin_acc_rounded = np.round(imu_rel_lin_acc_scaled/resolution_acc) * resolution_acc
            # Update the previous position and velocity
            pos_prev, orn_prev = pos, orn
            vel_prev, ang_vel_prev = vel, ang_vel

            # Save acceleration data to CSV file
            with open(f"{output_folder}{filename}", mode="a") as csv_file:
                writer = csv.writer(csv_file)
                writer.writerow([timestamp*1000] + imu_rel_lin_acc_rounded.tolist())

        nearestheight = round(height, 2)
        metadata.append({
            "path": filename,
            "category": "training",
            "label": { "type": "label", "label": str(nearestheight)}
        })

Finally we save the metadata file to the output folder. This can be used to tell the edge-impulse-uploader CLI tool the floating point labels for each file.

jsonout = {"version": 1, "files": metadata}

with open(f"{output_folder}/files.json", "w") as f:
    json.dump(jsonout, f)

# Disconnect from PyBullet physics simulation
p.disconnect()

These files can then be uploaded to a project with these commands (run in a separate terminal window):

! cd output
! edge-impulse-uploader --info-file files.json

(run edge-impulse-uploader --clean if you have used the CLI before to reset the target project)

What next?

Now you can use your dataset a drop height detection regression model in Edge Impulse Studio!

See if you can edit this project to simulate throwing the object up in the air to predict the maximum height, or add in your own custom object. You could also try to better model the real environment you're dropping the object in- adding air resistance, friction, damping and material properties for your surface.

Generate audio datasets using Eleven Labs

Generate audio data using the . This integration allows you to generate realistic sound effects for your projects, such as glass breaking, car engine revving, or other custom sounds. You can customize the sound prompts and generate high-quality audio samples for your datasets.

This integration allows you to expand your datasets with sounds that may be difficult or expensive to record naturally. This approach not only saves time and money but also enhances the accuracy and reliability of the models we deploy on edge devices.

Example: Glass Breaking Sound

In this tutorial, we focus on a practical application that can be used in a smart security system, or in a factory to detect incidents, such as detecting the sounds of glass breaking.

There is also a video version of this guide:

Getting Started

Only available with Edge Impulse Pro Plan and Enterprise Plan

Navigate to Data Acquisition: Once you're in your project, navigate to the Data Acquisition section, go to Synthetic data and select the ElevenLabs Synthetic Audio Generator data source.

Step 1: Get your ElevenLabs API Key

First, get your Eleven Labs API Key. Navigate to the Eleven Labs web interface to get your key and optionally test your prompt.

Step 2: Parameters

Here we will be trying to collect a glass-breaking sound or impact.

Prompt: "glass breaking"
Simple prompts are just that: they are simple, one-sided prompts where we try to get the AI to generate a single sound effect. This could be, for example, “person walking on grass” or “glass breaking.” These types of prompts will generate a single type of sound effect with a few variations either in the same generation or in subsequent generations. All in all, they are fairly simple.
There are a few ways to improve these prompts, however, and that is by adding a little bit more detail. Even if they are simple prompts, they can be made to give better output by improving the prompt itself. For example, something that sometimes works is adding details like “high-quality, professionally recorded footsteps on grass, sound effects foley.” It can require some experimentation to find a good balance between being descriptive and keeping it brief enough to have AI understand the prompt. e.g. high quality audio of window glass breaking
Label: The label of the generated audio sample.
Prompt influence: Between 0 and 1, this setting ranges from giving the AI more creativity in how it interprets the prompt to telling the AI to be more strict in following the exact prompt that you’ve given. 1 being more creative.
Number of samples: Number of samples to generate
Minimum length (seconds): Minimum length of generated audio samples. Audio samples will be padded with silence to minimum length. It also determines how long your generations should be. Depending on what you set this as, you can get quite different results. For example, if I write “kick drum” and set the length to 11 seconds, I might get a full drum loop with a kick drum in it, but that might not be what I want. On the other hand, if I set the length to 1 second, I might just get a one-shot with a single instance of a kick drum.
Frequency (Hz): Audio frequency, ElevenLabs generates data at 44100Hz, so any other value will be resampled.
Upload to category: Data will be uploaded to this category in your project.

Step 3: Generate samples

Once you've set up your prompt, and api key, run the pipeline to generate the sound samples. You can then view the output in the Data Acquisition section.

Benefits of Using Generative AI for Sound Generation

Enhance Data Quality: Generative AI can create high-quality sound samples that are difficult to record naturally.
Increase Dataset Diversity: Access a wide range of sounds to enrich your training dataset and improve model performance.
Save Time and Resources: Quickly generate the sound samples you need without the hassle of manual recording.
Improve Model Accuracy: High-quality, diverse sound samples can help fill gaps in your dataset and enhance model performance.

Conclusion

By leveraging generative AI for sound generation, you can enhance the quality and diversity of your training datasets, leading to more accurate and reliable edge AI models. This innovative approach saves time and resources while improving the performance of your models in real-world applications. Try out the Eleven Labs block in Edge Impulse today and start creating high-quality sound datasets for your projects.

Using the Edge Impulse Python SDK to run EON Tuner

WARNING: This notebook will add and delete data in your Edge Impulse project, so be careful! We recommend creating a throwaway project when testing this notebook.

To start, create a new project in Edge Impulse. Do not add any data to it.

# If you have not done so already, install the following dependencies
# !python -m pip install matplotlib pandas edgeimpulse

import edgeimpulse as ei
from edgeimpulse.experimental.data import (
    upload_directory
)
from edgeimpulse.experimental.tuner import (
    check_tuner,
    set_impulse_from_trial,
    start_tuner,
    start_custom_tuner,
    tuner_report_as_df,
)
from edgeimpulse.experimental.impulse import (
    build,
)

Note that you do not actually need to use the project in the Edge Impulse Studio. We just need the API Key.

Paste that API key string in the ei.API_KEY value in the following cell:

# Settings
ei.API_KEY = "ei_dae2..." # Change this to your Edge Impulse API key
deploy_filename = "my_model_cpp.zip"

# Get the project ID
api = ei.experimental.api.EdgeImpulseApi()
project_id = api.default_project_id()

Upload dataset

We start by downloading the and uploading it to our project.

# Download and unzip gesture dataset
!mkdir -p dataset/
!wget -P dataset -q https://cdn.edgeimpulse.com/datasets/gestures.zip
!unzip -q dataset/gestures.zip -d dataset/gestures/

# Upload training dataset
resp = upload_directory(
    directory="dataset/gestures/training",
    category="training",
)
print(f"Uploaded {len(resp.successes)} training samples")

# Upload test dataset
resp = upload_directory(
    directory="dataset/gestures/testing",
    category="testing",
)
print(f"Uploaded {len(resp.successes)} testing samples")

# Uncomment the following if you want to delete the temporary dataset folder
#!rm -rf dataset/

Run the Tuner

To start, we need to list the possible target devices we can use for profiling. We need to pick from this list.

# List the available profile targets
ei.model.list_profile_devices()

You should see a list printed such as:

['alif-he',
 'alif-hp',
 'arduino-nano-33-ble',
 'arduino-nicla-vision',
 'portenta-h7',
 'brainchip-akd1000',
 'cortex-m4f-80mhz',
 'cortex-m7-216mhz',
 ...
 'ti-tda4vm']

NOTE: We set the max trials to 3 here. In a real life situation, you will omit this so the tuner decides the best number of trials.

Once the tuner is done, you can print out the results to determine the best combination of blocks and hyperparameters.

# Choose a device from the list
target_device = "cortex-m4f-80mhz"

# Start tuner. This will take 15+ minutes.
start_tuner(
    target_device=target_device,
    classification_type="classification",
    dataset_category="motion_continuous",
    target_latency=100,
    tuning_max_trials=3,
)

# Wait while checking the tuner's progress.
state = check_tuner(
    wait_for_completion=True
)

Print EON Tuner results

To visualize the results of the tuner trials, you can head to the project page on Edge Impulse Studio.

From there, you will want to sort the results based on some metric. In this example, we will sort based on int8 test set accuracy from highest to lowest.

Note: Edge Impulse supports only one learning block per project at this time (excluding anomaly detection blocks). As a result, we will use the first learning block (e.g. learning_blocks[0]) in the list to extract metrics.

import json

# The easiest way to view the results is to look at the EON Tuner page on your project
print(f"Navigate to https://studio.edgeimpulse.com/studio/{project_id}/tuner to see the results")

# Set quantization ("float32" or "int8")
qtzn = "int8"

# Filter out all failed trials
results = [r for r in state.trials if r.status == "completed"]

# Extract int8 accuracies from the trial results
accuracies = []
for result in results:
    accuracy = result.impulse.learn_blocks[0]["metrics"]["test"][qtzn]["accuracy"]
    accuracies.append(accuracy)

# Sort the results based on int8 accuracies
acc_results = zip(accuracies, results)
sorted_results = sorted(acc_results, reverse=True, key=lambda x: list(x)[0])
sorted_results = [result for _, result in sorted_results]

Note: we assume the first learning block has the metrics we care about.

def get_metrics(results, qtzn, idx):
    """Calculate metrics for a given trial index"""

    metrics = {}

    # Get model accuracy results
    result_metrics = results[idx].impulse.learn_blocks[0]["metrics"]
    metrics["val-acc"] = result_metrics['validation'][qtzn]['accuracy']
    metrics["test-acc"] = result_metrics['test'][qtzn]['accuracy']
    
    # Calculate processing block RAM
    metrics["processing-block-ram"] = 0
    for i, dsp_block in enumerate(results[idx].impulse.dsp_blocks):
        metrics["processing-block-ram"] += dsp_block["performance"]["ram"]

    # Get latency, RAM, and ROM usage
    device_performance = results[idx].device_performance[qtzn]
    metrics["learning-block-latency-ms"] = device_performance['latency']
    metrics["learning-block-tflite-ram"] = device_performance['tflite']['ramRequired']
    metrics["learning-block-tflite-rom"] = device_performance['tflite']['romRequired']
    metrics["learning-block-eon-ram"] = device_performance['eon']['ramRequired']
    metrics["learning-block-eon-rom"] = device_performance['eon']['romRequired']

    return metrics

# The top performing impulse is the first element (sorted by highest int8 accuracy on test set)
trial_idx = 0

# Print info about the processing (DSP) blocks and store RAM usage
print("Processing blocks")
print("===")
for i, dsp_block in enumerate(sorted_results[trial_idx].impulse.dsp_blocks):
    print(f"Processing block {i}")
    print("---")
    print("Block:")
    print(json.dumps(dsp_block["block"], indent=2))
    print("Config:")
    print(json.dumps(dsp_block["config"], indent=2))
print()

# Print info about the learning blocks
print("Learning blocks")
print("===")
for i, learn_block in enumerate(sorted_results[trial_idx].impulse.learn_blocks):
    print(f"Learn block {i}")
    print("---")
    print("Block:")
    print(json.dumps(learn_block["block"], indent=2))
    print("Config:")
    print(json.dumps(learn_block["config"], indent=2))
    metadata = learn_block["metadata"]
    qtzn_metadata = [m for m in metadata["modelValidationMetrics"] if m.get("type") == qtzn]
print()

# Print metrics
metrics = get_metrics(sorted_results, qtzn, trial_idx)
print(f"Metrics ({qtzn}) for best trial")
print("===")
print(f"Validation accuracy: {metrics['val-acc']}")
print(f"Test accuracy: {metrics['test-acc']}")
print(f"Estimated processing blocks RAM (bytes): {metrics['processing-block-ram']}")
print(f"Estimated learning blocks latency (ms): {metrics['learning-block-latency-ms']}")
print(f"Estimated learning blocks RAM (bytes): {metrics['learning-block-tflite-ram']}")
print(f"Estimated learning blocks ROM (bytes): {metrics['learning-block-tflite-rom']}")
print(f"Estimated learning blocks RAM with EON Compiler (bytes): {metrics['learning-block-eon-ram']}")
print(f"Estimated learning blocks ROM with EON Compiler (bytes): {metrics['learning-block-eon-rom']}")

Graph results

You can optionally use a plotting package like matplotlib to graph the results from the top results to compare the metrics.

import matplotlib.pyplot as plt

# Get metrics for the top 3 trials (sorted by int8 test set accuracy)
num_trials = 3
top_metrics = [get_metrics(sorted_results, qtzn, idx) for idx in range(num_trials)]

# Construct metrics for plotting
test_accs = [top_metrics[x]['test-acc'] for x in range(num_trials)]
proc_rams = [top_metrics[x]['processing-block-ram'] for x in range(num_trials)]
learn_latencies = [top_metrics[x]['learning-block-latency-ms'] for x in range(num_trials)]
learn_tflite_rams = [top_metrics[x]['learning-block-tflite-ram'] for x in range(num_trials)]
learn_tflite_roms = [top_metrics[x]['learning-block-tflite-rom'] for x in range(num_trials)]
learn_eon_rams = [top_metrics[x]['learning-block-eon-ram'] for x in range(num_trials)]
learn_eon_roms = [top_metrics[x]['learning-block-eon-rom'] for x in range(num_trials)]

# Create plots
fig, axs = plt.subplots(7, 1, figsize=(8, 15))
indices = range(num_trials)

# Plot test accuracies
axs[0].barh(indices, test_accs)
axs[0].set_title("Test set accuracy")
axs[0].set_xlabel("Accuracy")
axs[0].set_ylabel("Trial")

# Plot processing block RAM
axs[1].barh(indices, proc_rams)
axs[1].set_title("Processing block RAM")
axs[1].set_xlabel("RAM (bytes)")
axs[1].set_ylabel("Trial")

# Plot learning block latency
axs[2].barh(indices, learn_latencies)
axs[2].set_title("Learning block latency")
axs[2].set_xlabel("Latency (ms)")
axs[2].set_ylabel("Trial")

# Plot learning block RAM (TFLite)
axs[3].barh(indices, learn_tflite_rams)
axs[3].set_title("Learning block RAM (TFLite)")
axs[3].set_xlabel("RAM (bytes)")
axs[3].set_ylabel("Trial")

# Plot learning block ROM (TFLite)
axs[4].barh(indices, learn_tflite_roms)
axs[4].set_title("Learning block ROM (TFLite)")
axs[4].set_xlabel("ROM (bytes)")
axs[4].set_ylabel("Trial")

# Plot learning block RAM (EON)
axs[5].barh(indices, learn_eon_rams)
axs[5].set_title("Learning block RAM (EON)")
axs[5].set_xlabel("RAM (bytes)")
axs[5].set_ylabel("Trial")

# Plot learning block ROM (EON)
axs[6].barh(indices, learn_eon_roms)
axs[6].set_title("Learning block ROM (EON)")
axs[6].set_xlabel("ROM (bytes)")
axs[6].set_ylabel("Trial")

# Prevent overlap
plt.tight_layout()

Results as a DataFrame

If you have installed, you can make the previous section much easier by reporting metrics as a DataFrame.

import pandas as pd

# Convert the state metrics into a DataFrame
df = tuner_report_as_df(state)
df.head()

# Print column names
for col in df.columns:
    print(col)

# Sort the DataFrame by validation (int8) accuracy
df = df.sort_values(by="test_int8_accuracy", ascending=False)

# Print out best trial metrics
print(f"Trial ID: {df.iloc[0]['id']}")
print(f"Test accuracy (int8): {df.iloc[0]['test_int8_accuracy']}")
print(f"Estimated learning blocks latency (ms): {df.iloc[0]['device_performance_int8_latency']}")
print(f"Estimated learning blocks RAM (bytes): {df.iloc[0]['device_performance_int8_tflite_ram_required']}")
print(f"Estimated learning blocks ROM (bytes): {df.iloc[0]['device_performance_int8_tflite_rom_required']}")
print(f"Estimated learning blocks RAM with EON Compiler (bytes): {df.iloc[0]['device_performance_int8_eon_ram_required']}")
print(f"Estimated learning blocks ROM with EON Compiler (bytes): {df.iloc[0]['device_performance_int8_eon_rom_required']}")

Set trial as impulse and deploy

We can replace the current impulse with the top performing trial from the EON Tuner. From there, we can deploy it, just like we would any impulse.

# Get the ID for the top-performing trial and set that to our impulse. This will take about a minute.
trial_id = df.iloc[0].trial_id
response = set_impulse_from_trial(trial_id)
job_id = response.id

# Make sure the impulse update was successful
if not hasattr(response, "success") or getattr(response, "success") == False:
    raise RuntimeError("Could not set project impulse to trial impulse")

# List the available profile target devices
ei.model.list_deployment_targets()

You should see a list printed such as:

['zip',
 'arduino',
 'cubemx',
 'wasm',
 ...
 'runner-linux-aarch64-jetson-orin-6-0']

Note that instead of writing the raw bytes to a file, you can also specify an output_directory argument in the .deploy() function. Your deployment file(s) will be downloaded to that directory.

# Build and download C++ library with the trained model
deploy_bytes = None
try:
    deploy_bytes = build(
        deploy_model_type=qtzn,
        engine="tflite",
        deploy_target="zip"
    )
except Exception as e:
    print(f"Could not deploy: {e}")
    
# Write the downloaded raw bytes to a file
if deploy_bytes:
    with open(deploy_filename, 'wb') as f:
        f.write(deploy_bytes.getvalue())

Configure custom search space

The best way to define a search space is to open your project (after uploading data), head to the EON Tuner page, click Run EON Tuner, and select the Space tab.

The search space is defined in JSON format, so we can just copy that to create a dictionary. This is a good place to start for tuning blocks and hyperparameters.

Note: Functions to get available blocks and search space parameters coming soon

from edgeimpulse_api import (
    OptimizeConfig,
    TunerSpaceImpulse,
)

# Configure the search space
space = {
    "inputBlocks": [
      {
        "type": "time-series",
        "window": [
          {"windowSizeMs": 9000, "windowIncreaseMs": 9000},
          {"windowSizeMs": 10000, "windowIncreaseMs": 10000}
        ],
        "frequencyHz": [62.5],
        "padZeros": [True]
      }
    ],
    "dspBlocks": [
      {
        "type": "spectral-analysis",
        "analysis-type": ["FFT"],
        "fft-length": [16, 64],
        "scale-axes": [1],
        "filter-type": ["none"],
        "filter-cutoff": [3],
        "filter-order": [6],
        "do-log": [True],
        "do-fft-overlap": [True]
      },
      {
        "type": "spectral-analysis",
        "analysis-type": ["Wavelet"],
        "wavelet": ["haar", "bior1.3"],
        "wavelet-level": [1, 2]
      },
      {"type": "raw", "scale-axes": [1]}
    ],
    "learnBlocks": [
      {
        "id": 4,
        "type": "keras",
        "dimension": ["dense"],
        "denseBaseNeurons": [40, 20],
        "denseLayers": [2, 3],
        "dropout": [0.25, 0.5],
        "learningRate": [0.0005],
        "trainingCycles": [30]
      }
    ]
  }

# Wrap the search space
ts = TunerSpaceImpulse.from_dict(space)

# Create a custom configuration
config = OptimizeConfig(
    name=None,
    target_device={"name": "cortex-m4f-80mhz"},
    classification_type="classification",
    dataset_category="motion_continuous",
    target_latency=100,
    tuning_max_trials=2,
    space=[ts]
)

# Start tuner and wait for it to complete
start_custom_tuner(
    config=config
)
state = check_tuner(
    wait_for_completion=True
)

# The easiest way to view the results is to look at the EON Tuner page on your project
print(f"Navigate to https://studio.edgeimpulse.com/studio/{project_id}/tuner to see the results")

# Set quantization ("float32" or "int8")
qtzn = "int8"

# Filter out all failed trials
results = [r for r in state.trials if r.status == "completed"]

# Extract float32 accuracies from the trial results
accuracies = []
for result in results:
    accuracy = result.impulse.learn_blocks[0]["metrics"]["test"][qtzn]["accuracy"]
    accuracies.append(accuracy)

# Sort the results based on int8 accuracies
acc_results = zip(accuracies, results)
sorted_results = sorted(acc_results, reverse=True, key=lambda x: list(x)[0])
sorted_results = [result for _, result in sorted_results]

# The top performing impulse is the first element (sorted by highest int8 accuracy on test set)
trial_idx = 0

# Print info about the processing (DSP) blocks and store RAM usage
print("Processing blocks")
print("===")
for i, dsp_block in enumerate(sorted_results[trial_idx].impulse.dsp_blocks):
    print(f"Processing block {i}")
    print("---")
    print("Block:")
    print(json.dumps(dsp_block["block"], indent=2))
    print("Config:")
    print(json.dumps(dsp_block["config"], indent=2))
print()

# Print info about the learning blocks
print("Learning blocks")
print("===")
for i, learn_block in enumerate(sorted_results[trial_idx].impulse.learn_blocks):
    print(f"Learn block {i}")
    print("---")
    print("Block:")
    print(json.dumps(learn_block["block"], indent=2))
    print("Config:")
    print(json.dumps(learn_block["config"], indent=2))
    metadata = learn_block["metadata"]
    qtzn_metadata = [m for m in metadata["modelValidationMetrics"] if m.get("type") == qtzn]
print()

# Print metrics
metrics = get_metrics(sorted_results, qtzn, trial_idx)
print(f"Metrics ({qtzn}) for best trial")
print("===")
print(f"Validation accuracy: {metrics['val-acc']}")
print(f"Test accuracy: {metrics['test-acc']}")
print(f"Estimated processing blocks RAM (bytes): {metrics['processing-block-ram']}")
print(f"Estimated learning blocks latency (ms): {metrics['learning-block-latency-ms']}")
print(f"Estimated learning blocks RAM (bytes): {metrics['learning-block-tflite-ram']}")
print(f"Estimated learning blocks ROM (bytes): {metrics['learning-block-tflite-rom']}")
print(f"Estimated learning blocks RAM with EON Compiler (bytes): {metrics['learning-block-eon-ram']}")
print(f"Estimated learning blocks ROM with EON Compiler (bytes): {metrics['learning-block-eon-rom']}")

Using the Edge Impulse Python SDK to upload and download data

You can find the API documentation for the functions found in this tutorial .

WARNING: This notebook will add and delete data in your Edge Impulse project, so be careful! We recommend creating a throwaway project when testing this notebook.

Note that you might need to refresh the page with your Edge Impulse project to see the samples appear.

# If you have not done so already, install the following dependencies
!python -m pip install edgeimpulse

import edgeimpulse as ei

Note that you do not actually need to use the project in the Edge Impulse Studio. We just need the API Key.

Paste that API key string in the ei.API_KEY value in the following cell:

# Settings
ei.API_KEY = "ei_dae2..." # Change this to your Edge Impulse API key

Upload directory

You can upload all files in a directory using the Python SDK. Note that you can set the category, label, and metadata for all files with a single call. If you want to use a different label for each file set label=None in the function call and name your files with <label>.<name>.<ext>. For example, wave.01.csv will have the label wave when uploaded. See for more information.

The following file formats are allowed: .cbor, .json, .csv, .wav, .jpg, .png, .mp4, .avi.

from datetime import datetime

# Download image files to use as an example dataset
!mkdir -p dataset
!wget -P dataset -q \
  https://raw.githubusercontent.com/edgeimpulse/notebooks/main/.assets/images/capacitor.01.png \
  https://raw.githubusercontent.com/edgeimpulse/notebooks/main/.assets/images/capacitor.02.png

# Upload the entire directory
response = ei.experimental.data.upload_directory(
    directory="dataset",
    category="training",
    label=None, # Will use the prefix before the '.' on each filename for the label
    metadata={
        "date": datetime.now().strftime('%Y-%m-%d %H:%M:%S'),
        "source": "camera",
    }
)

# Check to make sure there were no failures
assert len(response.fails) == 0, "Could not upload some files"

# Save the sample IDs, as we will need these to retrieve file information and delete samples
ids = []
for sample in response.successes:
    ids.append(sample.sample_id)

# Review the sample IDs and get the associated server-side filename
# Note the lack of extension! Multiple samples on the server can have the same filename.
for id in ids:
    filename = ei.experimental.data.get_filename_by_id(id)
    print(f"Sample ID: {id}, filename: {filename}")

If you head to the Data acquisition page on your project, you should see images in your dataset.

Download files

# Get sample IDs for everything in the "training" category
infos = ei.experimental.data.get_sample_ids(category="training")

# The SampleInfo should match what we uploaded earlier
ids = []
for info in infos:
    print(info)
    ids.append(info.sample_id)

# Download samples
samples = ei.experimental.data.download_samples_by_ids(ids)

# Save the downloaded files
for sample in samples:
    with open(sample.filename, "wb") as file:
        file.write(sample.data.read())

# View sample information
for sample in samples:
    print(
        f"filename: {sample.filename}\r\n"
        f"  sample ID: {sample.sample_id}\r\n"
        f"  category: {sample.category}\r\n"
        f"  label: {sample.label}\r\n"
        f"  bounding boxes: {sample.bounding_boxes}\r\n"
        f"  metadata: {sample.metadata}"
    )

Take a look at the files in this directory. You should see the downloaded images. They should match the images in the dataset/ directory, which were the original images that we uploaded.

Delete files

If you know the ID of the sample you would like to delete, you can call the delete_sample_by_id() function. You can also delete all the samples in your project by calling delete_all_samples().

# Delete the samples from the Edge Impulse project that we uploaded earlier
for id in ids:
    ei.experimental.data.delete_sample_by_id(id)

Take a look at the data in your project. The samples that we uploaded should be gone.

Upload folder for object detection

For object detection, you can put bounding box information (following the ) in a file named info.labels in that same directory.

Important! The annotations file must be named exactly info.labels

# Download images and bounding box annotations to use as an example dataset
!mkdir -p dataset
!rm dataset/capacitor.01.png dataset/capacitor.02.png
!wget -P dataset -q \
  https://raw.githubusercontent.com/edgeimpulse/notebooks/main/.assets/images/dog-ball-toy.01.png \
  https://raw.githubusercontent.com/edgeimpulse/notebooks/main/.assets/images/dog-ball-toy.02.png \
  https://raw.githubusercontent.com/edgeimpulse/notebooks/main/.assets/annotations/info.labels

# Upload the entire directory (including the info.labels file)
response = ei.experimental.data.upload_exported_dataset(
    directory="dataset",
)

# Check to make sure there were no failures
assert len(response.fails) == 0, "Could not upload some files"

# Save the sample IDs, as we will need these to retrieve file information and delete samples
ids = []
for sample in response.successes:
    ids.append(sample.sample_id)

If you head to the Data acquisition page on your project, you should see images in your dataset along with the bounding box information.

# Delete the samples from the Edge Impulse project that we uploaded
for id in ids:
    ei.experimental.data.delete_sample_by_id(id)

Upload individual CSV files

The Edge Impulse ingestion service accepts CSV files, which we can use to upload raw data. Note that if you configure a CSV template using the , then the expected format of the CSV file might change. If you do not configure a CSV template, then the ingestion service expects CSV data to be in a particular format. See .

import csv
import io
import os

# Create example CSV data
sample_data = [
    [
        ["timestamp", "accX", "accY", "accZ"],
        [0, -9.81, 0.03, 0.21],
        [10, -9.83, 0.04, 0.27],
        [20, -9.12, 0.03, 0.23],
        [30, -9.14, 0.01, 0.25],
    ],
    [
        ["timestamp", "accX", "accY", "accZ"],
        [0, -9.56, 5.34, 1.21],
        [10, -9.43, 1.37, 1.27],
        [20, -9.22, -4.03, 1.23],
        [30, -9.50, -0.98, 1.25],
    ],
]

# Write to CSV files
filenames = [
    "001.csv",
    "002.csv"
]
for i, filename in enumerate(filenames):
    file_path = os.path.join("dataset", filename)
    with open(file_path, "w", newline="") as file:
        writer = csv.writer(file)
        writer.writerows(sample_data[i])

# Add metadata to the CSV data
my_samples = [
    {
        "filename": filenames[0],
        "data": open(os.path.join("dataset", filenames[0]), "rb"),
        "category": "training",
        "label": "idle",
        "metadata": {
            "source": "accelerometer",
            "collection site": "desk",
        },
    },
    {
        "filename": filenames[1],
        "data": open(os.path.join("dataset", filenames[1]), "rb"),
        "category": "training",
        "label": "wave",
        "metadata": {
            "source": "accelerometer",
            "collection site": "desk",
        },
    },
]

# Wrap the samples in instances of the Sample class
samples = [ei.experimental.data.Sample(**i) for i in my_samples]

# Upload samples to your project
response = ei.experimental.data.upload_samples(samples)

# Check to make sure there were no failures
assert len(response.fails) == 0, "Could not upload some files"

# Save the sample IDs, as we will need these to retrieve file information and delete samples
ids = []
for sample in response.successes:
    ids.append(sample.sample_id)

If you head to the Data acquisition page on your project, you should see your time series data.

# Delete the samples from the Edge Impulse project
for id in ids:
    ei.experimental.data.delete_sample_by_id(id)

Upload JSON data directly

Another way to upload data is to encode it in JSON format. See the for more information on acceptable key/value pairs. Note that at this time, the signature value can be set to 0.

The raw data must be encoded in an IO object. We convert the dictionary objects to a BytesIO object, but you can also read in data from .json files.

import io
import json

# Create two different example data samples
sample_data_1 = {
    "protected": {
        "ver": "v1",
        "alg": "none",
    },
    "signature": 0,
    "payload": {
        "device_name": "ac:87:a3:0a:2d:1b",
        "device_type": "DISCO-L475VG-IOT01A",
        "interval_ms": 10,
        "sensors": [
            { "name": "accX", "units": "m/s2" },
            { "name": "accY", "units": "m/s2" },
            { "name": "accZ", "units": "m/s2" }
        ],
        "values": [
            [ -9.81, 0.03, 0.21 ],
            [ -9.83, 0.04, 0.27 ],
            [ -9.12, 0.03, 0.23 ],
            [ -9.14, 0.01, 0.25 ]
        ]
    }
}
sample_data_2 = {
    "protected": {
        "ver": "v1",
        "alg": "none",
    },
    "signature": 0,
    "payload": {
        "device_name": "ac:87:a3:0a:2d:1b",
        "device_type": "DISCO-L475VG-IOT01A",
        "interval_ms": 10,
        "sensors": [
            { "name": "accX", "units": "m/s2" },
            { "name": "accY", "units": "m/s2" },
            { "name": "accZ", "units": "m/s2" }
        ],
        "values": [
            [ -9.56, 5.34, 1.21 ],
            [ -9.43, 1.37, 1.27 ],
            [ -9.22, -4.03, 1.23 ],
            [ -9.50, -0.98, 1.25 ]
        ]
    }
}

# Provide a filename, category, label, and optional metadata for each sample
my_samples = [
    {
        "filename": "001.json",
        "data": io.BytesIO(json.dumps(sample_data_1).encode('utf-8')),
        "category": "training",
        "label": "idle",
        "metadata": {
            "source": "accelerometer",
            "collection site": "desk",
        },
    },
    {
        "filename": "002.json",
        "data": io.BytesIO(json.dumps(sample_data_2).encode('utf-8')),
        "category": "training",
        "label": "wave",
        "metadata": {
            "source": "accelerometer",
            "collection site": "desk",
        },
    },
]

# Wrap the samples in instances of the Sample class
samples = [ei.data.sample_type.Sample(**i) for i in my_samples]

# Upload samples to your project
response = ei.experimental.data.upload_samples(samples)

# Check to make sure there were no failures
assert len(response.fails) == 0, "Could not upload some files"

# Save the sample IDs, as we will need these to retrieve file information and delete samples
ids = []
for sample in response.successes:
    ids.append(sample.sample_id)

If you head to the Data acquisition page on your project, you should see your time series data.

# Delete the samples from the Edge Impulse project
for id in ids:
    ei.experimental.data.delete_sample_by_id(id)

Upload NumPy arrays

is powerful Python library for working with large arrays and matrices. You can upload NumPy arrays directly into your Edge Impulse project. Note that the arrays are required to be in a particular format, and must be uploaded with required metadata (such as a list of labels and the sample rate).

Important! NumPy arrays must be in the shape (Number of samples, number of data points, number of sensors)

If you are working with image data in NumPy, we recommend saving those images as .png or .jpg files and using upload_directory().

import numpy as np

# Create example NumPy array with 2 time series samples
sample_data = np.array(
    [
        [ # Sample 1 ("idle")
            [-9.81, 0.03, 0.21],
            [-9.83, 0.04, 0.27],
            [-9.12, 0.03, 0.23],
            [-9.14, 0.01, 0.25],
        ],
        [ # Sample 2 ("wave")
            [-9.56, 5.34, 1.21],
            [-9.43, 1.37, 1.27],
            [-9.22, -4.03, 1.23],
            [-9.50, -0.98, 1.25],
        ],
    ]
)

# Labels for each sample
labels = ["idle", "wave"]

# Names of the sensors and units for the 3 axes
sensors = [
    { "name": "accX", "units": "m/s2" },
    { "name": "accY", "units": "m/s2" },
    { "name": "accZ", "units": "m/s2" },
]

# Optional metadata for all samples being uploaded
metadata = {
    "source": "accelerometer",
    "collection site": "desk",
}

# Upload samples to your project
response = ei.experimental.data.upload_numpy(
    data=sample_data,
    labels=labels,
    sensors=sensors,
    sample_rate_ms=10,
    metadata=metadata,
    category="training",
)

# Check to make sure there were no failures
assert len(response.fails) == 0, "Could not upload some files"

# Save the sample IDs, as we will need these to retrieve file information and delete samples
ids = []
for sample in response.successes:
    ids.append(sample.sample_id)

# Delete the samples from the Edge Impulse project
for id in ids:
    ei.experimental.data.delete_sample_by_id(id)

Upload pandas (and pandas-like) dataframes

is popular Python library for performing data manipulation and analysis. The Edge Impulse library supports a number of ways to upload dataframes. We will go over each format.

Note that several other packages exist that work as drop-in replacements for pandas. You can use these replacements so long as you import that with the name pd. For example, one of:

import pandas as pd
import modin.pandas as pd
import dask.dataframe as pd
import polars as pd

import pandas as pd

The first option is to upload one dataframe for each sample (non-time series)

# Construct one dataframe for each sample (multidimensional, non-time series)
df_1 = pd.DataFrame([[-9.81, 0.03, 0.21]], columns=["accX", "accY", "accZ"])
df_2 = pd.DataFrame([[-9.56, 5.34, 1.21]], columns=["accX", "accY", "accZ"])

# Optional metadata for all samples being uploaded
metadata = {
    "source": "accelerometer",
    "collection site": "desk",
}

# Upload the first sample
ids = []
response = ei.experimental.data.upload_pandas_sample(
    df_1,
    label="One",
    filename="001",
    metadata=metadata,
    category="training",
)
assert len(response.fails) == 0, "Could not upload some files"
for sample in response.successes:
    ids.append(sample.sample_id)

# Upload the second sample
response = ei.experimental.data.upload_pandas_sample(
    df_2,
    label="Two",
    filename="002",
    metadata=metadata,
    category="training",
)
assert len(response.fails) == 0, "Could not upload some files"
for sample in response.successes:
    ids.append(sample.sample_id)

# Delete the samples from the Edge Impulse project
for id in ids:
    ei.experimental.data.delete_sample_by_id(id)

You can also upload one dataframe for each sample (time series). As with previous examples, we'll assume that the sample rate is 10 ms.

# Create samples (multidimensional, time series)
sample_data_1 = [ # Sample 1 ("idle")
    [-9.81, 0.03, 0.21],
    [-9.83, 0.04, 0.27],
    [-9.12, 0.03, 0.23],
    [-9.14, 0.01, 0.25],
]
sample_data_2 = [ # Sample 1 ("wave")
    [-9.56, 5.34, 1.21],
    [-9.43, 1.37, 1.27],
    [-9.22, -4.03, 1.23],
    [-9.50, -0.98, 1.25],
]

# Construct one dataframe for each sample
df_1 = pd.DataFrame(sample_data_1, columns=["accX", "accY", "accZ"])
df_2 = pd.DataFrame(sample_data_2, columns=["accX", "accY", "accZ"])

# Optional metadata for all samples being uploaded
metadata = {
    "source": "accelerometer",
    "collection site": "desk",
}

# Upload the first sample
ids = []
response = ei.experimental.data.upload_pandas_sample(
    df_1,
    label="Idle",
    filename="001",
    sample_rate_ms=10,
    metadata=metadata,
    category="training",
)
assert len(response.fails) == 0, "Could not upload some files"
for sample in response.successes:
    ids.append(sample.sample_id)

# Upload the second sample
response = ei.experimental.data.upload_pandas_sample(
    df_2,
    label="Wave",
    filename="002",
    sample_rate_ms=10,
    metadata=metadata,
    category="training",
)
assert len(response.fails) == 0, "Could not upload some files"
for sample in response.successes:
    ids.append(sample.sample_id)

# Delete the samples from the Edge Impulse project
for id in ids:
    ei.experimental.data.delete_sample_by_id(id)

You can upload non-time series data where each sample is a row in the dataframe. Note that you need to provide labels in the rows.

# Construct non-time series data, where each row is a different sample
data = [
    ["desk", "training", "One", -9.81, 0.03, 0.21],
    ["field", "training", "Two", -9.56, 5.34, 1.21],
]
columns = ["loc", "category", "label", "accX", "accY", "accZ"]

# Wrap the data in a DataFrame
df = pd.DataFrame(data, columns=columns)

# Upload non-time series DataFrame (with multiple samples) to the project
ids = []
response = ei.experimental.data.upload_pandas_dataframe(
    df,
    feature_cols=["accX", "accY", "accZ"],
    label_col="label",
    category_col="category",
    metadata_cols=["loc"],
)
assert len(response.fails) == 0, "Could not upload some files"
for sample in response.successes:
    ids.append(sample.sample_id)

# Delete the samples from the Edge Impulse project
for id in ids:
    ei.experimental.data.delete_sample_by_id(id)

A "wide" dataframe is one where each column represents a value in the time series data, and the rows become individual samples. Note that you need to provide labels in the rows.

# Construct time series data, where each row is a different sample
data = [
    ["desk", "training", "idle", 0.8, 0.7, 0.8, 0.9, 0.8, 0.8, 0.7, 0.8],
    ["field", "training", "motion", 0.3, 0.9, 0.4, 0.6, 0.8, 0.9, 0.5, 0.4],
]
columns = ["loc", "category", "label", "0", "1", "2", "3", "4", "5", "6", "7"]

# Wrap the data in a DataFrame
df = pd.DataFrame(data, columns=columns)

# Upload time series DataFrame (with multiple samples) to the project
ids = []
response = ei.experimental.data.upload_pandas_dataframe_wide(
    df,
    label_col="label",
    category_col="category",
    metadata_cols=["loc"],
    data_col_start=3,
    sample_rate_ms=100,
)
assert len(response.fails) == 0, "Could not upload some files"
for sample in response.successes:
    ids.append(sample.sample_id)

# Delete the samples from the Edge Impulse project
for id in ids:
    ei.experimental.data.delete_sample_by_id(id)

A DataFrame can also be divided into "groups" so you can upload multidimensional time series data.

# Create samples
sample_data = [
    ["desk", "sample 1", "training", "idle", 0, -9.81, 0.03, 0.21],
    ["desk", "sample 1", "training", "idle", 0.01, -9.83, 0.04, 0.27],
    ["desk", "sample 1", "training", "idle", 0.02, -9.12, 0.03, 0.23],
    ["desk", "sample 1", "training", "idle", 0.03, -9.14, 0.01, 0.25],
    ["field", "sample 2", "training", "wave", 0, -9.56, 5.34, 1.21],
    ["field", "sample 2", "training", "wave", 0.01, -9.43, 1.37, 1.27],
    ["field", "sample 2", "training", "wave", 0.02, -9.22, -4.03, 1.23],
    ["field", "sample 2", "training", "wave", 0.03, -9.50, -0.98, 1.25],
]
columns = ["loc", "sample_name", "category", "label", "timestamp", "accX", "accY", "accZ"]

# Wrap the data in a DataFrame
df = pd.DataFrame(sample_data, columns=columns)

# Upload time series DataFrame (with multiple samples and multiple dimensions) to the project
ids = []
response = ei.experimental.data.upload_pandas_dataframe_with_group(
    df,
    group_by="sample_name",
    timestamp_col="timestamp",
    feature_cols=["accX", "accY", "accZ"],
    label_col="label",
    category_col="category",
    metadata_cols=["loc"]
)
assert len(response.fails) == 0, "Could not upload some files"
for sample in response.successes:
    ids.append(sample.sample_id)

# Delete the samples from the Edge Impulse project
for id in ids:
    ei.experimental.data.delete_sample_by_id(id)

ML & data engineering

Edge Impulse Python SDK

Labeling

Generate synthetic datasets

Other tutorials

EI Python SDK

Using the Edge Impulse Python SDK with TensorFlow and Keras

Train a machine learning model

Profile your model

Deploy your model

Using the Edge Impulse Python SDK to run EON Tuner

Upload dataset

Run the Tuner

Print EON Tuner results

Graph results

Results as a DataFrame

Set trial as impulse and deploy

Configure custom search space

Using the Edge Impulse Python SDK with Hugging Face

Profile your model

Deploy your model

Using the Edge Impulse Python SDK with Weights & Biases

Gather a dataset

Create an experiment

Run the experiment

Deploy Your Model

Using the Edge Impulse Python SDK with SageMaker Studio

Transfer Learning

Dataset

Train

Retrieve and prepare the newly trained model

Edge Impulse

Using the Edge Impulse Python SDK to upload and download data

Upload directory

Download files

Delete files

Upload folder for object detection

Upload individual CSV files

Upload JSON data directly

Upload NumPy arrays

Upload pandas (and pandas-like) dataframes

Label image data using GPT-4o

How to use it

Step 1: Data Collection

Step 2: Add the labeling block

Step 4: Configure the labeling block

Optional: Editing your labeling block

Step 5: Execute

Step 6: Train your model

Step 7: Deployment

Results

Conclusion

Examples & Resources

Label audio data using your existing models

Generate synthetic datasets

Using integrated models directly available inside Edge Impulse Studio

Other tutorials

Generate image datasets using Dall·E

Local Software Requirements

Set up OpenAI API

Generate your first image

Generate some variations of this image

Generate a dataset:

Plot all the output images:

What next?

Generate keyword spotting datasets

Local Software Requirements

Set up Google TTS API

Generate the desired samples

What next?

Generate physics simulation datasets

Local Software Requirements

Create object to simulate

Visualizing the problem

Setting up the simulation environment

Drop simulation

What next?

Generate audio datasets using Eleven Labs

Example: Glass Breaking Sound

Getting Started