Transformation blocks


Transformation blocks are very flexible and can be used for most advanced use cases.

They can either take raw data from your organizational datasets and convert it into files that can be loaded into an Edge Impulse project or another organizational dataset, or they can be used as cloud jobs that perform specific actions using the standalone mode.

Transformation blocks are available in your organization pipelines and in your project pipelines so you can automate your processes.

You can use transformation blocks to fetch external datasets, augment/create variants of your data samples, generate synthetic datasets, extract metadata from config files, create helper graphs, align and interpolate measurements across sensors, or remove duplicate entries. The possibilities are endless.

Transformation blocks can be written in any language, and run on Edge Impulse infrastructure.

Only available with Edge Impulse Enterprise Plan

Try our FREE Enterprise Trial today.

Transformation blocks can be complex to set up and are one of the most advanced features Edge Impulse provides. Feel free to ask your customer solutions engineer for help and examples; we have been setting up complex pipelines for our customers, and our engineers have acquired a lot of expertise with transformation blocks.

Run transformation blocks

You can run your transformation blocks as transformation jobs. They can be triggered:

from your organization:

  • From this view, Custom blocks->Transformation
  • From the Data transformation page
  • From the Data pipelines page

from your projects:

  • From the Data sources page (transformation blocks only)
  • From the Synthetic data page (Synthetic data blocks only)

To run the data transformation jobs, see the Data transformation documentation page.

Public blocks

By default, we provide several pre-built transformation blocks that you can use directly in your organization or your organization's projects.

We will add more over time when we see a recurring need or interest. The current ones are the following:

  • DALL-E 3 Image Generation Block
  • Find best Visual AD model
  • Whisper Voice Synthesis Block
  • Label data using GPT-4o

Understanding the transformation blocks

A transformation block consists of a Docker image that contains one or several scripts. The Docker image is encapsulated in the transformation block with additional parameters.

A minimal configuration for a transformation block is shown in the "Transformation block structure, minimal setup" screenshot.

In this page, we will explain how to set up a transformation block and describe the different options.

Import existing transformation blocks

You can directly create your transformation block within Edge Impulse Studio from a public Docker image, or import existing transformation blocks.

You can find several transformation block examples in the example repository. These are a great way to get started, either by importing them directly into your organization or by using them as a getting-started template.

Setting up transformation blocks

To set up your block, an easy method is to use the Edge Impulse CLI command edge-impulse-blocks init:

$> edge-impulse-blocks init 

Edge Impulse Blocks v1.21.1
? In which organization do you want to create this block? 
❯ Developer Relations 
  Medical Laboratories inc.
  Demo Team 
Attaching block to organization 'Developer Relations'
? Choose a type of block
❯ Transformation block 
  Deployment block 
  DSP block 
  Machine learning block
? Choose an option: 
❯ Create a new block 
  Update an existing block 
? Enter the name of your block: Generate helper graphs from sensor CSV
? Enter the description of your block: Transformation block to help you visualize how your sensor time series data look by creating a graph from the CSV files
? What type of data does this block operate on? 
  File (--in-file passed into the block) 
  Data item (--in-directory passed into the block) 
❯ Standalone (runs the container, but no files / data items passed in)
? Which buckets do you want to mount into this block (will be mounted under /mnt/s3fs/BUCKET_NAME, you can change these mount points in the Studio)?
(Press <space> to select, <a> to toggle all, <i> to invert selection)
❯ ◉ edge-impulse-devrel-team
  ◯ ei-datasets
❯ yes 
  no 

Tip: If you want to access your bucket, make sure to press <space> to select the bucket attached to your organization.

The step above will create the following parameters.json file in your project directory:

{
    "version": 1,
    "type": "transform",
    "info": {
        "name": "Generate helper graphs from sensor CSV",
        "description": "Transformation block to help you visualize what how your sensor time series data look like by creating a graph from the CSV files",
        "operatesOn": "standalone",
        "transformMountpoints": [
            {
                "bucketId": 3096,
                "mountPoint": "/mnt/s3fs/edge-impulse-devrel-team"
            }
        ]
    },
    "parameters": []
}

To push your transformation block, simply run edge-impulse-blocks push.

Dockerfile

At Edge Impulse, we mostly use Python, JavaScript/TypeScript and Bash scripts, but you can write your transformation blocks in any language.

Dockerfile example to trigger a Bash script:

FROM ubuntu:20.04

WORKDIR /app

# Copy the bash script into the container
COPY hello.sh /hello.sh

# Make the bash script executable
RUN chmod +x /hello.sh

# Set the entrypoint command to run the script with the provided --name argument
ENTRYPOINT ["/hello.sh"]

Dockerfile example to trigger a Python script and install the required dependencies:

FROM python:3.7.5-stretch

WORKDIR /app

# Python dependencies
COPY requirements.txt ./
RUN pip3 --no-cache-dir install -r requirements.txt

COPY . ./

ENTRYPOINT [ "python3",  "transform.py" ]

The Dockerfile above describes a base image (Python 3.7.5), the Python dependencies (in requirements.txt) and which script to run (transform.py).

Note: Do not use a WORKDIR under /home! The /home path will be mounted in by Edge Impulse, making your files inaccessible.

ENTRYPOINT vs RUN / CMD

If you create a custom Dockerfile, make sure to use ENTRYPOINT to specify the application to execute, rather than RUN or CMD.

Operation modes

We provide three modes to access your data:

  • In the Standalone mode, no data is passed to the container, but you can still access data by mounting your bucket onto the container.

  • At the Data item level, we pass the --in-directory and --out-directory arguments. The transformation jobs will run on each directory present in your selected path. These jobs can run in parallel.

  • At the file level, we pass the --in-file and --out-directory arguments. The transformation jobs will run on each file present in your selected path. These jobs can run in parallel.

Note that for the two last operation modes, you can use query filters to only include certain data items and certain files.

Standalone

The standalone mode is the most flexible option (it can work on both generic and clinical datasets). You can consider this transformation block as a cloud job that you can use for anything in your machine learning pipelines.

Please note that this mode does not support running jobs in parallel, as it is unknown in advance how many files or directories are present in your dataset.

To access your data, you must mount your bucket or upload portal into the container. You can do this either when setting up your transformation block using the Edge Impulse CLI, or directly in the Studio when creating/editing a transformation block. You can use custom block parameters to retrieve the bucket name and the required directory to access your files programmatically.

Examples

Python script to create graphs from your CSV sensor data
import os, sys, argparse
import pandas as pd
import matplotlib.pyplot as plt

# Set the arguments
parser = argparse.ArgumentParser(description='Organization transformation block')
parser.add_argument('--sensor_name', type=str, required=True, help="Sensor data to extract to create the graph")
parser.add_argument('--bucket_name', type=str, required=False, help="Bucket where your dataset is hosted")
parser.add_argument('--bucket_directory', type=str, required=False, help="Directory in your bucket where your dataset is hosted")

args, unknown = parser.parse_known_args()

sensor_name = args.sensor_name
sensor_file = sensor_name + '.csv'

bucket_name = args.bucket_name
bucket_prefix = args.bucket_directory
mount_prefix = os.getenv('MOUNT_PREFIX', '/mnt/s3fs/')

folder = os.path.join(mount_prefix, bucket_name, bucket_prefix) if bucket_prefix else os.path.join(mount_prefix, bucket_name)

# Check if folder exists
if os.path.exists(folder):
    print('path exist', folder)
    for dirpath, dirnames, filenames in os.walk(folder):
        print("dirpath:",dirpath)
        if os.path.exists(os.path.join(dirpath, sensor_file)):
            print("File exist: ", os.path.join(dirpath, sensor_file))
            
            df = pd.read_csv(os.path.join(dirpath, sensor_file))
            df.index = pd.to_datetime(df['time'], unit='ns')

            # Get a list of all columns except 'time' and 'seconds_elapsed'
            columns_to_plot = [col for col in df.columns if col not in ['time', 'seconds_elapsed']]

            # Create subplots for each selected column
            fig, axes = plt.subplots(nrows=len(columns_to_plot), ncols=1, figsize=(12, 6 * len(columns_to_plot)))

            for i, col in enumerate(columns_to_plot):
                axes[i].plot(df.index, df[col])
                axes[i].set_title(f'{col} over time')
                axes[i].set_xlabel('time')
                axes[i].set_ylabel(col)
                axes[i].grid(True)

            # Save the figure with all subplots in the same directory
            plt.tight_layout()
            print("Graph created")
            plt.savefig(os.path.join(dirpath, sensor_name))
            # Close the figure to avoid keeping many figures open while walking the dataset
            plt.close(fig)

            # Display the plots (optional)
            # plt.show()
        
        else:
            print("file is missing in directory ", dirpath)
     
else:
    print('Path does not exist')
    sys.exit(1)


print("Finished")
sys.exit(0)
Bash script to print "Hello +name" in the log console, the name being passed as an argument to the transformation block using the custom block parameters
#!/bin/bash

while [[ $# -gt 0 ]]; do
  key="$1"

  case $key in
    --name)
      NAME="$2"
      shift # past argument
      shift # past value
      ;;
    *)
      # Unknown option
      echo "Unknown option: $1"
      exit 1
      ;;
  esac
done

echo "Hello $NAME"

Data item (--in-directory)

When selecting the Data item operation mode, two parameters will be passed to the container:

  • --in-directory

  • --out-directory

The transformation jobs will run on each "Data item" (directory) present in your selected path or dataset.

Example

For example, let's consider a clinical dataset like the following, where each data item has several files:

Now let's create a transformation block that simply outputs the arguments and copies the Accelerometer.csv file to the output dataset. This block is available in the transformation blocks GitHub repository.

Set up the transformation job in the Studio:

You will be able to see logs for each data item looking like the following:

--in-directory:  /data/edge-impulse-devrel-team/datasets/activity-detection/Cycling-2023-09-14_06-47-00/
--out-directory:  /home/transform/26773541
--in-file:  None
in-directory has ['Accelerometer.csv', 'Accelerometer.png', 'Annotation.csv', 'Gravity.csv', 'Gyroscope.csv', 'Location.csv', 'LocationGps.csv', 'LocationNetwork.csv', 'Magnetometer.csv', 'Metadata.csv', 'Orientation.csv', 'Pedometer.csv', 'TotalAcceleration.csv']
out-directory has []
Copying file from  /data/edge-impulse-devrel-team/datasets/activity-detection/Cycling-2023-09-14_06-47-00/ to  /home/transform/26773541
out-directory has now ['Accelerometer.csv']

The copied file will be placed in a temporary directory and then be copied to the desired output dataset respecting the folder structure.
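Below is a minimal Python sketch of what such a block could look like, assuming a transform.py entrypoint that receives the --in-directory (or --in-file) and --out-directory arguments described above. The actual block in the transformation blocks GitHub repository may differ slightly; the same script also covers the File operation mode used in the next section, since only the arguments passed by the job change.

import argparse, os, shutil, sys

# Parse the arguments passed in by the transformation job
parser = argparse.ArgumentParser(description='Copy Accelerometer.csv to the output dataset')
parser.add_argument('--in-directory', type=str, required=False, help="Data item directory (Data item mode)")
parser.add_argument('--in-file', type=str, required=False, help="Input file (File mode)")
parser.add_argument('--out-directory', type=str, required=True, help="Where to write the output")
args, unknown = parser.parse_known_args()

print('--in-directory: ', args.in_directory)
print('--out-directory: ', args.out_directory)
print('--in-file: ', args.in_file)

os.makedirs(args.out_directory, exist_ok=True)
print('out-directory has', os.listdir(args.out_directory))

if args.in_directory:
    # Data item mode: copy Accelerometer.csv from the data item to the output directory
    print('in-directory has', os.listdir(args.in_directory))
    src = os.path.join(args.in_directory, 'Accelerometer.csv')
    if not os.path.exists(src):
        print('Accelerometer.csv is missing in directory', args.in_directory)
        sys.exit(1)
    print('Copying file from ', args.in_directory, 'to ', args.out_directory)
    shutil.copy(src, args.out_directory)
elif args.in_file:
    # File mode: copy the single file that was passed in
    print('Copying', os.path.basename(args.in_file), 'to', args.out_directory)
    shutil.copy(args.in_file, args.out_directory)
else:
    print('No --in-directory or --in-file passed in')
    sys.exit(1)

print('out-directory has now', os.listdir(args.out_directory))
sys.exit(0)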

File (--in-file)

When selecting the File operation mode, two parameters will be passed to the container:

  • --in-file

  • --out-directory

The transformation jobs will run on each file present in the selected path.

Example

Let's use the same transformation block as above. First, make sure to set the operation mode to File by editing your transformation block:

Then set up the transformation job.

We have used the following filter: dataset = 'Activity Detection (Clinical view)' and file_name like '%Gyro%'

Run the jobs and the logs for one file should look like the following:

--in-directory:  None
--out-directory:  /home/transform/26808538
--in-file:  /data/edge-impulse-devrel-team/datasets/activity-detection/Sitting-2023-09-14_09-11-15/Gyroscope.csv
out-directory has []
--in-file path /data/edge-impulse-devrel-team/datasets/activity-detection/Sitting-2023-09-14_09-11-15/Gyroscope.csv exist
coping Gyroscope.csv to /home/transform/26773541
out-directory has now ['Gyroscope.csv']

The copied file will be placed in a temporary directory and then be copied to the desired output dataset respecting the folder structure (along with the file we copied with the previous step using the Data item mode).

Compute requests & limits

When editing your block in Edge Impulse Studio, you can set the number of CPUs and the amount of memory needed for your container to run properly. Likewise, you can set limits for these same parameters.

Metadata (Data item and file operation modes)

You can update the metadata of data items directly from a transformation block by creating an ei-metadata.json file in the output directory. The metadata is then applied to the new data item automatically when the transform job finishes. The ei-metadata.json file has the following structure:

{
    "version": 1,
    "action": "add",
    "metadata": {
        "some-key": "some-value"
    }
}

Some notes:

  • If action is set to add, the metadata keys are added to the data item. If action is set to replace, all existing metadata keys are removed.
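For example, a block running in the Data item or File operation mode could attach metadata by writing this file to its --out-directory. This is a minimal sketch; the ei_check key is just an example value.

import argparse, json, os

# --out-directory is passed by the transformation job in Data item and File modes
parser = argparse.ArgumentParser(description='Write ei-metadata.json to the output directory')
parser.add_argument('--in-directory', type=str, required=False)
parser.add_argument('--out-directory', type=str, required=True)
args, unknown = parser.parse_known_args()

os.makedirs(args.out_directory, exist_ok=True)

# This metadata is applied to the new data item when the transform job finishes
metadata = {
    "version": 1,
    "action": "add",
    "metadata": {
        "ei_check": "1"
    }
}

with open(os.path.join(args.out_directory, 'ei-metadata.json'), 'w') as f:
    json.dump(metadata, f, indent=4)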

Mounting points

When using the CLI to set up your block, by default we mount your bucket with the following mounting point:

/mnt/s3fs/your-bucket

You can change this value if you want your transformation block to behave differently.
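For instance, a standalone block with a bucket named your-bucket mounted at the default mounting point could list the files it can access like this (a minimal sketch, assuming the default mount path shown above):

import os

# Default mounting point used by the CLI; adjust it if you changed the mount point
mount_point = '/mnt/s3fs/your-bucket'

if not os.path.exists(mount_point):
    raise SystemExit(f'{mount_point} is not mounted, check the mount points of your transformation block')

# Walk the mounted bucket and print the files available to the block
for dirpath, dirnames, filenames in os.walk(mount_point):
    for filename in filenames:
        print(os.path.join(dirpath, filename))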

Custom parameters

See the dedicated Adding parameters to custom blocks documentation page.

Environmental variables

Transformation blocks get access to the following environmental variables, which let you authenticate with the Edge Impulse API. This way you don't have to inject these credentials into the block. The variables are:

  • EI_API_KEY - an API key with 'member' privileges for the organization.

  • EI_ORGANIZATION_ID - the organization ID that the block runs in.

  • EI_API_ENDPOINT - the API endpoint (default: https://studio.edgeimpulse.com/v1).
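For example, a Python block can read these variables and call the Edge Impulse API without any extra configuration. This is a minimal sketch; the exact endpoint used below (fetching the organization information) is an assumption, so check the API reference for the calls you actually need.

import os
import requests  # assumed to be listed in requirements.txt

api_key = os.environ['EI_API_KEY']                 # 'member' API key injected by Edge Impulse
organization_id = os.environ['EI_ORGANIZATION_ID'] # organization the block runs in
api_endpoint = os.getenv('EI_API_ENDPOINT', 'https://studio.edgeimpulse.com/v1')

# Example call (endpoint path assumed): fetch information about the organization
response = requests.get(
    f'{api_endpoint}/api/organizations/{organization_id}',
    headers={'x-api-key': api_key},
)
response.raise_for_status()
print(response.json())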

Examples & resources

Standalone

File (--in-file)

Data Item (--in-directory)

Recap

Now that you have a better idea of what transformation blocks are, here is a graphical recap of how they work:

Parameters.json format

Transformation blocks

This is the specification for the parameters.json file:

type TransformBlockParametersJson = {
    version: 1,
    type: 'transform',
    info: {
        name: string,
        description: string,
        operatesOn: 'file' | 'directory' | 'standalone' | undefined;
        transformMountpoints: {
            bucketId: number;
            mountPoint: string;
        }[] | undefined;
        indMetadata: boolean | undefined;
        cliArguments: string | undefined;
        allowExtraCliArguments: boolean | undefined;
        showInDataSources: boolean | undefined;
        showInCreateTransformationJob: boolean | undefined;
    },
    // see spec in https://docs.edgeimpulse.com/docs/tips-and-tricks/adding-parameters-to-custom-blocks
    parameters: DSPParameterItem[];
};

Synthetic data blocks

This is the specification for the parameters.json file:

type SyntheticDataBlockParametersJson = {
    version: 1,
    type: 'synthetic-data',
    info: {
        name: string,
        description: string,
    },
    // see spec in https://docs.edgeimpulse.com/docs/tips-and-tricks/adding-parameters-to-custom-blocks
    parameters: DSPParameterItem[];
};

Troubleshooting

The job runs indefinitely

If you notice that your job runs indefinitely, it is probably because of an error or because the script has not terminated properly. Make sure to exit your script with code 0 (return 0, exit(0) or sys.exit(0)) for success, or with any other error code for failure.

Cannot access files in bucket

If you cannot access your files in your bucket, make sure that the mount point is properly configured.

When using the CLI, a common mistake is forgetting to press the <space> key to select the bucket attached to your organization.

Job failed without logs (only Job failed)

It probably means that there was an issue when triggering the container. In many cases it is related to the issue above: the mount point not being properly configured.

I cannot access the logs

We are still investigating why some logs are not displayed properly. If you are using Python, you can flush stdout after you print by using something like print("hello", flush=True).

Can I host my Docker image on Docker Hub?

Yes, you can. If you want to host your Docker image on an external registry, you can use Docker Hub and use the username/image:tag in the Docker container field. You can test this standalone transformation block if you'd like: luisomoreau/hello_edge:latest

Also, make sure to configure the additional block parameters with this config:

[
    {
        "name": "Name",
        "type": "string",
        "param": "name",
        "value": "",
        "help": "Person to greet"
    }
]

It will print "hello +name" in the transformation job logs.
