Transformation blocks
Transformation blocks are very flexible and can be used for most advanced use cases.
They can take raw data from your organizational datasets and convert it into files that can be loaded into an Edge Impulse project or another organizational dataset, or they can run in standalone mode as cloud jobs that perform any action you need.
Transformation blocks are available in your organization pipelines and in your project pipelines so you can automate your processes.
You can use transformation blocks to fetch external datasets, to augment/create variants of your data samples, to extract metadata from config files, to create helper graphs, to align and interpolate measurements across sensors, or to remove duplicate entries. The possibilities are endless.
Transformation blocks can be written in any language, and run on Edge Impulse infrastructure.

Transformation blocks overview
Only available for enterprise customers
Organizational features are only available for enterprise customers. View our pricing for more information.
Transformation blocks can be complex to set up and are one of the most advanced features Edge Impulse provides. Feel free to ask your customer solutions engineer for help and examples; we have set up complex pipelines for many customers and our engineers have acquired a lot of expertise with transformation blocks.
A transformation block consists of a Docker image that contains one or several scripts. The Docker image is encapsulated in the transformation block with additional parameters.
Here is a minimal configuration for a transformation block:

Transformation block structure, minimal setup
This page explains how to set up a transformation block and walks through the different options.
To set up your block, an easy method is to use the Edge Impulse CLI command edge-impulse-blocks init:

$> edge-impulse-blocks init
Edge Impulse Blocks v1.21.1
? In which organization do you want to create this block?
❯ Developer Relations
Medical Laboratories inc.
Demo Team
Attaching block to organization 'Developer Relations'
? Choose a type of block
❯ Transformation block
Deployment block
DSP block
Machine learning block
? Choose an option:
❯ Create a new block
Update an existing block
? Enter the name of your block: Generate helper graphs from sensor CSV
? Enter the description of your block: Transformation block to help you visualize what your sensor time series data look like by creating a graph from the CSV files
? What type of data does this block operate on?
File (--in-file passed into the block)
Data item (--in-directory passed into the block)
❯ Standalone (runs the container, but no files / data items passed in)
? Which buckets do you want to mount into this block (will be mounted under /mnt/s3fs/BUCKET_NAME, you can change these mount points in the Studio)?
(Press <space> to select, <a> to toggle all, <i> to invert selection)
❯ ◉ edge-impulse-devrel-team
◯ ei-datasets
❯ yes
no
Tip: If you want to access your bucket, make sure to press <space> to select the bucket attached to your organization.

The step above will create the following .ei-block-config file in your project directory:

{
  "version": 1,
  "config": {
    "edgeimpulse.com": {
      "name": "Generate graphs from sensor CSV - Standalone",
      "type": "transform",
      "description": "Generate graphs from sensor CSV - Standalone",
      "organizationId": XXXX,
      "operatesOn": "standalone",
      "tlIndRequiresGpu": false,
      "transformMountpoints": [
        {
          "bucketId": 3096,
          "mountPoint": "/mnt/s3fs/edge-impulse-devrel-team"
        }
      ],
      "id": 5086
    }
  }
}
To push your transformation block, simply run edge-impulse-blocks push.

Alternatively, you can directly create your transformation block within Edge Impulse Studio:

Create new transformation block from the Studio
At Edge Impulse, we mostly use Python, JavaScript/TypeScript and Bash scripts, but you can write your transformation blocks in any language.
Dockerfile example to trigger a bash script:
FROM ubuntu:latest
WORKDIR /app
# Copy the bash script into the container
COPY hello.sh /hello.sh
# Make the bash script executable
RUN chmod +x /hello.sh
# Set the entrypoint command to run the script with the provided --name argument
ENTRYPOINT ["/hello.sh"]
Dockerfile example to trigger a Python script and install the required dependencies:
FROM python:3.7.5-stretch
WORKDIR /app
# Python dependencies
COPY requirements.txt ./
RUN pip3 --no-cache-dir install -r requirements.txt
COPY . ./
ENTRYPOINT [ "python3", "transform.py" ]
The Dockerfile above describes a base image (Python 3.7.5), the Python dependencies (in requirements.txt) and which script to run (transform.py).
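The contents of requirements.txt are not shown in the snippet above; it simply lists the Python packages your script needs, one per line. For the graph-generation example later on this page, it might look like this (the package list is an assumption based on that example, not a pinned file from the docs):

pandas
matplotlib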
Note: Do not use a WORKDIR under /home! The /home path will be mounted in by Edge Impulse, making your files inaccessible.

ENTRYPOINT vs RUN / CMD
If you create a custom Dockerfile, make sure to use ENTRYPOINT to specify the application to execute, rather than RUN or CMD.

If you want to host your Docker image on an external registry, you can use Docker Hub and use the username/image:tag in the Docker container field.

We provide three modes to access your data:
- In the Standalone mode, no data is passed to the container, but you can still access data by mounting your bucket onto the container.
- At the Data item level (only available for clinical datasets), we pass the --in-directory and --out-directory arguments. Your transformation block should process the entire directory.
- At the File level (only available for clinical datasets), we pass the --in-file and --out-directory arguments. The transformation job will run on each file present in your dataset. These jobs can run in parallel.
Note that for the last two operation modes, you can use query filters to only include certain data items and certain files.
The Standalone mode is the most flexible option (it works on both generic and clinical datasets). You can consider this transformation block as a cloud job that you can use for anything in your machine learning pipelines.
Please note that this mode does not support running jobs in parallel, as it is not known in advance how many files or directories are present in your dataset.
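As a minimal sketch (not the official block template), a single Python entrypoint could detect which of these modes it was started in by checking which arguments were passed:

# Sketch: detect the operation mode from the arguments Edge Impulse passes in.
import argparse, os

parser = argparse.ArgumentParser(description='Example transformation block entrypoint')
parser.add_argument('--in-file', type=str, required=False, help="Set in File mode")
parser.add_argument('--in-directory', type=str, required=False, help="Set in Data item mode")
parser.add_argument('--out-directory', type=str, required=False, help="Where to write output files")
args, unknown = parser.parse_known_args()

if args.in_file:
    # File mode: the job is invoked once per matching file
    print('Processing single file:', args.in_file)
elif args.in_directory:
    # Data item mode: the job is invoked once per data item (directory)
    print('Processing data item directory:', args.in_directory)
else:
    # Standalone mode: no data is passed in, read from the mounted bucket instead
    print('Running standalone, mounted buckets are under', os.getenv('MOUNT_PREFIX', '/mnt/s3fs/'))

if args.out_directory:
    os.makedirs(args.out_directory, exist_ok=True)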
To access your data, you must mount your bucket/upload portal into the container. You can do this when setting up your transformation block using the Edge Impulse CLI, or directly in the Studio when creating or editing a transformation block.
Additionally, you can use custom block parameters to retrieve the bucket name and the directory needed to access your files programmatically:

Adding mount points and parameters
import os, sys, argparse
import pandas as pd
import matplotlib.pyplot as plt

# Set the arguments
parser = argparse.ArgumentParser(description='Organization transformation block')
parser.add_argument('--sensor_name', type=str, required=True, help="Sensor data to extract to create the graph")
parser.add_argument('--bucket_name', type=str, required=False, help="Bucket where your dataset is hosted")
parser.add_argument('--bucket_directory', type=str, required=False, help="Directory in your bucket where your dataset is hosted")
args, unknown = parser.parse_known_args()

sensor_name = args.sensor_name
sensor_file = sensor_name + '.csv'
bucket_name = args.bucket_name
bucket_prefix = args.bucket_directory
mount_prefix = os.getenv('MOUNT_PREFIX', '/mnt/s3fs/')
folder = os.path.join(mount_prefix, bucket_name, bucket_prefix) if bucket_prefix else os.path.join(mount_prefix, bucket_name)

# Check if folder exists
if os.path.exists(folder):
    print('path exist', folder)
    for dirpath, dirnames, filenames in os.walk(folder):
        print("dirpath:", dirpath)
        if os.path.exists(os.path.join(dirpath, sensor_file)):
            print("File exist: ", os.path.join(dirpath, sensor_file))
            df = pd.read_csv(os.path.join(dirpath, sensor_file))
            df.index = pd.to_datetime(df['time'], unit='ns')
            # Get a list of all columns except 'time' and 'seconds_elapsed'
            columns_to_plot = [col for col in df.columns if col not in ['time', 'seconds_elapsed']]
            # Create subplots for each selected column
            fig, axes = plt.subplots(nrows=len(columns_to_plot), ncols=1, figsize=(12, 6 * len(columns_to_plot)))
            for i, col in enumerate(columns_to_plot):
                axes[i].plot(df.index, df[col])
                axes[i].set_title(f'{col} over time')
                axes[i].set_xlabel('time')
                axes[i].set_ylabel(col)
                axes[i].grid(True)
            # Save the figure with all subplots in the same directory
            plt.tight_layout()
            print("Graph created")
            plt.savefig(os.path.join(dirpath, sensor_name))
            # Display the plots (optional)
            # plt.show()
        else:
            print("file is missing in directory ", dirpath)
else:
    print('Path does not exist')
    sys.exit(1)

print("Finished")
exit(0)
e.g. a bash script to print "Hello +name", the name being passed as an argument to the transformation block using the custom block parameters:

#!/bin/bash
while [[ $# -gt 0 ]]; do
  key="$1"
  case $key in
    --name)
      NAME="$2"
      shift # past argument
      shift # past value
      ;;
    *)
      # Unknown option
      echo "Unknown option: $1"
      exit 1
      ;;
  esac
done
echo "Hello $NAME"
When selecting the Data item operation mode, two parameters will be passed to the container:
- --in-directory
- --out-directory
The transformation jobs will run on each "Data item" present in your clinical dataset.
For example, let's consider a clinical dataset like the following, each data item has several files:

Activity Detection data items
Now let's create a transformation block that simply outputs the arguments and copies the Accelerometer.csv file to the output dataset. This block is available in the transformation blocks GitHub repository.
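The exact implementation lives in the repository linked above; as a rough sketch of what such a block can look like (the argument names and the Accelerometer.csv file name follow the logs shown below, everything else is illustrative):

# Sketch of a copy block handling both Data item (--in-directory) and File (--in-file) modes.
import argparse, os, shutil, sys

parser = argparse.ArgumentParser(description='Copy Accelerometer.csv to the output dataset')
parser.add_argument('--in-file', type=str, required=False)
parser.add_argument('--in-directory', type=str, required=False)
parser.add_argument('--out-directory', type=str, required=True)
args, unknown = parser.parse_known_args()

# Print the arguments, as shown in the job logs below
print('--in-directory:', args.in_directory)
print('--out-directory:', args.out_directory)
print('--in-file:', args.in_file)

os.makedirs(args.out_directory, exist_ok=True)
print('out-directory has', os.listdir(args.out_directory))

if args.in_directory:
    # Data item mode: copy Accelerometer.csv from the data item directory
    print('in-directory has', os.listdir(args.in_directory))
    src = os.path.join(args.in_directory, 'Accelerometer.csv')
    if os.path.exists(src):
        print('Copying file from', args.in_directory, 'to', args.out_directory)
        shutil.copy(src, args.out_directory)
elif args.in_file:
    # File mode: copy whichever file was passed in
    print('copying', os.path.basename(args.in_file), 'to', args.out_directory)
    shutil.copy(args.in_file, args.out_directory)

print('out-directory has now', os.listdir(args.out_directory))
sys.exit(0)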
Set up the transformation job in the Studio:

Transformation job configuration
You will be able to see logs for each data item looking like the following:
--in-directory: /data/edge-impulse-devrel-team/datasets/activity-detection/Cycling-2023-09-14_06-47-00/
--out-directory: /home/transform/26773541
--in-file: None
in-directory has ['Accelerometer.csv', 'Accelerometer.png', 'Annotation.csv', 'Gravity.csv', 'Gyroscope.csv', 'Location.csv', 'LocationGps.csv', 'LocationNetwork.csv', 'Magnetometer.csv', 'Metadata.csv', 'Orientation.csv', 'Pedometer.csv', 'TotalAcceleration.csv']
out-directory has []
Copying file from /data/edge-impulse-devrel-team/datasets/activity-detection/Cycling-2023-09-14_06-47-00/ to /home/transform/26773541
out-directory has now ['Accelerometer.csv']

Transformation job overview
The copied file will be placed in a temporary directory and then be copied to the desired output dataset, respecting the folder structure.

Activity Detection data items
When selecting the File operation mode, two parameters will be passed to the container:
- --in-file
- --out-directory
The transformation jobs will run on each file present in your clinical dataset.
Let's use the same transformation block as above. First, make sure to set the operation mode to File by editing your transformation block:

Edit transformation block
Then set up the transformation job.
We have used the following filter:
dataset = 'Activity Detection (Clinical view)' and file_name like '%Gyro%'

Transformation job configuration
Run the job; the logs for one file should look like the following:
--in-directory: None
--out-directory: /home/transform/26808538
--in-file: /data/edge-impulse-devrel-team/datasets/activity-detection/Sitting-2023-09-14_09-11-15/Gyroscope.csv
out-directory has []
--in-file path /data/edge-impulse-devrel-team/datasets/activity-detection/Sitting-2023-09-14_09-11-15/Gyroscope.csv exist
coping Gyroscope.csv to /home/transform/26773541
out-directory has now ['Gyroscope.csv']

Transformation job overview
The copied file will be placed in a temporary directory and then be copied to the desired output dataset, respecting the folder structure (alongside the file we copied in the previous step using the Data item mode).

Activity Detection data items
We provide two built-in transformation blocks that let you import data into a project or into a dataset. To create these jobs, go to Data transformation -> Create job. Make sure to select None in the Transformation block option.
Tip: You can leverage the filters to query which files you want to import. See more on how to query your data, e.g.:
dataset = 'Activity Detection (Clinical view)' AND file_name like 'Accelero%'
dataset = 'Activity Detection (Clinical view)' AND metadata->ei_check = 1
Import into project

Transformation job to import data into a project
Import into dataset

Transformation job to import data into a new dataset
When editing your block in Edge Impulse Studio, you can set the number of CPUs and the amount of memory requested for your container to run properly. Likewise, you can set limits for the same parameters.
You can update the metadata of a data item directly from a transformation block by creating an ei-metadata.json file in the output directory. The metadata is then applied to the new data item automatically when the transform job finishes. The ei-metadata.json file has the following structure:

{
  "version": 1,
  "action": "add",
  "metadata": {
    "some-key": "some-value"
  }
}
Some notes:
- If action is set to add, the metadata keys are added to the data item. If action is set to replace, all existing metadata keys are removed.
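For example, a transformation block could write this file from Python at the end of its job (a minimal sketch; the helper function and metadata keys are illustrative, not part of any Edge Impulse SDK):

import json, os

def write_ei_metadata(out_directory, keys):
    # ei-metadata.json in the --out-directory is picked up when the transform job finishes
    payload = {
        "version": 1,
        "action": "add",   # or "replace" to remove all existing metadata keys first
        "metadata": keys,
    }
    with open(os.path.join(out_directory, 'ei-metadata.json'), 'w') as f:
        json.dump(payload, f)

# e.g. at the end of the transform script, out_directory coming from --out-directory:
# write_ei_metadata(args.out_directory, {"processed": "yes"})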
When using the CLI to set up your block, by default we mount your bucket with the following mount point:
/mnt/s3fs/your-bucket

Setting up mount point
You can change this value if you want your transformation block to behave differently.
Transformation blocks get access to the following environment variables, which let you authenticate with the Edge Impulse API. This way you don't have to inject these credentials into the block. The variables are:
- EI_API_KEY - an API key with 'member' privileges for the organization.
- EI_ORGANIZATION_ID - the organization ID that the block runs in.
- EI_API_ENDPOINT - the API endpoint (default: https://studio.edgeimpulse.com/v1).
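As an illustration, a block could use these variables to call the Edge Impulse API with Python's requests library (a minimal sketch; requests would need to be in your requirements.txt, and the organization info endpoint is just one example of an authenticated call):

import os, requests

# Credentials injected by Edge Impulse into the running container
api_key = os.environ['EI_API_KEY']
organization_id = os.environ['EI_ORGANIZATION_ID']
api_endpoint = os.getenv('EI_API_ENDPOINT', 'https://studio.edgeimpulse.com/v1')

# Authenticated request using the organization API key
res = requests.get(
    f"{api_endpoint}/api/organizations/{organization_id}",
    headers={"x-api-key": api_key},
)
res.raise_for_status()
print(res.json())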
Now that you have a better idea of what transformation blocks are, here is a graphical recap of how they work:

The job runs indefinitely
If you notice that your jobs run indefinitely, it is probably because of an error or because the script has not been properly terminated. Make sure to exit your script with code 0 (return 0, exit(0) or sys.exit(0)) for success, or with any other error code for failure.

Cannot access files in bucket
If you cannot access your files in your bucket, make sure that the mount point is properly configured.

Setting up mount point
When using the CLI, it is a common mistake to forget pressing the <space> key to select the bucket attached to your organization.

Job failed without logs (only Job failed)
It probably means that we had an issue when triggering the container. In many cases it is related to the issue above, with the mount point not being properly configured.
I cannot access the logs
We are still investigating why all the logs are not displayed properly. If you are using Python, you can also flush stdout after you print it using something like print("hello", flush=True).

Can I host my Docker image on Docker Hub?
Yes, you can. You can test this Standalone transformation block if you'd like: luisomoreau/hello_edge:latest, and use the following custom block parameters:
[
  {
    "name": "Name",
    "type": "string",
    "param": "name",
    "value": "",
    "help": "Person to greet"
  }
]
It will print "hello +name" on the transformation job logs.