Transformation blocks
Last updated
Last updated
Transformation blocks are very flexible and can be used for most advanced use cases.
They can either take raw data from your organizational datasets and convert the data into files that can be loaded in an Edge Impulse project/another organizational dataset. But you can also use the transformation blocks as cloud jobs to perform specific actions using standalone mode.
Transformation blocks are available in your organization pipelines and in your project pipelines so you can automate your processes.
You can use transformation blocks to fetch external datasets, augment/create variants of your data samples, generate synthetic datasets, extract metadata from config files, create helper graphs, align and interpolate measurements across sensors, or remove duplicate entries. The possibilities are endless.
Transformation blocks can be written in any language, and run on Edge Impulse infrastructure.
Only available with Edge Impulse Enterprise Plan
Try our FREE Enterprise Trial today.
Transformation blocks can be complex to set up and are one of the most advanced features Edge Impulse provides. Feel free to ask your customer solution engineer for some help and some examples, we have been setting up complex pipelines for our customers and our engineers have acquired a lot of expertise with transformation blocks.
You can run your transformation blocks as transformation jobs. They can be triggered:
from your organization:
From this view, Custom blocks->Transformation
From the Data transformation
From the Data pipelines
from your projects:
From the Data sources (Standalone transformation blocks only)
From Synthetic data (Synthetic data blocks only)
By default, we provide several pre-built transformation blocks that you can use directly in your organization or your organization's projects.
We will add more over time when we see a recurring need or interest. The current ones are the following:
A transformation block consists of a Docker image that contains one or several scripts. The Docker image is encapsulated in the transformation block with additional parameters.
Here is a minimal configuration for the transformation blocks:
In this documentation page, we will explain how to setup a transformation block and will explain the different options.
You can directly create your transformation block within Edge Impulse Studio from a public Docker image or import existing transformation blocks:
Example repository
You can find several transformation block examples in this Github repository. These are a great way to get started, either by importing them directly in your organization or by using them as a getting-started template.
To run the data transformation jobs, see the Data transformation documentation page.
To setup your block, an easy method is to use the Edge Impulse CLI command, edge-impulse-blocks init
:
Tip: If you want to access your bucket, make sure to press <space>
to select the bucket attached to your organization.
The step above will create the following .parameters.json
file in your project directory:
To push your transformation block, simply run edge-impulse-blocks push
.
At Edge Impulse, we mostly use Python, Javascript/Typescript and Bash scripts, but you can write your transformation blocks in any language.
Dockerfile example to trigger a Bash script:
Dockerfile example to trigger a Python script and install the required dependencies:
The Dockerfile above describes a base image (Python 3.7.5), the Python dependencies (in requirements.txt
) and which script to run (transform.py
).
Note: Do not use a WORKDIR
under /home
! The /home
path will be mounted in by Edge Impulse, making your files inaccessible.
ENTRYPOINT vs RUN / CMD
If you create a custom Dockerfile, make sure to use ENTRYPOINT
to specify the application to execute, rather than RUN
or CMD
.
If you want to host your docker image on an external registry, you can use Docker Hub and use the username/image:tag
in the Docker container field.
We provide three modes to access your data:
In the Standalone mode, no data is passed to the container, but you can still access data by mounting your bucket onto the container.
At the Data item level, we pass the --in-directory
and --out-directory
arguments. The transformation jobs will run on each directory present in your selected path. These jobs can run in parallel.
At the file level, we pass the --in-file
and --out-directory
arguments. The transformation jobs will run on each file present in your selected path. These jobs can run in parallel.
Note that for the two last operation modes, you can use query filters to only include certain data items and certain files.
The stand-alone method is the most flexible option (it can work on both generic and clinical datasets). You can consider this transformation block as a cloud job that you can use for anything in your machine learning pipelines.
Please note that this mode does not support running jobs in parallel, as it is unknown in advance how many files or how many directories are present in your dataset.
To access your data, you must mount your bucket/upload portal into the container, you can do this both when setting up your transformation block using Edge Impulse CLI, or directly in the studio when creating/editing a transformation block.
You can use custom blocks parameters to retrieve the bucket name and the required directory to access your files programmatically.
Examples
--in-directory
)When selecting the Data item operation mode, two parameters will be passed to the container:
--in-directory
--out-directory
The transformation jobs will run on each "Data item" (directory) present in your selected path or dataset.
--in-file
)When selecting the File operation mode, two parameters will be passed to the container:
--in-file
--out-directory
The transformation jobs will run on each file present in selected path.
When editing your block on Edge Impulse Studio, you can set the number of desired CPUs and the memory needed for your container to run properly. Likely, you can set the limits of the same parameters.
You can update the metadata of blocks directly from a transformation block by creating a ei-metadata.json
file in the output directory. The metadata is then applied to the new data item automatically when the transform job finishes. The ei-metadata.json
file has the following structure:
Some notes:
If action
is set to add
the metadata keys are added to the data item. If action
is set to replace
all existing metadata keys are removed.
When using the CLI to setup your block, by default we mount your bucket with the following mounting point:
You can change this value if you want your transformation block to behave differently.
See adding parameters to custom blocks dedicated documentation page.
Transformation blocks get access to the following environmental variables, which let you authenticate with the Edge Impulse API. This way you don't have to inject these credentials into the block. The variables are:
EI_API_KEY
- an API key with 'member' privileges for the organization.
EI_ORGANIZATION_ID
- the organization ID that the block runs in.
EI_API_ENDPOINT
- the API endpoint (default: https://studio.edgeimpulse.com/v1).
Label image data using GPT-4o: Label image data using GPT-4o block
Text to speech transform block (Javascript): GitHub
Fetch a dataset hosted on Kaggle (Python): Github
Generate graph from sensor csv data (Python): Github
Hello Edge (Bash): Github
--in-file
)Mix background noise into audio files (Bash script): GitHub
Access your data - Helper transformation block (Python): Github
Resample CSV (Python): Github
--in-directory
)Access your data - Helper transformation block (Python): Github
Check file existence - Add ei_check metadata on file existence (Python): Github
Merge CSV files - Merge CSV files on a given key (Python): Github
Merge audio and CSV - Merge audio file and time-series CSV (Python): Github
Now that you have a better idea of what are transformation blocks, here is a graphical recap of how it works:
This is the specification for the parameters.json
file:
This is the specification for the parameters.json
file:
The job run indefinitely
If you notice that your jobs run indefinitely, it is probably because of an error or the script has not been properly terminated. Make sure to exit your script with code 0 (return 0
, exit(0)
or sys.exit(0)
) for success or with any other error code for failure.
Cannot access files in bucket
If you cannot access your files in your bucket, make sure that the mount point is properly configured.
When using the CLI, it is a common mistake to forget pressing <space>
key to select the bucket attached to your organization.
Job failed without logs (only Job failed)
It probably means that we had an issue when triggering the container. In many cases it is related with the issue above, the mount point not being properly configured.
I cannot access the logs
We are still investigating why all the logs are not displayed properly. If you are using Python, you can also flush stdout after you print it using something like print("hello", flush=True)
.
Can I host my Docker image on Docker Hub?
Yes, you can. You can test this Standalone transformation block if you'd like: luisomoreau/hello_edge:latest
Also, make sure to configure the additional block parameters with this config:
It will print "hello +name" on the transformation job logs.