Project transformation block
Last updated
Was this helpful?
Last updated
Was this helpful?
Transformation blocks take raw data from your and convert the data into files that can be loaded in an Edge Impulse project. You can use transformation blocks to only include certain parts of individual data files, calculate long-running features like a running mean or derivatives, or efficiently generate features with different window lengths. Transformation blocks can be written in any language, and run on the Edge Impulse infrastructure.
In this tutorial we build a Python-based transformation block that loads Parquet files, splits the data into thirty second windows, and uploads the data to a new project.
Want more? We also have an end-to-end example transformation block that mixes noise into an audio dataset: .
You'll need:
The .
If you receive any warnings that's fine. Run edge-impulse-blocks
afterwards to verify that the CLI was installed correctly.
The file which you can use to test the transformation block. This contains some data from the dataset in Parquet format.
Transformation blocks use Docker containers, a virtualization technique which lets developers package up an application with all dependencies in a single package. If you want to test your blocks locally you'll also need (this is not a requirement):
installed on your machine.
This is the Parquet schema for the gestures.parquet
file which we'll want to transform into data for a project:
To build a transformation block open a command prompt or terminal window, create a new folder, and run:
This will prompt you to log in, and enter the details for your block. E.g.:
Then, create the following files in this directory:
Dockerfile
We're building a Python based transformation block. The Dockerfile describes our base image (Python 3.7.5), our dependencies (in requirements.txt
) and which script to run (transform.py
).
Note: Do not use a WORKDIR
under /home
! The /home
path will be mounted in by Edge Impulse, making your files inaccessible.
ENTRYPOINT vs RUN / CMD
If you use a different programming language, make sure to use ENTRYPOINT
to specify the application to execute, rather than RUN
or CMD
.
requirements.txt
This file describes the dependencies for the block. We'll be using pandas
and pyarrow
to parse the Parquet file, and numpy
to do some calculations.
transform.py
This file includes the actual application. Transformation blocks are invoked with three parameters (as command line arguments):
--in-file
- A file from the organizational dataset. In this case the gestures.parquet
file.
--out-directory
- Directory to write files to.
--hmac-key
- You can use this HMAC key to sign the output files.
Add the following content:
On your local machine
To test the transformation block locally, if you have Python and all dependencies installed, just run:
This generates a number of JSON files in the out/
directory. You can test the import into an Edge Impulse project via:
Docker
You can also build the container locally via Docker, and test the block. The added benefit is that you don't need any dependencies installed on your local computer, and can thus test that you've included everything that's needed for the block. This requires Docker desktop to be installed.
First, build the container:
Then, run the container (make sure gestures.parquet
is in the same directory):
This generates a number of JSON files in the out/
directory. You can test the import into an Edge Impulse project via:
With the block ready we can push it to your organization. Open a command prompt or terminal window, navigate to the folder you created earlier, and run:
This packages up your folder, sends it to Edge Impulse where it'll be built, and finally is added to your organization.
The transformation block is now available in Edge Impulse under Data transformation > Transformation blocks.
If you make any changes to the block, just re-run edge-impulse-blocks push
and the block will be updated.
Next, upload the gestures.parquet
file, by going to Data > Add data... > Add data item, setting name as 'Gestures', dataset to 'Transform tutorial', and selecting the Parquet file.
This makes the gestures.parquet
file available from the Data page.
With the Parquet file in Edge Impulse and the transformation block configured you can now create a new job. Go to Data, and select the Parquet file by setting the filter to dataset = 'Transform tutorial'
.
Click the checkbox next to the data item, and select Transform selected (1 file). On the 'Create transformation job' page, select 'Import data into Project'. Then under 'Project', select '+ Create new project' and enter a name. Under 'Transformation block' select the new transformation block.
Click Start transformation job to start the job. This pulls the data in, starts a transformation job and finally imports the data into the new project. If you have multiple files selected the transformations will also run in parallel.
Transformation blocks get access to the following environmental variables, which let you authenticate with the Edge Impulse API. This way you don't have to inject these credentials into the block. The variables are:
EI_API_KEY
- an API key with 'member' privileges for the organization.
EI_ORGANIZATION_ID
- the organization ID that the block runs in.
EI_API_ENDPOINT
- the API endpoint (default: https://studio.edgeimpulse.com/v1).
Transformation blocks are a powerful feature which let you set up a data pipeline to turn raw data into actionable machine learning features. It also gives you a reproducible way of transforming many files at once, and is programmable through the so you can automatically convert new incoming data. Want more? We also have an end-to-end example transformation block that mixes noise into an audio dataset: .
If you're interested in transformation blocks or any of the other enterprise features,