Transforming clinical data
Transformation blocks take raw data from your organizational datasets and convert the data into a different dataset or files that can be loaded in an Edge Impulse project. You can use transformation blocks to only include certain parts of individual data files, calculate long-running features like a running mean or derivatives, or efficiently generate features with different window lengths. Transformation blocks can be written in any language, and run on the Edge Impulse infrastructure.
No required format for data files
There is no required format for data files. You can upload data in any format, whether it's CSV, Parquet, or a proprietary data format.
Parquet is a columnar storage format that is optimized for reading and writing large datasets. It is particularly useful for data that is stored in S3 buckets, as it can be read in parallel and is highly compressed.
The PPG-DaLiA dataset is a multimodal collection featuring physiological and motion data recorded from 15 subjects.
In this tutorial we build a Python-based transformation block that loads Parquet files, we process the dataset by calculating features and transforming it into a unified schema suitable for machine learning. If you haven't done so, go through synchronizing clinical data with a bucket first.
Only available with Edge Impulse Enterprise Plan
Try our FREE Enterprise Trial today.
1. Prerequisites
You'll need:
The Edge Impulse CLI.
If you receive any warnings that's fine. Run
edge-impulse-blocks
afterwards to verify that the CLI was installed correctly.
PPG-DaLiA CSV files: Download files like
ACC.csv
,HR.csv
,EDA.csv
, etc., which contain sensor data.
Transformation blocks use Docker containers, a virtualization technique that lets developers package up an application with all dependencies in a single package. If you want to test your blocks locally you'll also need (this is not a requirement):
Docker desktop installed on your machine.
2. Building your first transformation block
To build a transformation block open a command prompt or terminal window, create a new folder, and run:
This will prompt you to log in, and enter the details for your block. E.g.:
Then, create the following files in this directory:
2.1 - Dockerfile
We're building a Python based transformation block. The Dockerfile describes our base image (Python 3.7.5), our dependencies (in requirements.txt
) and which script to run (transform.py
).
Note: Do not use a WORKDIR
under /home
! The /home
path will be mounted in by Edge Impulse, making your files inaccessible.
ENTRYPOINT vs RUN / CMD
If you use a different programming language, make sure to use ENTRYPOINT
to specify the application to execute, rather than RUN
or CMD
.
2.2 - requirements.txt
This file describes the dependencies for the block. We'll be using pandas
and pyarrow
to parse the Parquet file, and numpy
to do some calculations.
2.3 - transform.py
This file includes the actual application. Transformation blocks are invoked with three parameters (as command line arguments):
--in-file
or--in-directory
- A file (if the block operates on a file), or a directory (if the block operates on a data item) from the organizational dataset. In this case theunified_data.parquet
file.--out-directory
- Directory to write files to.--hmac-key
- You can use this HMAC key to sign the output files. This is not used in this tutorial.--metadata
- Key/value pairs containing the metadata for the data item, plus additional metadata about the data item in thedataItemInfo
key. E.g.:{ "subject": "AAA001", "ei_check": "1", "dataItemInfo": { "id": 101, "dataset": "Human Activity 2022", "bucketName": "edge-impulse-tutorial", "bucketPath": "janjongboom/human_activity/AAA001/", "created": "2022-03-07T09:20:59.772Z", "totalFileCount": 14, "totalFileSize": 6347421 } }
Add the following content. This takes in the Parquet file, groups data by their label, and then calculates the RMS over the X, Y and Z axes of the accelerometer.
toggle to expand code
Docker
You can also build the container locally via Docker, and test the block. The added benefit is that you don't need any dependencies installed on your local computer, and can thus test that you've included everything that's needed for the block. This requires Docker desktop to be installed.
To build the container and test the block, open a command prompt or terminal window and navigate to the source directory. First, build the container:
Then, run the container (make sure unified_data.parquet
is in the same directory):
Seeing the output
This process has generated a new Parquet file in the out/
directory containing the RMS of the X, Y and Z axes. If you inspect the content of the file (e.g. using parquet-tools) you'll see the output:
If you don't have parquet-tools
installed, you can install it via:
Then, run:
This will show you the metadata and the columns in the file:
code output block:
3. Pushing the transformation block to Edge Impulse
With the block ready we can push it to your organization. Open a command prompt or terminal window, navigate to the folder you created earlier, and run:
This packages up your folder, sends it to Edge Impulse where it'll be built, and finally is added to your organization.
The transformation block is now available in Edge Impulse under Data transformation > Transformation blocks.
If you make any changes to the block, just re-run edge-impulse-blocks push
and the block will be updated.
4. Uploading unified_data.parquet to Edge Impulse
Next, upload the unified_data.parquet
file, by going to Data > Add data... > Add data item, setting name as 'Gestures', dataset to 'Transform tutorial', and selecting the Parquet file.
This makes the unified_data.parquet
file available from the Data page.
5. Starting the transformation
With the Parquet file in Edge Impulse and the transformation block configured you can now create a new job. Go to Data, and select the Parquet file by setting the filter to dataset = 'Transform tutorial'
.
Click the checkbox next to the data item, and select Transform selected (1 file). On the 'Create transformation job' page select 'Import data into Dataset'. Under 'output dataset', select 'Same dataset as source', and under 'Transformation block' select the new transformation block.
Click Start transformation job to start the job. This pulls the data in, starts a transformation job and finally uploads the data back to your dataset. If you have multiple files selected the transformations will also run in parallel.
You can now find the transformed file back in your dataset:
6. Next steps
Transformation blocks are a powerful feature which let you set up a data pipeline to turn raw data into actionable machine learning features. It also gives you a reproducible way of transforming many files at once, and is programmable through the Edge Impulse API so you can automatically convert new incoming data. If you're interested in transformation blocks or any of the other enterprise features, let us know!
🚀
Appendix: Advanced features
Updating metadata from a transformation block
You can update the metadata of blocks directly from a transformation block by creating a ei-metadata.json
file in the output directory. The metadata is then applied to the new data item automatically when the transform job finishes. The ei-metadata.json
file has the following structure:
Some notes:
If
action
is set toadd
the metadata keys are added to the data item. Ifaction
is set toreplace
all existing metadata keys are removed.
Environmental variables
Transformation blocks get access to the following environmental variables, which let you authenticate with the Edge Impulse API. This way you don't have to inject these credentials into the block. The variables are:
EI_API_KEY
- an API key with 'member' privileges for the organization.EI_ORGANIZATION_ID
- the organization ID that the block runs in.EI_API_ENDPOINT
- the API endpoint (default: https://studio.edgeimpulse.com/v1).
Custom parameters
You can specify custom arguments or parameters to your block by adding a parameters.json file in the root of your block directory. This file describes all arguments for your training pipeline, and is used to render custom UI elements for each parameter. For example, this parameters file:
Renders the following UI when you run the transformation block:
And the options are passed in as command line arguments to your block:
For more information, and all options see parameters.json.
Last updated