Transforming clinical data

Last updated 6 months ago

Transformation blocks take raw data from your organizational datasets and convert it into a different dataset or into files that can be loaded in an Edge Impulse project. You can use transformation blocks to include only certain parts of individual data files, calculate long-running features like a running mean or derivatives, or efficiently generate features with different window lengths. Transformation blocks can be written in any language, and run on the Edge Impulse infrastructure.

In this tutorial we build a Python-based transformation block that loads a Parquet file, calculates features from it, and then writes a new file back to your dataset. If you haven't done so, go through synchronizing clinical data with a bucket first.

Only available with Edge Impulse Enterprise Plan

Try our FREE Enterprise Trial today.

1. Prerequisites

You'll need:

  • The Edge Impulse CLI.

    • If you receive any warnings, that's fine. Run edge-impulse-blocks afterwards to verify that the CLI was installed correctly.

  • The gestures.parquet file, which you can use to test the transformation block. This contains some data from the Continuous gestures dataset in Parquet format.

Transformation blocks use Docker containers, a virtualization technique that lets developers package up an application with all its dependencies in a single package. If you want to test your blocks locally you'll also need the following (this is optional):

  • Docker desktop installed on your machine.

1.1 - Parquet schema

This is the Parquet schema for the gestures.parquet file which we'll transform:

message root {
  required binary sampleName (UTF8);
  required int64 timestamp (TIMESTAMP_MILLIS);
  required int64 added (TIMESTAMP_MILLIS);
  required boolean signatureValid;
  required binary device (UTF8);
  required binary label (UTF8);
  required float accX;
  required float accY;
  required float accZ;
}
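
Before computing features, it can help to confirm that an input file actually contains the columns from this schema. The following is a minimal sketch under that assumption; EXPECTED_COLUMNS and missing_columns are hypothetical helper names, not part of any Edge Impulse tooling.

```python
# column names copied from the Parquet schema above
EXPECTED_COLUMNS = (
    'sampleName', 'timestamp', 'added', 'signatureValid',
    'device', 'label', 'accX', 'accY', 'accZ',
)


def missing_columns(columns):
    """Return the schema columns that are absent from `columns`, in schema order."""
    present = set(columns)
    return [name for name in EXPECTED_COLUMNS if name not in present]
```

In the transform.py script from section 2.3, this could be called as missing_columns(data.columns) right after the table has been converted to a pandas DataFrame, exiting early if the returned list is not empty.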

2. Building your first transformation block

To build a transformation block open a command prompt or terminal window, create a new folder, and run:

$ edge-impulse-blocks init

This will prompt you to log in, and enter the details for your block. E.g.:

Edge Impulse Blocks v1.9.0
? What is your user name or e-mail address (edgeimpulse.com)? jan+demo@edgeimpulse.com
? What is your password? [hidden]
Attaching block to organization 'Demo org Inc.'
? Choose a type of block Transformation block
? Choose an option Create a new block
? Enter the name of your block Demo dataset transformation
? Enter the description of your block Reads a Parquet file, extracts features, and writes the block back to the dataset
Creating block with config: {
  name: 'Demo dataset transformation',
  type: 'transform',
  description: 'Reads a Parquet file and splits it up in labeled data',
  organizationId: 34
}
Your new block 'Demo dataset transformation' has been created in '~/repos/tutorial-processing-block'.
When you have finished building your transformation block, run "edge-impulse-blocks push" to update the block in Edge Impulse.

Then, create the following files in this directory:

2.1 - Dockerfile

We're building a Python-based transformation block. The Dockerfile describes our base image (Python 3.7.5), our dependencies (in requirements.txt) and which script to run (transform.py).

FROM python:3.7.5-stretch

WORKDIR /app

# Python dependencies
COPY requirements.txt ./
RUN pip3 --no-cache-dir install -r requirements.txt

COPY . ./

ENTRYPOINT [ "python3",  "transform.py" ]

Note: Do not use a WORKDIR under /home! The /home path will be mounted in by Edge Impulse, making your files inaccessible.

ENTRYPOINT vs RUN / CMD

If you use a different programming language, make sure to use ENTRYPOINT to specify the application to execute, rather than RUN or CMD.

2.2 - requirements.txt

This file describes the dependencies for the block. We'll be using pandas and pyarrow to parse the Parquet file, and numpy to do some calculations.

numpy==1.16.4
pandas==0.23.4
pyarrow==0.16.0

2.3 - transform.py

This file includes the actual application. Transformation blocks are invoked with the following parameters (as command line arguments):

  • --in-file or --in-directory - A file (if the block operates on a file), or a directory (if the block operates on a data item) from the organizational dataset. In this case the gestures.parquet file.

  • --out-directory - Directory to write files to.

  • --hmac-key - You can use this HMAC key to sign the output files. This is not used in this tutorial.

  • --metadata - Key/value pairs containing the metadata for the data item, plus additional metadata about the data item in the dataItemInfo key. E.g.: { "subject": "AAA001", "ei_check": "1", "dataItemInfo": { "id": 101, "dataset": "Human Activity 2022", "bucketName": "edge-impulse-tutorial", "bucketPath": "janjongboom/human_activity/AAA001/", "created": "2022-03-07T09:20:59.772Z", "totalFileCount": 14, "totalFileSize": 6347421 } }
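
The script in this tutorial only reads --in-file and --out-directory. If a block needs every argument listed above, the parsing could look roughly like the sketch below. It assumes that --metadata arrives as a single JSON-encoded string; parse_block_args is a hypothetical helper name.

```python
import argparse
import json


def parse_block_args(argv):
    """Parse the arguments listed above; argv is a list such as sys.argv[1:]."""
    parser = argparse.ArgumentParser(description='Organization transformation block')

    # a block receives either a single file or a data item directory
    source = parser.add_mutually_exclusive_group(required=True)
    source.add_argument('--in-file', type=str)
    source.add_argument('--in-directory', type=str)

    parser.add_argument('--out-directory', type=str, required=True)
    parser.add_argument('--hmac-key', type=str, default=None)
    # assumption: the metadata is passed as one JSON-encoded string
    parser.add_argument('--metadata', type=json.loads, default={})

    # ignore any arguments this sketch does not know about
    args, _unknown = parser.parse_known_args(argv)
    return args
```

In a block this would typically be called as parse_block_args(sys.argv[1:]), after which args.metadata can be used like a regular dictionary, for example args.metadata['dataItemInfo'].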

Add the following content. This takes in the Parquet file, groups data by their label, and then calculates the RMS over the X, Y and Z axes of the accelerometer.

import pyarrow.parquet as pq
import numpy as np
import os, sys, argparse
import pandas as pd

# the two arguments used in this tutorial; any other arguments
# (such as --hmac-key and --metadata) are ignored by parse_known_args
parser = argparse.ArgumentParser(description='Organization transformation block')
parser.add_argument('--in-file', type=str, required=True)
parser.add_argument('--out-directory', type=str, required=True)

args, unknown = parser.parse_known_args()

# verify that the input file exists and create the output directory if needed
if not os.path.exists(args.in_file):
    print('--in-file argument', args.in_file, 'does not exist', flush=True)
    sys.exit(1)

if not os.path.exists(args.out_directory):
    os.makedirs(args.out_directory)

# load and parse the input file
print('Loading parquet file', args.in_file, flush=True)
table = pq.read_table(args.in_file)
data = table.to_pandas()

features = []

# we group by label and then extract some metrics
for label in data.label.unique():
    data_per_label = data[data.label == label]

    # calculate the RMS per axis
    features.append({
        'label': label,
        'rmsX': np.sqrt(np.mean(data_per_label.accX**2)),
        'rmsY': np.sqrt(np.mean(data_per_label.accY**2)),
        'rmsZ': np.sqrt(np.mean(data_per_label.accZ**2))
    })

# and store as new file in the output directory
out_file = os.path.join(args.out_directory, os.path.splitext(os.path.basename(args.in_file))[0] + '_features.parquet')
pd.DataFrame(features).to_parquet(out_file)

print('Written features file', out_file, flush=True)
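
To make the calculation concrete: the RMS of an axis is the square root of the mean of the squared samples. The sketch below repeats the same grouping and RMS steps in plain Python, without pandas or numpy, on a few made-up rows (the values are illustrative and not taken from gestures.parquet). The helper names rms and rms_per_label are hypothetical.

```python
import math
from collections import defaultdict


def rms(values):
    """Root mean square: square every value, average, take the square root."""
    values = list(values)
    return math.sqrt(sum(v * v for v in values) / len(values))


def rms_per_label(rows):
    """Group (label, accX, accY, accZ) tuples by label and compute the RMS per axis."""
    grouped = defaultdict(lambda: ([], [], []))
    for label, x, y, z in rows:
        xs, ys, zs = grouped[label]
        xs.append(x)
        ys.append(y)
        zs.append(z)
    return {
        label: {'rmsX': rms(xs), 'rmsY': rms(ys), 'rmsZ': rms(zs)}
        for label, (xs, ys, zs) in grouped.items()
    }


# made-up sample rows, not taken from gestures.parquet
rows = [
    ('idle', 3.0, 0.0, 1.0),
    ('idle', 4.0, 0.0, 1.0),
    ('wave', 2.0, 2.0, 0.0),
]
features = rms_per_label(rows)
```

For the two 'idle' rows the X axis works out to the square root of (9 + 16) / 2, which is the same value that np.sqrt(np.mean(accX**2)) would give for those samples.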

2.4 - Building and testing the container

On your local machine

To test the transformation block locally, if you have Python and all dependencies installed, just run:

$ python3 transform.py --in-file gestures.parquet --out-directory out/

Docker

You can also build the container locally via Docker, and test the block. The added benefit is that you don't need any dependencies installed on your local computer, and can thus test that you've included everything that's needed for the block. This requires Docker desktop to be installed.

To build the container and test the block, open a command prompt or terminal window and navigate to the source directory. First, build the container:

$ docker build -t test-org-transform-parquet-dataset .

Then, run the container (make sure gestures.parquet is in the same directory):

$ docker run --rm -v $PWD:/data test-org-transform-parquet-dataset --in-file /data/gestures.parquet --out-directory /data/out

Seeing the output

This process has generated a new Parquet file in the out/ directory containing the RMS of the X, Y and Z axes. If you inspect the content of the file (e.g. using parquet-tools) you'll see the output:

$ parquet-tools head -n5 out/gestures_features.parquet 
label = wave
rmsX = 11.424144744873047
rmsY = 4.73303747177124
rmsZ = 2.944265842437744

label = updown
rmsX = 3.899503231048584
rmsY = 3.9587674140930176
rmsZ = 10.34404468536377

label = circle
rmsX = 6.263721942901611
rmsY = 7.0987162590026855
rmsZ = 6.159618854522705

label = idle
rmsX = 3.714001178741455
rmsY = 3.4940428733825684
rmsZ = 8.6710205078125

label = snake
rmsX = 1.282995581626892
rmsY = 1.8830623626708984
rmsZ = 9.597149848937988

Success!

3. Pushing the transformation block to Edge Impulse

With the block ready we can push it to your organization. Open a command prompt or terminal window, navigate to the folder you created earlier, and run:

$ edge-impulse-blocks push

This packages up your folder and sends it to Edge Impulse, where it is built and then added to your organization.

Edge Impulse Blocks v1.9.0
Archiving 'tutorial-processing-block'...
Archiving 'tutorial-processing-block' OK (2 KB) /var/folders/3r/fds0qzv914ng4t17nhh5xs5c0000gn/T/ei-transform-block-7812190951a6038c2f442ca02d428c59.tar.gz

Uploading block 'Demo dataset transformation' to organization 'Demo org Inc.'...
Uploading block 'Demo dataset transformation' to organization 'Demo org Inc.' OK

Building transformation block 'Demo dataset transformation'...
Job started
...
Building transformation block 'Demo dataset transformation' OK

Your block has been updated, go to https://studio.edgeimpulse.com/organization/34/data to run a new transformation

The transformation block is now available in Edge Impulse under Data transformation > Transformation blocks.

If you make any changes to the block, just re-run edge-impulse-blocks push and the block will be updated.

4. Uploading gestures.parquet to Edge Impulse

Next, upload the gestures.parquet file by going to Data > Add data... > Add data item, setting the name to 'Gestures' and the dataset to 'Transform tutorial', and selecting the Parquet file.

This makes the gestures.parquet file available from the Data page.

5. Starting the transformation

With the Parquet file in Edge Impulse and the transformation block configured you can now create a new job. Go to Data, and select the Parquet file by setting the filter to dataset = 'Transform tutorial'.

Click the checkbox next to the data item, and select Transform selected (1 file). On the 'Create transformation job' page select 'Import data into Dataset'. Under 'output dataset', select 'Same dataset as source', and under 'Transformation block' select the new transformation block.

Click Start transformation job to start the job. This pulls the data in, starts a transformation job and finally uploads the data back to your dataset. If you have multiple files selected the transformations will also run in parallel.

You can now find the transformed file back in your dataset.

6. Next steps

Transformation blocks are a powerful feature that lets you set up a data pipeline to turn raw data into actionable machine learning features. They also give you a reproducible way of transforming many files at once, and are programmable through the Edge Impulse API so you can automatically convert new incoming data. If you're interested in transformation blocks or any of the other enterprise features, let us know!

Appendix: Advanced features

Updating metadata from a transformation block

You can update the metadata of data items directly from a transformation block by creating an ei-metadata.json file in the output directory. The metadata is then applied to the new data item automatically when the transform job finishes. The ei-metadata.json file has the following structure:

{
    "version": 1,
    "action": "add",
    "metadata": {
        "some-key": "some-value"
    }
}

Some notes:

  • If action is set to add, the metadata keys are added to the data item. If action is set to replace, all existing metadata keys are removed.

Environment variables

Transformation blocks get access to the following environment variables, which let you authenticate with the Edge Impulse API. This way you don't have to inject these credentials into the block. The variables are:

  • EI_API_KEY - an API key with 'member' privileges for the organization.

  • EI_ORGANIZATION_ID - the organization ID that the block runs in.

  • EI_API_ENDPOINT - the API endpoint (default: https://studio.edgeimpulse.com/v1).

Custom parameters

You can specify custom arguments or parameters to your block by adding a parameters.json file in the root of your block directory. This file describes all arguments for your block, and is used to render custom UI elements for each parameter. For example, this parameters file:

[{
    "name": "Bucket",
    "type": "bucket",
    "param": "bucket-name",
    "value": "",
    "help": "The bucket where you're hosting all data"
},
{
    "name": "Bucket prefix",
    "value": "my-test-prefix/",
    "type": "string",
    "param": "bucket-prefix",
    "help": "The prefix in the bucket, where you're hosting the data"
}]

Renders the following UI when you run the transformation block:

(Image: Running a transformation block with custom parameters)

And the options are passed in as command line arguments to your block:

--bucket-name "ei-data-dev" --bucket-prefix "my-test-prefix/"

For more information, and all options see Adding parameters to custom blocks.
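
The advanced features in this appendix (the ei-metadata.json file, the EI_* environment variables, and custom parameters) can be combined in a block's entry script. The following is a minimal sketch rather than a reference implementation. The helper names parse_custom_params, read_api_settings and write_metadata are hypothetical, and the two custom parameters are the ones from the example parameters file.

```python
import argparse
import json
import os


def parse_custom_params(argv):
    """Parse the two custom parameters from the example parameters file."""
    parser = argparse.ArgumentParser()
    parser.add_argument('--bucket-name', type=str, default='')
    parser.add_argument('--bucket-prefix', type=str, default='')
    # other arguments (e.g. --in-file) are left for the block's own parser
    args, _unknown = parser.parse_known_args(argv)
    return args


def read_api_settings(environ=os.environ):
    """Collect the API settings that are made available to the block."""
    return {
        'api_key': environ.get('EI_API_KEY'),
        'organization_id': environ.get('EI_ORGANIZATION_ID'),
        # fall back to the documented default endpoint
        'endpoint': environ.get('EI_API_ENDPOINT', 'https://studio.edgeimpulse.com/v1'),
    }


def write_metadata(out_directory, metadata, action='add'):
    """Write ei-metadata.json into the output directory and return its path."""
    if action not in ('add', 'replace'):
        raise ValueError('action must be "add" or "replace"')
    os.makedirs(out_directory, exist_ok=True)
    path = os.path.join(out_directory, 'ei-metadata.json')
    with open(path, 'w') as f:
        json.dump({'version': 1, 'action': action, 'metadata': metadata}, f, indent=4)
    return path
```

A block could call write_metadata(args.out_directory, {'some-key': 'some-value'}) just before it exits, so that the file is present in the output directory when the transform job finishes.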
