Project transformation block

Transformation blocks take raw data from your organizational datasets and convert the data into files that can be loaded in an Edge Impulse project. You can use transformation blocks to only include certain parts of individual data files, calculate long-running features like a running mean or derivatives, or efficiently generate features with different window lengths. Transformation blocks can be written in any language, and run on the Edge Impulse infrastructure.

In this tutorial we build a Python-based transformation block that loads Parquet files, splits the data into labeled windows of roughly ten seconds, and uploads the data to a new project.

Want more? We also have an end-to-end example transformation block that mixes noise into an audio dataset: edgeimpulse/example-transform-block-mix-noise.

1. Prerequisites

You'll need:

  • The Edge Impulse CLI.
    • If you receive any warnings during installation, that's fine. Run edge-impulse-blocks afterwards to verify that the CLI was installed correctly.
  • The gestures.parquet file, which you can use to test the transformation block. It contains some data from the Continuous gestures dataset in Parquet format.

Transformation blocks use Docker containers, a virtualization technique that lets developers package an application with all its dependencies in a single package. If you want to test your blocks locally in a container you'll also need Docker Desktop (this is not a requirement).

1.1 - Parquet schema

This is the Parquet schema for the gestures.parquet file which we'll want to transform into data for a project:

message root {
  required binary sampleName (UTF8);
  required int64 timestamp (TIMESTAMP_MILLIS);
  required int64 added (TIMESTAMP_MILLIS);
  required boolean signatureValid;
  required binary device (UTF8);
  required binary label (UTF8);
  required float accX;
  required float accY;
  required float accZ;
}
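
Before transforming a file it can help to confirm that it has the columns this schema promises. The following is a minimal sketch of such a check, not part of the tutorial's block: EXPECTED_COLUMNS is copied from the schema above, and missing_columns is a hypothetical helper name.

```python
# Column names copied from the Parquet schema above.
EXPECTED_COLUMNS = [
    'sampleName', 'timestamp', 'added', 'signatureValid',
    'device', 'label', 'accX', 'accY', 'accZ',
]

def missing_columns(columns, expected=EXPECTED_COLUMNS):
    """Return the expected column names that are absent, in schema order."""
    present = set(columns)
    return [name for name in expected if name not in present]

# Hypothetical input: a file that only carries part of the schema.
missing = missing_columns(['timestamp', 'label', 'accX', 'accY'])
print(missing)
```

In transform.py (section 2.3) the column names could be taken from the loaded data frame, for example list(data.columns), and the script could exit early if the returned list is not empty.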

2. Building your first transformation block

To build a transformation block open a command prompt or terminal window, create a new folder, and run:

$ edge-impulse-blocks init

This will prompt you to log in, and enter the details for your block. E.g.:

Edge Impulse Blocks v1.9.0
? What is your user name or e-mail address (edgeimpulse.com)? [email protected]
? What is your password? [hidden]
Attaching block to organization 'Demo org Inc.'
? Choose a type of block Transformation block
? Choose an option Create a new block
? Enter the name of your block Demo project transformation
? Enter the description of your block Reads a Parquet file and splits it up in labeled data
Creating block with config: {
  name: 'Demo project transformation',
  type: 'transform',
  description: 'Reads a Parquet file and splits it up in labeled data',
  organizationId: 34
}
Your new block 'Demo project transformation' has been created in '~/repos/tutorial-processing-block'.
When you have finished building your transformation block, run "edge-impulse-blocks push" to update the block in Edge Impulse.

Then, create the following files in this directory:

2.1 - Dockerfile

We're building a Python-based transformation block. The Dockerfile describes our base image (Python 3.7.5), our dependencies (in requirements.txt), and which script to run (transform.py).

FROM python:3.7.5-stretch

WORKDIR /app

# Python dependencies
COPY requirements.txt ./
RUN pip3 --no-cache-dir install -r requirements.txt

COPY . ./

ENTRYPOINT [ "python3",  "transform.py" ]

Note: Do not use a WORKDIR under /home! The /home path will be mounted in by Edge Impulse, making your files inaccessible.

🚧

ENTRYPOINT vs RUN / CMD

If you use a different programming language, make sure to use ENTRYPOINT to specify the application to execute, rather than RUN or CMD.

2.2 - requirements.txt

This file describes the dependencies for the block. We'll be using pandas and pyarrow to parse the Parquet file, and numpy to do some calculations.

numpy==1.16.4
pandas==0.23.4
pyarrow==0.16.0

2.3 - transform.py

This file includes the actual application. Transformation blocks are invoked with three parameters (as command line arguments):

  • --in-file - A file from the organizational dataset. In this case the gestures.parquet file.
  • --out-directory - Directory to write files to.
  • --hmac-key - You can use this HMAC key to sign the output files.

Add the following content:

import pyarrow.parquet as pq
import numpy as np
import math, os, sys, argparse, json, hmac, hashlib, time
import pandas as pd

# these are the three arguments that we get in
parser = argparse.ArgumentParser(description='Organization transformation block')
parser.add_argument('--in-file', type=str, required=True)
parser.add_argument('--out-directory', type=str, required=True)
parser.add_argument('--hmac-key', type=str, required=True)

args, unknown = parser.parse_known_args()

# verify that the input file exists and create the output directory if needed
if not os.path.exists(args.in_file):
    print('--in-file argument', args.in_file, 'does not exist', flush=True)
    sys.exit(1)

if not os.path.exists(args.out_directory):
    os.makedirs(args.out_directory)

# load and parse the input file
print('Loading parquet file', args.in_file, flush=True)
table = pq.read_table(args.in_file)
data = table.to_pandas()

# we'll split data based on the "label" Parquet column
# keep track of the last label and if it changes we write a file
window = []
last_label = data['label'][0]
# total_seconds() covers the whole difference; .microseconds only holds the sub-second part
interval_ms = (data['timestamp'][1] - data['timestamp'][0]).total_seconds() * 1000

# turn a window into a file that is in the Edge Impulse Data Acquisition format
def window_to_file(window):
    # window under 5 seconds: skip
    if (len(window) * interval_ms < 5000):
        return

    # take the label, timestamp and sensors
    label = window[0]['label']
    # .value is in nanoseconds; convert to whole seconds since epoch for 'iat'
    timestamp = int(pd.to_datetime(window[0]['timestamp']).value // 1000000000)
    sensors = ['accX', 'accY', 'accZ']
    values = []
    for w in window:
        values.append([ w[x] for x in sensors ])

    # basic structure
    struct = {
        'protected': {
            'ver': 'v1',
            'alg': 'HS256',
            'iat': int(timestamp)
        },
        'signature': '0000000000000000000000000000000000000000000000000000000000000000',
        'payload': {
            'device_type': 'Importer',
            'interval_ms': interval_ms,
            'sensors': [ { 'name': x, 'units': 'm/s2' } for x in sensors ],
            'values': values
        }
    }

    # sign the structure
    encoded = json.dumps(struct)
    signature = hmac.new(bytes(args.hmac_key, 'utf-8'), msg = encoded.encode('utf-8'), digestmod = hashlib.sha256).hexdigest()
    struct['signature'] = signature

    # and write to the output directory
    file_name = os.path.join(args.out_directory, label + '.' + str(timestamp) + '.json')
    with open(file_name, 'w') as f:
        json.dump(struct, f)

# loop over all rows in the Parquet file
for index, row in data.iterrows():
    # label changes, or window longer than 10 seconds? write file
    if ((last_label != row['label']) or (len(window) * interval_ms > 10000)):
        print('writing file', str(index) + '/' + str(len(data)), last_label, flush=True)
        window_to_file(window)
        window = []

    last_label = row['label']
    window.append(row)

# write the last window
window_to_file(window)
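
The windowing logic in this script can be easier to follow in isolation. Below is a minimal sketch in plain Python, without pandas, of the same idea: close a window when the label changes or when it grows past a maximum length, and drop windows that are too short. The split_windows function and the sample rows are hypothetical; the 5 second and 10 second thresholds are the ones used in transform.py.

```python
def split_windows(rows, interval_ms, min_ms=5000, max_ms=10000):
    """Group consecutive rows by label, closing a window once it exceeds max_ms."""
    windows, current = [], []
    for row in rows:
        # a label change or an over-long window closes the current window
        if current and (row['label'] != current[-1]['label']
                        or len(current) * interval_ms > max_ms):
            windows.append(current)
            current = []
        current.append(row)
    if current:
        windows.append(current)
    # windows shorter than min_ms are skipped, as window_to_file does
    return [w for w in windows if len(w) * interval_ms >= min_ms]

# Hypothetical data: 12 rows of 'idle' then 6 rows of 'wave', one row per second.
rows = [{'label': 'idle'}] * 12 + [{'label': 'wave'}] * 6
windows = split_windows(rows, interval_ms=1000)
summary = [(w[0]['label'], len(w)) for w in windows]
print(summary)  # [('idle', 11), ('wave', 6)]
```

The single leftover 'idle' row is dropped because it is shorter than five seconds. Note that, as in the script, a window is only closed after it has exceeded the maximum, so it can be one sample longer than ten seconds.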

2.4 - Testing the transformation block locally

On your local machine

To test the transformation block locally, if you have Python and all dependencies installed, just run:

$ python3 transform.py --in-file gestures.parquet --out-directory out/ --hmac-key 123

This generates a number of JSON files in the out/ directory. You can test the import into an Edge Impulse project via:

$ edge-impulse-uploader --clean out/*.json
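
Before uploading you may want to confirm that the generated files are signed with the key you passed in. The sketch below mirrors the signing step in transform.py: the signature is computed over the JSON document with the signature field set to 64 zeros. The sign and verify helpers and the sample document are hypothetical, and the check assumes the file was written by Python's json module with default settings, so that re-serializing it reproduces the same bytes.

```python
import hashlib
import hmac
import json

EMPTY_SIGNATURE = '0' * 64

def sign(struct, hmac_key):
    """Hex HMAC-SHA256 of the document serialized with a zeroed signature field."""
    # dict(...) keeps the original key order, so the serialization is unchanged
    unsigned = dict(struct, signature=EMPTY_SIGNATURE)
    encoded = json.dumps(unsigned).encode('utf-8')
    return hmac.new(hmac_key.encode('utf-8'), msg=encoded,
                    digestmod=hashlib.sha256).hexdigest()

def verify(struct, hmac_key):
    """True if the stored signature matches the one computed with hmac_key."""
    return hmac.compare_digest(sign(struct, hmac_key), struct['signature'])

# Hypothetical document in the same layout that transform.py writes.
sample = {
    'protected': {'ver': 'v1', 'alg': 'HS256', 'iat': 1600000000},
    'signature': EMPTY_SIGNATURE,
    'payload': {
        'device_type': 'Importer',
        'interval_ms': 16.0,
        'sensors': [{'name': 'accX', 'units': 'm/s2'}],
        'values': [[0.1], [0.2]],
    },
}
sample['signature'] = sign(sample, '123')

# round-trip through JSON text, as if the file had been written and read back
loaded = json.loads(json.dumps(sample))
ok = verify(loaded, '123')
wrong_key = verify(loaded, 'not-the-key')
print(ok, wrong_key)
```

To check a real output file, load it with json.load and pass the result to verify together with the value you used for --hmac-key.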

Docker

You can also build the container locally via Docker, and test the block. The added benefit is that you don't need any dependencies installed on your local computer, and can thus test that you've included everything that's needed for the block. This requires Docker desktop to be installed.

First, build the container:

$ docker build -t test-org-transform-parquet .

Then, run the container (make sure gestures.parquet is in the same directory):

$ docker run --rm -v $PWD:/data test-org-transform-parquet --in-file /data/gestures.parquet --out-directory /data/out --hmac-key 0123

This generates a number of JSON files in the out/ directory. You can test the import into an Edge Impulse project via:

$ edge-impulse-uploader --clean out/*.json

3. Pushing the transformation block to Edge Impulse

With the block ready we can push it to your organization. Open a command prompt or terminal window, navigate to the folder you created earlier, and run:

$ edge-impulse-blocks push

This packages up your folder and sends it to Edge Impulse, where the block is built and then added to your organization.

Edge Impulse Blocks v1.9.0
Archiving 'tutorial-processing-block'...
Archiving 'tutorial-processing-block' OK (2 KB) /var/folders/3r/fds0qzv914ng4t17nhh5xs5c0000gn/T/ei-transform-block-7812190951a6038c2f442ca02d428c59.tar.gz

Uploading block 'Demo project transformation' to organization 'Demo org Inc.'...
Uploading block 'Demo project transformation' to organization 'Demo org Inc.' OK

Building transformation block 'Demo project transformation'...
Job started
...
Building transformation block 'Demo project transformation' OK

Your block has been updated, go to https://studio.edgeimpulse.com/organization/34/data to run a new transformation

The transformation block is now available in Edge Impulse under Data transformation > Transformation blocks.

The transformation block in Edge Impulse

If you make any changes to the block, just re-run edge-impulse-blocks push and the block will be updated.

4. Uploading gestures.parquet to Edge Impulse

Next, upload the gestures.parquet file, by going to Data > Add data... > Add data item, setting name as 'Gestures', dataset to 'Transform tutorial', and selecting the Parquet file.

This makes the gestures.parquet file available from the Data page.

5. Starting the job

With the Parquet file in Edge Impulse and the transformation block configured you can now create a new job. Go to Data, and select the Parquet file by setting the filter to dataset = 'Transform tutorial'.

Selecting the transform tutorial dataset

Click the checkbox next to the data item, and select Transform selected (1 file). On the 'Create transformation job' page, select 'Import data into Project'. Then under 'Project', select '+ Create new project' and enter a name. Under 'Transformation block' select the new transformation block.

Configuring the transformation job

Click Start transformation job to start the job. This pulls the data in, starts a transformation job and finally imports the data into the new project. If you have multiple files selected the transformations will also run in parallel.

A transformation job with a transformation block

6. Next steps

Transformation blocks are a powerful feature that lets you set up a data pipeline to turn raw data into actionable machine learning features. They also give you a reproducible way of transforming many files at once, and are programmable through the Edge Impulse API so you can automatically convert new incoming data. Want more? We also have an end-to-end example transformation block that mixes noise into an audio dataset: edgeimpulse/example-transform-block-mix-noise.

If you're interested in transformation blocks or any of the other enterprise features, let us know!
