Organization hub

Your Edge Impulse Organization enables your team to collaborate on multiple datasets, automation, and models in a shared workspace. It provides tools to automate data preparation tasks with reusable pipelines, enabling data transformation, preparation, and analysis of sensor data at scale. It allows anyone in your team to quickly access relevant data through familiar tools, adds versioning and traceability to your machine learning models, and lets you quickly create and monitor your Edge Impulse projects for optimal on-device performance.

Only available with Edge Impulse Enterprise Plan

Try our FREE Enterprise Trial today.

To get started, follow these guides:

Health reference design

Usage metrics

Existing enterprise users or enterprise trial users can view their entitlement limits via the dashboard of their enterprise organization:

This view allows you to see your organization's current usage of total users, projects, compute time and storage limits. To increase your organization's limits, select the Request limit increase button to contact sales.

Users

Within an organization you can work on one or more projects with multiple people. These can be colleagues, outside researchers, or even members of the community. They will only get access to the specific data in the project, and not to any of the raw data in your organizational datasets.

Only available with Edge Impulse Enterprise Plan

Try our FREE Enterprise Trial today.

To invite a user to an organization, click on the "Add user" button, enter the email address and select the role:

Organization Users vs Project Users

It is important to note that there are two types of users in Edge Impulse: Project Users and Organization Users.

Organization Users, typically holding roles like Admin, are responsible for the overarching management and customization of organizational elements, including datasets, storage buckets, and white label attributes. These users also encompass the capabilities of Project Users.

Conversely, Project Users, often in roles such as Member or Guest, are limited to specific project involvement, focusing on collaboration and contributions at the project level, without access to broader organizational management functions. They are granted access only to certain project data to maintain the security of raw data in organizational datasets.

Organization User Roles

For a more granular look at the capabilities of each role, see the table below:

Admin

Admins have full rights on the organization, overseeing organizational and white label functionalities, including dataset management and storage bucket updates. They also have all the rights of a Project Member.

Full Rights on the Organization
Project User rights
Manage organization datasets
Update and add storage buckets
Verify bucket connectivity
Customize white label (where applicable) attributes like themes and information
API access for organization and white label management

Member

Members have full access to the datasets and custom blocks, but cannot join a project without being invited.

Broad Access, with Restrictions on Project Joining
Project User rights
Full access to datasets and custom blocks
Can collaborate on projects, but only by invitation
Can access metrics via API

Guest

Guests have restricted access, limited to selected datasets within the projects they are associated with.

Limited Access to Selected Datasets
Project User rights
Access to selected datasets within the project they are invited to
Cannot access raw data in organizational datasets
Cannot access metrics via API

Data campaigns

The "data campaigns" feature allows you to quickly track your experiments and the development progress of your models. It is an overview of your pipelines where you can easily extract useful information from your datasets and correlate those metrics with your model performance.

It has been primarily designed to follow clinical research data processes. In August 2023, we released this feature for every enterprise user as we see value in being able to track metrics between your datasets and your projects.

Only available with Edge Impulse Enterprise Plan

Try our FREE Enterprise Trial today.

Setting up your dashboard

To get started, navigate to the Data campaigns tab in your organization:

Click on + Create new dashboard.

Give your dashboard a name, and select one or more collaborators to receive the daily updates by email. If you don't want to be spammed, you can select when you want to receive these updates, either Always, On new data, changes or on error, or Never. Finally, set the last number of days shown in the graphs:

You can create as many dashboards as needed. Simply click on + Create a new dashboard from the dropdown available under your current dashboard:

If you want to delete a dashboard, click on Actions... -> Delete dashboard.

Setting up your campaign

Once your dashboard is created, you can add your custom campaigns. It's where you will specify which metrics are important to you and your use case. Click on Actions... -> Add campaign

Fill the form to create your campaign:

Name: Name of your data campaign.

Description: Description of your data campaign.

Campaign coordinators: Add the collaborators that are engaged with this campaign

Datasets: Select the datasets you want to visualize in your campaign. You can add several datasets.

Projects: Select the projects you want to visualize in your campaign. You can add several projects.

Pipelines: Select the pipeline that is associated with your campaign. Note that this is for reference only; it is currently not displayed in your campaign.

Links: Select between the link type you need. Options are Github, Spreadsheet, Text Document, Code repository, List or Folder. Add a name and the link. This place is useful for other collaborators to have all the needed information about your project, gathered in one place under your campaign.

Additional queries to track: These queries are data filters that need to be written in the SQL WHERE format. See Querying data for more information. For example, metadata->age >= 18 will return the data samples from adult patients.

You can then save your data campaign and it will be added to your dashboard:

This dashboard shows the metrics' progress from the Health reference design data

If you want to edit or delete your campaign, click on the "⋮" button on the right side of your campaign:

Data

Since the creation of Edge Impulse, we have been helping customers to deal with complex data pipelines, complex data transformation methods and complex clinical validation studies.

In most cases, before even thinking about machine learning algorithms, researchers need to build quality datasets from real-world data. These data come from various devices (prototype devices being developed vs clinical/industrial-grade reference devices), have different formats (Excel sheets, images, CSV, JSON, etc.), and are stored in various places (researchers' computers, Dropbox folders, Google Drive, S3 buckets, etc.).

Dealing with such complex data infrastructure is time-consuming and expensive to develop and maintain. With the organizational data, we want to give you tools to centralize, validate and transform datasets so they can be easily imported into your projects to train your machine learning models.

Only available with Edge Impulse Enterprise Plan

Try our FREE Enterprise Trial today.

Health reference design

We have built a health reference design that describes an end-to-end ML workflow for building a wearable health product using Edge Impulse.

In this reference design, we want to help you understand how to create a full clinical data pipeline by:

Synchronizing clinical data with a bucket
Validating clinical data
Querying clinical data
Transforming clinical data
Building data pipelines

Buckets

Before we get started, you must link your organization with one or several storage buckets. First, select where your data lives:

AWS S3 buckets
Google Cloud Storage
Any S3-compatible bucket

Then fill in the form with your bucket name, region, endpoint, access key and secret key:

A green dot indicates that your bucket is connected:

Datasets

Two types of dataset structures can be used - Generic datasets (default) and Clinical datasets.

There is no required format for data files. You can upload data in any format, whether it's CSV, Parquet, or a proprietary data format.

However, to import data items to an Edge Impulse project, you will need to use the right format as our studio ingestion API only supports these formats:

JPG, PNG images
MP4, AVI video files
WAV audio files
CBOR/JSON files in the Edge Impulse data acquisition format
CSV files

Tip: You can use transformation blocks to convert your data

The default dataset structure is a file-based one, no matter the directory structure:

For example:

images/
├── testing/
│   ├── 1.jpg
│   ├── 2.jpg
│   ├── 3.jpg
│   ...
│   └── 200.jpg
└── training/
    ├── 1.jpg
    ├── 2.jpg
    ├── 3.jpg
    ...
    └── 800.jpg

or:

keywords/
├── french-accent/
│   ├── hello.wav
│   ├── yes.wav
│   ├── no.wav
├── greek-accent/
│   ├── hello.wav
│   ├── yes.wav
│   ├── no.wav
└── unlabeled/
    ├── 1.wav
    ├── 2.wav
    ├── 3.wav
    ...
    └── 20.wav

Note that you will be able to associate the labels of your data items from the file name or the directory name when importing your data in a project.
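
To make the labeling note concrete, here is a small Python sketch of the two strategies applied to the keywords/ example above. The helper names are hypothetical and this only illustrates the idea; it is not how the Studio import is implemented.

```python
from pathlib import PurePosixPath


def label_from_directory(path: str) -> str:
    """Use the name of the parent directory as the label."""
    return PurePosixPath(path).parent.name


def label_from_filename(path: str) -> str:
    """Use the file name without its extension as the label."""
    return PurePosixPath(path).stem


sample = "keywords/french-accent/hello.wav"
print(label_from_directory(sample))  # french-accent
print(label_from_filename(sample))   # hello
```

With the keywords/ layout, labeling from the directory name groups samples by accent, while labeling from the file name groups them by spoken word.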

The clinical datasets structure in Edge Impulse has three layers:

The dataset, a larger set of data items, grouped together.
Data item, an item with metadata and files attached.
Data file, the actual files.

See the health reference design tutorial for a deeper explanation.
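
As a mental model, the three layers can be pictured as nested records. The Python sketch below uses hypothetical class names (Dataset, DataItem, DataFile are not part of any Edge Impulse SDK) and sample values borrowed from the examples later on this page.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class DataFile:
    """The actual file, e.g. a CSV with sensor readings."""
    name: str


@dataclass
class DataItem:
    """An item with metadata and files attached."""
    name: str
    metadata: Dict[str, object] = field(default_factory=dict)
    files: List[DataFile] = field(default_factory=list)


@dataclass
class Dataset:
    """A larger set of data items, grouped together."""
    name: str
    items: List[DataItem] = field(default_factory=list)


item = DataItem(
    name="Cycling-2023-09-14_06-47-00",
    metadata={"ei_check": 1},
    files=[DataFile("Accelerometer.csv"), DataFile("Gyroscope.csv")],
)
dataset = Dataset(name="Activity Detection (Clinical view)", items=[item])
print(len(dataset.items), "item with", len(item.files), "files")
```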

Create a new dataset

Once you successfully linked your storage bucket to your organization, head to the Datasets tab and click on + Add new dataset:

Fill out the following form:

Click on Create dataset

Data

With your datasets imported, you can now navigate into your dataset, create folders, query your dataset, add data items and import your data to an Edge Impulse project.

Default view

The default view lets you navigate in your bucket following the directory structure. You can easily add data using the "+ New folder" button. To add new data, use the right panel - drag and drop your files and folders and it will automatically upload them to your bucket.

Clinical view

The clinical view is slightly different; see synchronizing clinical data with a bucket for more information. This view lets you easily query your clinical dataset, but to import data you will need to set up an upload portal or upload the files directly to your bucket.

Tip: You can add two distinct datasets in Edge Impulse that point to the same bucket path, one generic and one clinical. This way you can leverage both the easy upload and the ability to query your datasets.

Adding data to your project

Go to Actions... -> Import data into a project, select the project you wish to import to and click Next, Configure how to label this data:

This will import the data into the project and optionally create a new label for each file in the dataset. This labeling step helps you keep track of different classes or categories within your data.

After importing the data into the project, in the Next, post-sync actions step, you can configure a data pipeline to automatically retrieve and trigger actions in your project:

Previewing Data

We also have added a data preview feature, allowing you to visualize certain types of data directly within the organization data tab.

Supported data types include tables (CSV/Parquet), images, PDFs, audio files (WAV/MP3), and text files (TXT/JSON). This feature gives you a quick overview of your data and helps ensure its integrity and correctness.

Recap

Any questions, or interested in the enterprise version of Edge Impulse? Contact us for more information.

Troubleshooting

CORS Headers

If you see the following message, make sure to add the CORS header to your bucket settings:

You can also add the CORS configuration using the AWS CLI:

aws s3api put-bucket-cors --bucket your-bucket --cors-configuration file://cors.json

with this file cors.json:

{
    "CORSRules": [
        {
            "AllowedHeaders": ["*"],
            "AllowedMethods": ["PUT", "POST"],
            "AllowedOrigins": ["https://studio.edgeimpulse.com"],
            "ExposeHeaders": []
        }
    ]
}
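
Before applying the file, it can help to sanity-check that it contains a rule allowing PUT and POST from the Studio origin. The Python sketch below is one way to do that; the function name cors_problems is hypothetical, and the check only mirrors the configuration shown above rather than any official validation.

```python
import json

STUDIO_ORIGIN = "https://studio.edgeimpulse.com"
REQUIRED_METHODS = {"PUT", "POST"}


def cors_problems(config: dict) -> list:
    """Return a list of problems; an empty list means a suitable rule exists."""
    for rule in config.get("CORSRules", []):
        origins = rule.get("AllowedOrigins", [])
        methods = set(rule.get("AllowedMethods", []))
        origin_ok = STUDIO_ORIGIN in origins or "*" in origins
        if origin_ok and REQUIRED_METHODS <= methods:
            return []
    return ["no rule allows PUT and POST from " + STUDIO_ORIGIN]


# Same content as the cors.json file above
example = json.loads("""
{
    "CORSRules": [
        {
            "AllowedHeaders": ["*"],
            "AllowedMethods": ["PUT", "POST"],
            "AllowedOrigins": ["https://studio.edgeimpulse.com"],
            "ExposeHeaders": []
        }
    ]
}
""")
print(cors_problems(example))
```

To check a file on disk, load it with json.load and pass the result to cors_problems.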

Data transformation

Data transformation or transformation jobs refer to processes that apply specific transformations to the data within an Edge Impulse organizational dataset. These jobs are executed using Transformation blocks, which are essentially scripts packaged in Docker containers. They perform a variety of tasks on the data, enabling more advanced and customized dataset transformation and manipulation.

The transformation jobs can be chained together in Data pipelines to automate your workflows.

Only available with Edge Impulse Enterprise Plan

Try our FREE Enterprise Trial today.

Overview

Transformation jobs

Create a transformation job

You have several options to create a transformation job:

From the Data transformation page by selecting the Create job tab.
From the Custom blocks->Transformation page by selecting the "⋮" action button and selecting Run job.
From the Data page:

Depending on whether you are on a Default dataset or a Clinical dataset, the view will vary:

Run a transformation job

Again, depending on whether you are on a Default dataset or a Clinical dataset, the view will vary. The common options are the Name of the transformation job and the Transformation block used for the job.

If your Transformation block has additional custom parameters, the input fields will be displayed below in a Parameters section. For example:

Dataset type options:

Default vs. Clinical datasets

Clinical Datasets: Operate on "data items" with a strict file structure. Transformation is specified using SQL-like syntax.

Default Datasets: Resemble a typical file system with flexible structure. You can specify data for transformation using wildcards.

For more information about the two dataset types, see the dedicated Data page.

Input

After selecting your Input dataset, you can filter which files or directory you want to transform.

In default dataset formats, we use wildcard filters (in a similar format to wildcards in git). This enables you to specify patterns that match multiple files or directories within your dataset:

Asterisk ( * ): Represents any number of characters (including zero characters) in a filename or directory name. It is commonly used to match files of a certain type or files whose names follow a pattern.
Example: /folder/*.png matches all PNG files in the /folder directory.
Example: /data/*/results.csv matches any results.csv file in a subdirectory under /data.
Double Asterisk ( ** ): Used to match any number of directories, including nested ones. This is particularly useful when the structure of directories is complex or not uniformly organized.
Example: /data/**/experiment-* matches all files or directories starting with experiment- in any subdirectory under /data.
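
The following Python sketch approximates the two wildcards described above so you can try patterns locally. It is an assumption-laden illustration, not the matcher Edge Impulse uses: here * matches any characters except a slash, ** matches any characters including slashes, and edge cases (such as whether ** may match zero directories) are not covered.

```python
import re


def wildcard_to_regex(pattern: str):
    """Translate a wildcard pattern into a compiled regular expression."""
    parts = []
    i = 0
    while i < len(pattern):
        if pattern.startswith("**", i):
            parts.append(".*")      # any characters, including "/"
            i += 2
        elif pattern[i] == "*":
            parts.append("[^/]*")   # any characters within one path segment
            i += 1
        else:
            parts.append(re.escape(pattern[i]))
            i += 1
    return re.compile("".join(parts) + r"\Z")


def matches(pattern: str, path: str) -> bool:
    """Return True when the whole path matches the wildcard pattern."""
    return wildcard_to_regex(pattern).match(path) is not None


print(matches("/folder/*.png", "/folder/a.png"))                    # True
print(matches("/folder/*.png", "/folder/sub/a.png"))                # False
print(matches("/data/*/results.csv", "/data/run1/results.csv"))     # True
print(matches("/data/**/experiment-*", "/data/a/b/experiment-1"))   # True
```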

Output

When you work with default datasets in Edge Impulse, you have the flexibility to define how the output from your transformation jobs is structured. There are three main rules to choose from:

No Subfolders: This rule places all transformed files directly into your specified output directory, without creating any subfolders. For example, if you transform .txt files in /data and choose /output as your output directory, all transformed files will be saved directly in /output.
Subfolder per Input Item: Here, a new subfolder is created in the output directory for each input file or folder. This keeps the output from each item organized and separate. For instance, if your input includes folders like /data/2020, /data/2021, and /data/2022, and you apply this rule with /transformed as your output directory, you will get subfolders like /transformed/2020, /transformed/2021, and /transformed/2022, each containing the transformed data from the corresponding input year.
Use Full Path: This rule mirrors the entire input path when creating new sub-folders in the output directory. It's especially useful for maintaining a clear trace of where each piece of output data originated, which is important in complex directory structures. For example, if you're transforming files in /project/data/experiments, and you choose /results as your output directory, the output will follow the full input path, resulting in transformed data being stored in /results/project/data/experiments.

Note: For the transformation blocks operating on files when selecting the Subfolder or Full Path option, we will use the file name without extension to create the base folder. e.g. /activity-detection/Accelerometer.csv will be uploaded to /activity-detection-output/Accelerometer/.
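
The three output rules can be summarized with a short Python sketch that computes the destination folder for one input. The rule identifiers and the output_path helper are hypothetical, and the handling of files under the Full Path rule is an interpretation of the note above rather than documented behavior.

```python
from pathlib import PurePosixPath


def output_path(input_path: str, output_dir: str, rule: str, is_file: bool = False) -> str:
    """Return the folder where the transformed output of one input would land."""
    src = PurePosixPath(input_path)
    out = PurePosixPath(output_dir)
    # For files, the folder is named after the file name without its extension
    leaf = src.stem if is_file else src.name
    if rule == "no-subfolders":
        return str(out)
    if rule == "subfolder-per-item":
        return str(out / leaf)
    if rule == "full-path":
        # Mirror the absolute input path below the output directory
        return str(out / src.relative_to("/").parent / leaf)
    raise ValueError("unknown rule: " + rule)


print(output_path("/data/notes.txt", "/output", "no-subfolders", is_file=True))
print(output_path("/data/2021", "/transformed", "subfolder-per-item"))
print(output_path("/project/data/experiments", "/results", "full-path"))
print(output_path("/activity-detection/Accelerometer.csv",
                  "/activity-detection-output", "subfolder-per-item", is_file=True))
```

The four calls reproduce the examples given in the rules and the note: /output, /transformed/2021, /results/project/data/experiments and /activity-detection-output/Accelerometer.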

Input

When running transformation jobs using the Clinical dataset option, you can query your input files or folders in all your clinical datasets. We use a different filtering mechanism for the Clinical datasets.

Filters

You can use a language which is very similar to SQL (documentation). See more on how to query your data on the dedicated documentation page. For example you can use filters like the following:

dataset = 'Activity Detection (Clinical view)' AND file_name like 'Accelero%'
dataset = 'Activity Detection (Clinical view)' AND metadata->ei_check = 1
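
If you are less familiar with SQL, the Python sketch below emulates what these two filters select on a handful of made-up file records. The records and the sql_like helper are hypothetical; real queries run inside Edge Impulse, and details such as case sensitivity may differ.

```python
import re


def sql_like(value: str, pattern: str) -> bool:
    """Emulate SQL LIKE: '%' matches any run of characters, '_' exactly one."""
    regex = "".join(
        ".*" if c == "%" else "." if c == "_" else re.escape(c) for c in pattern
    )
    return re.fullmatch(regex, value, flags=re.DOTALL) is not None


DATASET = "Activity Detection (Clinical view)"

# Made-up records standing in for files in a clinical dataset
files = [
    {"dataset": DATASET, "file_name": "Accelerometer.csv", "metadata": {"ei_check": 1}},
    {"dataset": DATASET, "file_name": "Gyroscope.csv", "metadata": {"ei_check": 0}},
    {"dataset": "Other dataset", "file_name": "Accelerometer.csv", "metadata": {"ei_check": 1}},
]

# dataset = '...' AND file_name like 'Accelero%'
by_name = [f for f in files
           if f["dataset"] == DATASET and sql_like(f["file_name"], "Accelero%")]

# dataset = '...' AND metadata->ei_check = 1
by_check = [f for f in files
            if f["dataset"] == DATASET and f["metadata"].get("ei_check") == 1]

print([f["file_name"] for f in by_name])
print([f["file_name"] for f in by_check])
```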

Import into project

Import into dataset

Number of parallel jobs

For transformation jobs operating on Data items (directory) or on Files, you can edit the number of parallel jobs to run simultaneously.

Users to notify

Finally, you can select users you want to notify over email when this job finishes.

Upload portals

Upload portals are a secure way to let external parties upload data to your datasets. Through an upload portal they get an easy user interface to add data, but they have no access to the content of the dataset, nor can they delete any files. Data that is uploaded through the portal can be stored on-premise or in your own cloud infrastructure.

In this tutorial we'll set up an upload portal, show you how to add new data, and how to show this data in Edge Impulse for further processing.

Only available with Edge Impulse Enterprise Plan

Try our FREE Enterprise Trial today.

1. Configuring a storage bucket

Data is stored in storage buckets. You can use any of the following:

AWS S3 buckets
Google Cloud Storage
Any S3-compatible bucket

See the Buckets section for how to link a storage bucket to your organization.

2. Creating an upload portal

With your storage bucket configured you're ready to set up your first upload portal. In your organization go to Data > Upload portals and choose Create new upload portal. Here, select a name, a description, the storage bucket, and a path in the storage bucket.

Note: You'll need to enable CORS headers on the bucket. If these are not configured you'll get prompted with instructions. Talk to your user success engineer (when your data is hosted by Edge Impulse), or your system administrator to configure this.

After your portal is created a link is shown. This link contains an authentication token, and can be shared directly with the third party.

Click the link to open the portal. If you ever forget the link: no worries. Click the ⋮ next to your portal, and choose View portal.

3. Uploading data to the portal

To upload data you can now drag & drop files or folders to the drop zone on the right, or use Create new folder to first create a folder structure. There's no limit to the number of files you can upload here, and all files are hashed, so if you upload a file that's already present the file will be skipped.

Note: Files with the same name but with a different hash are overwritten.
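
The skip and overwrite behavior can be pictured as a simple decision on file name and content hash. The Python sketch below is only an illustration: the hash algorithm used by the portal is not stated here (SHA-256 is an assumption), and upload_action is a hypothetical helper.

```python
import hashlib


def content_hash(data: bytes) -> str:
    """Hash the file content (SHA-256 is an assumption for this sketch)."""
    return hashlib.sha256(data).hexdigest()


def upload_action(name: str, data: bytes, existing: dict) -> str:
    """Decide what happens to an upload.

    `existing` maps file names already in the portal to their content hash.
    """
    new_hash = content_hash(data)
    if existing.get(name) == new_hash:
        return "skip"        # same name, same content: already present
    if name in existing:
        return "overwrite"   # same name, different content
    return "upload"          # new file


portal = {"hello.wav": content_hash(b"first recording")}
print(upload_action("hello.wav", b"first recording", portal))   # skip
print(upload_action("hello.wav", b"second recording", portal))  # overwrite
print(upload_action("yes.wav", b"another file", portal))        # upload
```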

4. Using your portal in transformation blocks / clinical data

Mount the portal directly into a transformation block via Custom blocks > Transformation blocks > Edit block, and select the portal under mount points.

5. Adding the data to your project

6. Recap

Appendix A: Programmatic access to portals

Here's a Python script which uploads, lists and downloads data to a portal. To upload data you'll need to authenticate with a JWT token, see below this script for more info.

And here's a script to generate JWT tokens:

Custom blocks

Custom blocks are cloud jobs that can be hosted and used on Edge Impulse. They serve a dedicated task, are extremely flexible, let you customize your experience and shorten your time-to-market.

Creating a transformation block - to fetch, sort, validate, combine and transform existing data into robust datasets that can be imported into your projects.
Building and hosting custom DSP blocks - to create and host your custom signal processing techniques and use them directly in your projects.
Create a custom learning block - to use your custom models and load pre-trained weights with PyTorch, Keras or scikit-learn.
Building deployment blocks - to create custom deployment targets for your products.

Transformation blocks

Transformation blocks are very flexible and can be used for most advanced use cases.

They can take raw data from your organizational datasets and convert it into files that can be loaded into an Edge Impulse project or another organizational dataset. You can also use transformation blocks as cloud jobs to perform specific actions using standalone mode.

Transformation blocks are available in your organization pipelines and in your project pipelines so you can automate your processes.

You can use transformation blocks to fetch external datasets, augment/create variants of your data samples, generate synthetic datasets, extract metadata from config files, create helper graphs, align and interpolate measurements across sensors, or remove duplicate entries. The possibilities are endless.

Transformation blocks can be written in any language, and run on Edge Impulse infrastructure.

Only available with Edge Impulse Enterprise Plan

Try our FREE Enterprise Trial today.

Transformation blocks can be complex to set up and are one of the most advanced features Edge Impulse provides. Feel free to ask your customer solution engineer for help and examples. We have been setting up complex pipelines for our customers, and our engineers have acquired a lot of expertise with transformation blocks.

Run transformation blocks

You can run your transformation blocks as transformation jobs. They can be triggered:

from your organization:

From this view, Custom blocks->Transformation
From the Data transformation
From the Data pipelines

from your projects:

From the Data sources (Standalone transformation blocks only)
From Synthetic data (Synthetic data blocks only)

Public blocks

By default, we provide several pre-built transformation blocks that you can use directly in your organization or your organization's projects.

We will add more over time when we see a recurring need or interest. The current ones are the following:

Understanding the transformation blocks

A transformation block consists of a Docker image that contains one or several scripts. The Docker image is encapsulated in the transformation block with additional parameters.

Here is a minimal configuration for the transformation blocks:

On this documentation page, we explain how to set up a transformation block and describe the different options.

Import existing transformation blocks

You can directly create your transformation block within Edge Impulse Studio from a public Docker image or import existing transformation blocks:

Example repository

You can find several transformation block examples in this Github repository. These are a great way to get started, either by importing them directly in your organization or by using them as a getting-started template.

To run the data transformation jobs, see the Data transformation documentation page.

Setting up transformation blocks

To set up your block, an easy method is to use the Edge Impulse CLI command, edge-impulse-blocks init:

$> edge-impulse-blocks init 

Edge Impulse Blocks v1.21.1
? In which organization do you want to create this block? 
❯ Developer Relations 
  Medical Laboratories inc.
  Demo Team 
Attaching block to organization 'Developer Relations'
? Choose a type of block
❯ Transformation block 
  Deployment block 
  DSP block 
  Machine learning block
? Choose an option: 
❯ Create a new block 
  Update an existing block 
? Enter the name of your block: Generate helper graphs from sensor CSV
? Enter the description of your block: Transformation block to help you visualize what how your sensor time series data look like by creating a graph from the CSV files
? What type of data does this block operate on? 
  File (--in-file passed into the block) 
  Data item (--in-directory passed into the block) 
❯ Standalone (runs the container, but no files / data items passed in)
? Which buckets do you want to mount into this block (will be mounted under /mnt/s3fs/BUCKET_NAME, you can change these mount points in the Studio)?
(Press <space> to select, <a> to toggle all, <i> to invert selection)
❯ ◉ edge-impulse-devrel-team
  ◯ ei-datasets
❯ yes 
  no

Tip: If you want to access your bucket, make sure to press <space> to select the bucket attached to your organization.

The step above will create the following .parameters.json file in your project directory:

{
    "version": 1,
    "type": "transform",
    "info": {
        "name": "Generate helper graphs from sensor CSV",
        "description": "Transformation block to help you visualize what how your sensor time series data look like by creating a graph from the CSV files",
        "operatesOn": "standalone",
        "transformMountpoints": [
            {
                "bucketId": 3096,
                "mountPoint": "/mnt/s3fs/edge-impulse-devrel-team"
            }
        ]
    },
    "parameters": []
}
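
If you want to inspect this file from a script, for example to confirm which buckets will be mounted, a short Python sketch like the following can help. It relies only on the fields shown above; the mount_points helper is hypothetical and not part of the Edge Impulse CLI.

```python
import json


def mount_points(parameters: dict) -> list:
    """Return the mount point paths declared for a transformation block."""
    info = parameters.get("info", {})
    return [m["mountPoint"] for m in info.get("transformMountpoints", [])]


# Trimmed copy of the .parameters.json shown above
params = json.loads("""
{
    "version": 1,
    "type": "transform",
    "info": {
        "name": "Generate helper graphs from sensor CSV",
        "operatesOn": "standalone",
        "transformMountpoints": [
            {"bucketId": 3096, "mountPoint": "/mnt/s3fs/edge-impulse-devrel-team"}
        ]
    },
    "parameters": []
}
""")

print(params["info"]["operatesOn"])  # standalone
print(mount_points(params))          # ['/mnt/s3fs/edge-impulse-devrel-team']
```

To read the real file, replace the inline JSON with json.load on an open handle to .parameters.json in your block directory.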

To push your transformation block, simply run edge-impulse-blocks push.

Dockerfile

At Edge Impulse, we mostly use Python, Javascript/Typescript and Bash scripts, but you can write your transformation blocks in any language.

Dockerfile example to trigger a Bash script:

FROM ubuntu:20.04

WORKDIR /app

# Copy the bash script into the container
COPY hello.sh /hello.sh

# Make the bash script executable
RUN chmod +x /hello.sh

# Set the entrypoint command to run the script with the provided --name argument
ENTRYPOINT ["/hello.sh"]

Dockerfile example to trigger a Python script and install the required dependencies:

FROM python:3.7.5-stretch

WORKDIR /app

# Python dependencies
COPY requirements.txt ./
RUN pip3 --no-cache-dir install -r requirements.txt

COPY . ./

ENTRYPOINT [ "python3",  "transform.py" ]

The Dockerfile above describes a base image (Python 3.7.5), the Python dependencies (in requirements.txt) and which script to run (transform.py).

Note: Do not use a WORKDIR under /home! The /home path will be mounted in by Edge Impulse, making your files inaccessible.

ENTRYPOINT vs RUN / CMD

If you create a custom Dockerfile, make sure to use ENTRYPOINT to specify the application to execute, rather than RUN or CMD.

If you want to host your docker image on an external registry, you can use Docker Hub and use the username/image:tag in the Docker container field.

Operation modes

We provide three modes to access your data:

In the Standalone mode, no data is passed to the container, but you can still access data by mounting your bucket onto the container.
At the Data item level, we pass the --in-directory and --out-directory arguments. The transformation jobs will run on each directory present in your selected path. These jobs can run in parallel.
At the file level, we pass the --in-file and --out-directory arguments. The transformation jobs will run on each file present in your selected path. These jobs can run in parallel.

Note that for the last two operation modes, you can use query filters to only include certain data items and certain files.

Standalone

The stand-alone method is the most flexible option (it can work on both generic and clinical datasets). You can consider this transformation block as a cloud job that you can use for anything in your machine learning pipelines.

Please note that this mode does not support running jobs in parallel, as it is unknown in advance how many files or how many directories are present in your dataset.

To access your data, you must mount your bucket or upload portal into the container. You can do this either when setting up your transformation block using the Edge Impulse CLI, or directly in the Studio when creating or editing a transformation block.

You can use custom blocks parameters to retrieve the bucket name and the required directory to access your files programmatically.

Examples

Python script to create graphs from your CSV sensor Data

e.g. in Python, a script to create graphs from CSV sensor data:

import os, sys, argparse
import pandas as pd
import matplotlib.pyplot as plt

# Set the arguments
parser = argparse.ArgumentParser(description='Organization transformation block')
parser.add_argument('--sensor_name', type=str, required=True, help="Sensor data to extract to create the graph")
parser.add_argument('--bucket_name', type=str, required=True, help="Bucket where your dataset is hosted")  # required: the folder path below is built from it
parser.add_argument('--bucket_directory', type=str, required=False, help="Directory in your bucket where your dataset is hosted")

args, unknown = parser.parse_known_args()

sensor_name = args.sensor_name
sensor_file = sensor_name + '.csv'

bucket_name = args.bucket_name
bucket_prefix = args.bucket_directory
mount_prefix = os.getenv('MOUNT_PREFIX', '/mnt/s3fs/')

folder = os.path.join(mount_prefix, bucket_name, bucket_prefix) if bucket_prefix else os.path.join(mount_prefix, bucket_name)

# Check if folder exists
if os.path.exists(folder):
    print('path exist', folder)
    for dirpath, dirnames, filenames in os.walk(folder):
        print("dirpath:",dirpath)
        if os.path.exists(os.path.join(dirpath, sensor_file)):
            print("File exist: ", os.path.join(dirpath, sensor_file))
            
            df = pd.read_csv(os.path.join(dirpath, sensor_file))
            df.index = pd.to_datetime(df['time'], unit='ns')

            # Get a list of all columns except 'time' and 'seconds_elapsed'
            columns_to_plot = [col for col in df.columns if col not in ['time', 'seconds_elapsed']]

            # Create subplots for each selected column
            # squeeze=False keeps axes a 2D array, so indexing also works with a single column
            fig, axes = plt.subplots(nrows=len(columns_to_plot), ncols=1, figsize=(12, 6 * len(columns_to_plot)), squeeze=False)

            for i, col in enumerate(columns_to_plot):
                ax = axes[i, 0]
                ax.plot(df.index, df[col])
                ax.set_title(f'{col} over time')
                ax.set_xlabel('time')
                ax.set_ylabel(col)
                ax.grid(True)

            # Save the figure with all subplots in the same directory
            plt.tight_layout()
            print("Graph created")
            plt.savefig(os.path.join(dirpath, sensor_name))

            # Display the plots (optional)
            # plt.show()
        
        else:
            print("file is missing in directory ", dirpath)
     
else:
    print('Path does not exist')
    sys.exit(1)


print("Finished")
sys.exit(0)

Bash script to print "Hello +name" in the log console

e.g. a bash script to print "Hello +name", the name being passed as an argument in the transformation block using the custom block parameters:

#!/bin/bash

while [[ $# -gt 0 ]]; do
  key="$1"

  case $key in
    --name)
      NAME="$2"
      shift # past argument
      shift # past value
      ;;
    *)
      # Unknown option
      echo "Unknown option: $1"
      exit 1
      ;;
  esac
done

echo "Hello $NAME"

Data item (`--in-directory`)

When selecting the Data item operation mode, two parameters will be passed to the container:

--in-directory
--out-directory

The transformation jobs will run on each "Data item" (directory) present in your selected path or dataset.

Example

For example, let's consider a clinical dataset like the following, each data item has several files:

Now let's create a transformation block that simply outputs the arguments and copies the Accelerometer.csv file to the output dataset. This block is available in the transformation blocks GitHub repository.

Set up the transformation job in the Studio:

You will be able to see logs for each data item that look like the following:

--in-directory:  /data/edge-impulse-devrel-team/datasets/activity-detection/Cycling-2023-09-14_06-47-00/
--out-directory:  /home/transform/26773541
--in-file:  None
in-directory has ['Accelerometer.csv', 'Accelerometer.png', 'Annotation.csv', 'Gravity.csv', 'Gyroscope.csv', 'Location.csv', 'LocationGps.csv', 'LocationNetwork.csv', 'Magnetometer.csv', 'Metadata.csv', 'Orientation.csv', 'Pedometer.csv', 'TotalAcceleration.csv']
out-directory has []
Copying file from  /data/edge-impulse-devrel-team/datasets/activity-detection/Cycling-2023-09-14_06-47-00/ to  /home/transform/26773541
out-directory has now ['Accelerometer.csv']

The copied file will be placed in a temporary directory and then be copied to the desired output dataset respecting the folder structure.
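A minimal Python sketch of a block that behaves like the one described above is shown below. It is not the code from the repository: the function names copy_accelerometer and main are assumptions made for illustration, and only the --in-directory and --out-directory flags come from the text above.

```python
import argparse
import os
import shutil


def copy_accelerometer(in_directory, out_directory, file_name="Accelerometer.csv"):
    """Copy one file from the data item directory to the output directory.

    Returns the sorted list of files in the output directory after the copy.
    """
    os.makedirs(out_directory, exist_ok=True)
    source = os.path.join(in_directory, file_name)
    if not os.path.isfile(source):
        raise FileNotFoundError(f"{file_name} not found in {in_directory}")
    shutil.copy(source, os.path.join(out_directory, file_name))
    return sorted(os.listdir(out_directory))


def main(argv=None):
    parser = argparse.ArgumentParser(description="Data item transformation block (sketch)")
    parser.add_argument("--in-directory", type=str, required=True)
    parser.add_argument("--out-directory", type=str, required=True)
    # Ignore any extra arguments that may be passed to the container
    args, _unknown = parser.parse_known_args(argv)

    print("--in-directory: ", args.in_directory, flush=True)
    print("--out-directory: ", args.out_directory, flush=True)
    print("in-directory has", sorted(os.listdir(args.in_directory)), flush=True)

    files = copy_accelerometer(args.in_directory, args.out_directory)
    print("out-directory has now", files, flush=True)
    return 0
```

In a container entry point you would typically finish the script with sys.exit(main()) so that the job receives an explicit exit code.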

File (`--in-file`)

When selecting the File operation mode, two parameters will be passed to the container:

--in-file
--out-directory

The transformation jobs will run on each file present in the selected path.

Example

We can reuse the same transformation block as above. First, make sure to set the operation mode to File by editing your transformation block:

Then set up the transformation job.

We have used the following filter: dataset = 'Activity Detection (Clinical view)' and file_name like '%Gyro%'

Run the jobs and the logs for one file should look like the following:

--in-directory:  None
--out-directory:  /home/transform/26808538
--in-file:  /data/edge-impulse-devrel-team/datasets/activity-detection/Sitting-2023-09-14_09-11-15/Gyroscope.csv
out-directory has []
--in-file path /data/edge-impulse-devrel-team/datasets/activity-detection/Sitting-2023-09-14_09-11-15/Gyroscope.csv exist
coping Gyroscope.csv to /home/transform/26773541
out-directory has now ['Gyroscope.csv']

The copied file will be placed in a temporary directory and then be copied to the desired output dataset respecting the folder structure (along with the file we copied with the previous step using the Data item mode).

Compute requests & limits

When editing your block in Edge Impulse Studio, you can set the number of CPUs and the amount of memory that your container needs to run properly. Likewise, you can set limits for the same parameters.

Metadata (Data item and file operation modes)

You can update the metadata of data items directly from a transformation block by creating an ei-metadata.json file in the output directory. The metadata is then applied to the new data item automatically when the transform job finishes. The ei-metadata.json file has the following structure:

{
    "version": 1,
    "action": "add",
    "metadata": {
        "some-key": "some-value"
    }
}

Some notes:

If action is set to add, the metadata keys are added to the data item. If action is set to replace, all existing metadata keys are removed.
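As a rough illustration, a Python block could write this file with a small helper like the one below. The helper name write_ei_metadata is hypothetical, and converting all values to strings is an assumption rather than a documented requirement.

```python
import json
import os


def write_ei_metadata(out_directory, metadata, action="add"):
    """Write an ei-metadata.json file into the output directory and return its path."""
    if action not in ("add", "replace"):
        raise ValueError("action must be 'add' or 'replace'")
    os.makedirs(out_directory, exist_ok=True)
    payload = {
        "version": 1,
        "action": action,
        # Assumption: keys and values are stored as strings
        "metadata": {str(key): str(value) for key, value in metadata.items()},
    }
    path = os.path.join(out_directory, "ei-metadata.json")
    with open(path, "w") as f:
        json.dump(payload, f, indent=4)
    return path
```

Calling write_ei_metadata(args.out_directory, {"some-key": "some-value"}) at the end of a block would produce the structure shown above.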

Mounting points

When using the CLI to set up your block, by default we mount your bucket with the following mounting point:

/mnt/s3fs/your-bucket

You can change this value if you want your transformation block to behave differently.

Custom parameters

See adding parameters to custom blocks dedicated documentation page.

Environmental variables

Transformation blocks get access to the following environmental variables, which let you authenticate with the Edge Impulse API. This way you don't have to inject these credentials into the block. The variables are:

EI_API_KEY - an API key with 'member' privileges for the organization.
EI_ORGANIZATION_ID - the organization ID that the block runs in.
EI_API_ENDPOINT - the API endpoint (default: https://studio.edgeimpulse.com/v1).
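The sketch below shows one way a Python block might read these variables. The function name load_ei_credentials is hypothetical, and sending the key in an x-api-key header is an assumption here; check the Edge Impulse API documentation for the authentication scheme that applies to the endpoints you call.

```python
import os

DEFAULT_ENDPOINT = "https://studio.edgeimpulse.com/v1"


def load_ei_credentials(environ=os.environ):
    """Collect the Edge Impulse settings that are injected into the block."""
    api_key = environ.get("EI_API_KEY")
    organization_id = environ.get("EI_ORGANIZATION_ID")
    endpoint = environ.get("EI_API_ENDPOINT", DEFAULT_ENDPOINT)
    if not api_key or not organization_id:
        raise RuntimeError("EI_API_KEY and EI_ORGANIZATION_ID must be set")
    return {
        "endpoint": endpoint.rstrip("/"),
        "organization_id": organization_id,
        # Assumption: the API key is passed in the x-api-key header
        "headers": {"x-api-key": api_key},
    }
```

The returned dictionary can then be handed to whichever HTTP client the block uses, so the credentials never need to be hard-coded in the container.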

Examples & resources

Standalone

Label image data using GPT-4o: Label image data using GPT-4o block
Dall-E Image Generation (Python): Tutorial / GitHub
Text to speech transform block (Javascript): GitHub
Fetch a dataset hosted on Kaggle (Python): GitHub
Generate graph from sensor CSV data (Python): GitHub
Hello Edge (Bash): GitHub

File (`--in-file`)

Mix background noise into audio files (Bash script): GitHub
Access your data - Helper transformation block (Python): GitHub
Resample CSV (Python): GitHub

Data Item (`--in-directory`)

Access your data - Helper transformation block (Python): GitHub
Check file existence - Add ei_check metadata on file existence (Python): GitHub
Merge CSV files - Merge CSV files on a given key (Python): GitHub
Merge audio and CSV - Merge audio file and time-series CSV (Python): GitHub

Recap

Now that you have a better idea of what transformation blocks are, here is a graphical recap of how they work:

Parameters.json format

Transformation blocks

This is the specification for the parameters.json file:

type TransformBlockParametersJson = {
    version: 1,
    type: 'transform',
    info: {
        name: string,
        description: string,
        operatesOn: 'file' | 'directory' | 'standalone' | undefined;
        transformMountpoints: {
            bucketId: number;
            mountPoint: string;
        }[] | undefined;
        indMetadata: boolean | undefined;
        cliArguments: string | undefined;
        allowExtraCliArguments: boolean | undefined;
        showInDataSources: boolean | undefined;
        showInCreateTransformationJob: boolean | undefined;
    },
    // see spec in https://docs.edgeimpulse.com/docs/tips-and-tricks/adding-parameters-to-custom-blocks
    parameters: DSPParameterItem[];
};
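To make the specification more concrete, the following Python sketch builds a parameters.json document that follows the type above and prints it as JSON. The helper name build_transform_parameters and the description text are hypothetical, only a subset of the optional info fields is set, and the single parameter entry reuses the name/type/param/value/help shape shown elsewhere in this documentation.

```python
import json


def build_transform_parameters(name, description, operates_on, parameters):
    """Return a dictionary that follows the transformation block parameters.json layout."""
    if operates_on not in ("file", "directory", "standalone"):
        raise ValueError("operates_on must be 'file', 'directory' or 'standalone'")
    return {
        "version": 1,
        "type": "transform",
        "info": {
            "name": name,
            "description": description,
            "operatesOn": operates_on,
        },
        "parameters": parameters,
    }


example = build_transform_parameters(
    name="Hello Edge",
    description="Prints a greeting in the job logs",  # hypothetical description
    operates_on="standalone",
    parameters=[
        {
            "name": "Name",
            "type": "string",
            "param": "name",
            "value": "",
            "help": "Person to greet",
        }
    ],
)
print(json.dumps(example, indent=4))
```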

Synthetic data blocks

This is the specification for the parameters.json file:

type SyntheticDataBlockParametersJson = {
    version: 1,
    type: 'synthetic-data',
    info: {
        name: string,
        description: string,
    },
    // see spec in https://docs.edgeimpulse.com/docs/tips-and-tricks/adding-parameters-to-custom-blocks
    parameters: DSPParameterItem[];
};

Troubleshooting

The job runs indefinitely

If you notice that your jobs run indefinitely, it is probably because of an error or the script has not been properly terminated. Make sure to exit your script with code 0 (return 0, exit(0) or sys.exit(0)) for success or with any other error code for failure.
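One way to make sure a Python block always terminates with an explicit exit code is to wrap the work in a small helper, as in the sketch below. The names run_and_report and transform are hypothetical placeholders, not part of any Edge Impulse tooling.

```python
import sys
import traceback


def run_and_report(work):
    """Run work() and return an exit code: 0 on success, 1 on any exception."""
    try:
        work()
    except Exception:
        # Print the stack trace to stdout so the failure shows up in the job logs
        traceback.print_exc(file=sys.stdout)
        sys.stdout.flush()
        return 1
    print("Finished", flush=True)
    return 0


def transform():
    # Hypothetical placeholder for the real work of the block
    print("Transforming...", flush=True)


exit_code = run_and_report(transform)
# In a real block, end the script with: sys.exit(exit_code)
```

With this pattern an unexpected exception is reported and turned into a non-zero exit code rather than leaving the outcome of the script ambiguous.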

Cannot access files in bucket

If you cannot access your files in your bucket, make sure that the mount point is properly configured.

When using the CLI, a common mistake is forgetting to press the <space> key to select the bucket attached to your organization.

Job failed without logs (only Job failed)

It probably means that we had an issue when triggering the container. In many cases it is related to the issue above: the mount point is not properly configured.

I cannot access the logs

We are still investigating why some logs are not displayed properly. If you are using Python, you can flush stdout when you print, for example with print("hello", flush=True).

Can I host my Docker image on Docker Hub?

Yes, you can. You can test this Standalone transformation block if you'd like: luisomoreau/hello_edge:latest

Also, make sure to configure the additional block parameters with this config:

[
    {
        "name": "Name",
        "type": "string",
        "param": "name",
        "value": "",
        "help": "Person to greet"
    }
]

It will print "hello +name" in the transformation job logs.

Deployment blocks

One of the most powerful features in Edge Impulse is the set of built-in deployment targets (under Deployment in the Studio), which let you create ready-to-go binaries for development boards, or custom libraries for a wide variety of targets that incorporate your trained impulse. You can also create custom deployment blocks for your organization. This lets developers quickly iterate on products without getting your embedded engineers involved, lets your customers build personalized firmware using their own data, or lets you create custom libraries.

In this tutorial you'll learn how to use custom deployment blocks to create a new deployment target, and how to make this target available in the Studio for all users in the organization.

Only available with Edge Impulse Enterprise Plan

Try our FREE Enterprise Trial today.

Prerequisites

You'll need:

The Edge Impulse CLI.
- If you receive any warnings that's fine. Run edge-impulse-blocks afterwards to verify that the CLI was installed correctly.

Deployment blocks use Docker containers, a virtualization technique which lets developers package up an application with all dependencies in a single package. If you want to test your blocks locally you'll also need (this is not a requirement):

Docker desktop installed on your machine.

1. Download example repository

Go to the example-custom-deployment-block repository and clone (or download) it. Then, open a command prompt or terminal window and run:

To initialize the block.

2. Input to your custom deployment block

When a user deploys with a custom deployment block two things happen:

A package is created that contains information about the deployment (like the sensors used, frequency of the data, etc.), any trained neural network in .tflite and SavedModel formats, the Edge Impulse SDK, and all DSP and ML blocks as C++ code.
This package is then consumed by the custom deployment block, which can incorporate it with a base firmware, or repackage it into a new library.

To test this locally, you can download this package from the Studio. In your Edge Impulse project go to Deployment, and search for Custom block.

Once you click Build you'll receive a ZIP file containing the following items:

trained.tflite - if you have a neural network in the project, this contains the neural network in .tflite format. This network is already fully quantized if you choose the int8 optimization, otherwise this is the float32 model.
trained.savedmodel.zip - if you have a neural network in the project this contains the full TensorFlow SavedModel. Note that we might update the TensorFlow version used to train these networks at any time, so rely on the compiled model or the TFLite file where possible.
model-parameters - impulse and block configuration in C++ format. Can be used by the SDK to quickly run your impulse.
tflite-model - neural network as source code in a way that can be used by the SDK to quickly run your impulse.

Store all these files under example-custom-deployment-block/input.

2.1 Testing the build script with Docker

To test your deployment block you first build the container, then invoke it with the files from the input directory. Open a command prompt or terminal, navigate to the example-custom-deployment-block folder and:

Build the container:
Invoke the build script - this mounts the current directory in the container under /home, and then passes the downloaded metadata script to the container:

Or if you run Windows or macOS, you can use Docker to run this application:

3. Uploading the deployment block to Edge Impulse

With the deployment block ready you can make it available in Edge Impulse. Open a command prompt or terminal window, navigate to the folder you created earlier, and run:

This packages up your folder, sends it to Edge Impulse where it'll be built, and finally adds it to your organization. The deployment block is now available in Edge Impulse under Custom blocks > deployment blocks. You can go here to set the logo, update the description, and set extra command line parameters.

Privileged mode

4. Using the deployment block

The deployment block is automatically available for all organizational projects. Go to the Deployment page on a project, and search for your block:

Just click Build and now you'll have a freshly built binary from your own deployment block!

5. Conclusion

Custom deployment blocks are a powerful tool for your organization. They let you build binaries for unreleased products, package up impulses as custom libraries, or let your customers deploy to private targets (if you add an external collaborator to a project they'll have access to the blocks as well). Because deployment blocks are integrated with your project and hosted by Edge Impulse, everyone from FAE to R&D developer can iterate on on-device models without getting your embedded engineers involved.

Parameters.json format

This is the specification for the parameters.json type:

Deployment metadata spec

This is the specification for the deployment-metadata.json file.

Health Reference Design

In this section, you will find a health reference design that describes an end-to-end ML workflow for building a wearable health product using Edge Impulse. It covers an activity study in a clinical lab, where data is recorded from the wearable end device (PPG + accelerometer), a reference device (Polar H10 HR monitor), plus labels (e.g. sitting, running, biking). The data is collected and validated, then written to a clinical dataset in an Edge Impulse organization, and finally imported into an Edge Impulse project where we train a classifier.

It handles data coming from multiple sources, data alignment, and a multi-stage pipeline before the data is imported into an Edge Impulse project. We won't cover all the code snippets in detail; our solution engineers can help you set up this end-to-end ML workflow.

With this health reference design section, we want to help you understand how to create a full clinical data pipeline by:

Synchronizing clinical data with a bucket

In this section, we will show how to synchronize research data with a bucket in your organizational dataset. The goal of this step is to gather data from different sources and sort them to obtain a sorted dataset (that we will then validate in the next section).

Only available with Edge Impulse Enterprise Plan

Try our FREE Enterprise Trial today.

The study described in the health reference design consists of 10 subjects performing 1.5 - 2 hours of activities in a research lab. Participants have a study ID (e.g. AMS_001) that is used to refer to them. For each participant we have 4 CSV files:

accelerometer.csv - data from the wearable end device.
ppg.csv - data from the wearable end device.
polar_h10.csv - reference data from a commercial reference device (Polar H10).
labels.csv - labels of the activity, as recorded by the research lab.

We've mimicked a proper research study, and have split the data up into two locations.

accelerometer.csv / ppg.csv - live in the company data lake in S3. The data lake uses an internal structure with non-human readable IDs for each participant (e.g. 2E93ZX for anonymized data):
```
7HAIGO
|_ accelerometer.csv
|_ ppg.csv
Z0ZPJW
|_ accelerometer.csv
|_ ppg.csv
```
polar_h10.csv / labels.csv are uploaded by the research partner to an upload portal. The files are prefixed with the study ID:

To create the mapping between the study ID and the internal data lake ID we use a study master sheet. It contains information about all participants, ID mapping, and metadata. E.g.:

Subject	    Internal ID	    Study date	    Age	    BMI
AMS_001	    7HAIGO      	2022-03-10	    24	    18
AMS_002	    Z0ZPJW      	2022-01-27	    35	    31

Notes: This master sheet was made using a Google Sheet but can be anything. All data (data lake, portal, output) are hosted in an Edge Impulse S3 bucket but can be stored anywhere (see below).

Configuring a storage bucket for your dataset

Data is stored in storage buckets, which can either be hosted by Edge Impulse or in your own infrastructure. If you choose to host the data yourself, your infrastructure should be available through the S3 API, and you are responsible for setting up proper backups. To configure a new storage bucket, head to your organization, choose Data > Buckets, click Add new bucket, and fill in your access credentials. Our solution engineers are also here to help you set up the buckets.

About datasets

With the storage bucket in place you can create your first dataset. Datasets in Edge Impulse have three layers:

The dataset, a larger set of data items, grouped together.
Data item, an item with metadata and files attached.
Data file, the actual files.

No required format for data files

There is no required format for data files. You can upload data in any format, whether it's CSV, Parquet, or a proprietary data format.

Adding research data to your organization

There are three ways of uploading data into your organization. You can either:

Upload data directly to the storage bucket (recommended method). In this case use Add data... > Add dataset from bucket and the data will be discovered automatically.
Upload data through the Edge Impulse API.
Upload the files through the Upload Portals.

Sorter and combiner

Sorter

The sorter is the first step of the research pipeline. Its job is to fetch the data from all locations (here: internal data lake, portal, metadata from study master sheet) and create a research dataset in Edge Impulse. It does this by:

Creating a new structure in S3 like this:

AMS_001
|_ AMS_001_labels.csv
|_ AMS_001_polar_h10.csv
|_ accelerometer.csv
|_ ppg.csv
AMS_002
|_ AMS_002_labels.csv
|_ AMS_002_polar_h10.csv
|_ accelerometer.csv
|_ ppg.csv

Syncing the S3 folder with a research dataset in your Edge Impulse organization (like AMS Activity Study 2022).
Updating the metadata with the metadata from the master sheet (Age, BMI, etc...).
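The first step can be sketched in Python as follows. This is an illustration only: it works on local directories instead of S3, it assumes the portal stores its files in one flat folder, and the function name sort_participant is hypothetical. The file names and the study ID to internal ID mapping come from the description above.

```python
import os
import shutil

LAKE_FILES = ("accelerometer.csv", "ppg.csv")
PORTAL_FILES = ("labels.csv", "polar_h10.csv")


def sort_participant(study_id, internal_id, lake_dir, portal_dir, out_dir):
    """Gather all files for one participant under out_dir/<study_id>.

    Returns the sorted list of file names that were copied.
    """
    target = os.path.join(out_dir, study_id)
    os.makedirs(target, exist_ok=True)
    copied = []

    # Data lake files live in a folder named after the internal ID
    for name in LAKE_FILES:
        source = os.path.join(lake_dir, internal_id, name)
        if os.path.isfile(source):
            shutil.copy(source, os.path.join(target, name))
            copied.append(name)

    # Portal files are prefixed with the study ID (flat folder is an assumption)
    for name in PORTAL_FILES:
        prefixed = f"{study_id}_{name}"
        source = os.path.join(portal_dir, prefixed)
        if os.path.isfile(source):
            shutil.copy(source, os.path.join(target, prefixed))
            copied.append(prefixed)

    return sorted(copied)
```

Looping over the rows of the master sheet and calling sort_participant("AMS_001", "7HAIGO", ...) for each participant would reproduce the folder structure shown above; missing files are simply skipped so that a later validation step can flag them.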

Combiner

With the data sorted we then:

Need to verify that the data is correct (see validate your research data)
Combine the data into a single Parquet file. This is essentially the contract we have for our dataset. By settling on a standard format (strongly typed, same column names everywhere) this data is now ready to be used for ML, new algorithm development, etc. Because we also add metadata for each file here, we're very quickly building up a valuable R&D datastore.

All these steps can be run through different transformation blocks and executed one after the other using data pipelines.

Validating clinical data

Only available with Edge Impulse Enterprise Plan

Try our FREE Enterprise Trial today.

Using Checklists

You can optionally show a check mark in the list of data items, and show a check list for data items. This can be used to quickly view which data items are complete (if you need to capture data from multiple sources) or whether items are in the right format.

Checklists look trivial, but are actually very powerful as they give quick insights into dataset issues. Missing these issues until after the study is done can be super expensive.

Checklists are written to ei-metadata.json and are automatically picked up by the UI.

Checklists are driven by the metadata for a data item. Set the ei_check metadata item to either 0 or 1 to show a check mark in the list. Set an ei_check_KEYNAME metadata item to 0 or 1 to show the item in the check list.

To query for items with or without a check mark, use a filter in the form of:

metadata->ei_check = 1

To make it easy to create these lists on the fly, you can set these metadata items directly from a transformation block.
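As an illustration, a transformation block could derive the checklist from the files found in a data item. The sketch below is a minimal example under stated assumptions: the function name build_checklist and the check names are hypothetical, and the values are stored as the strings "0" and "1".

```python
import os


def build_checklist(item_directory, required_files):
    """Return ei_check metadata for one data item directory.

    required_files maps a check name (used as ei_check_<name>) to a file name.
    """
    present = set(os.listdir(item_directory))
    metadata = {}
    for key, file_name in required_files.items():
        metadata[f"ei_check_{key}"] = "1" if file_name in present else "0"
    # The overall check mark is set only if every individual check passed
    all_ok = all(value == "1" for value in metadata.values())
    metadata["ei_check"] = "1" if all_ok else "0"
    return metadata
```

The returned dictionary would go under the metadata key of the ei-metadata.json file that the block writes to its output directory.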

Example

For the reference design described and used in the previous pages, the combiner takes in a data item, and writes out:

A checklist, e.g.:
- ✔ - PPG file present
- ✔ - Accelerometer file present
- ✘ - Correlation between Polar/PPG HR is at least 0.5
If the checklist is OK, a combined.parquet file.
A hr.png file with the correlation between HR found from PPG, and HR from the reference device. This is useful for two reasons:
- If the correlation is too low we're looking at the wrong file, or data is missing.
- Verify if the PPG => HR algorithm actually works.

Querying clinical data

Organizational datasets contain a powerful query system which lets you explore and slice data. You control the query system through the 'Filter' text box, and you use a language which is very similar to SQL (documentation).

Only available with Edge Impulse Enterprise Plan

Try our FREE Enterprise Trial today.

For example, here are some queries that you can make:

dataset like '%AMS Activity Study%' - returns all items and files from the study.
bucket_name = 'edge-impulse-health-reference-design' AND --labels sitting,walking - returns data whose label is 'sitting' and 'walking', and that is stored in the 'edge-impulse-health-reference-design' bucket.
metadata->ei_check = 0 - returns data that has a metadata field 'ei_check' which is '0'.
created > DATE('2022-08-01') - returns all data that was created after Aug 1, 2022.

After you've created a filter, you can select one or more data items, and select Actions...>Download selected to create a ZIP file with the data files. The file count reflects the number of files returned by the filter.

The previous queries all returned all files for a data item. But you can also query files through the same filter. In that case the data item will be returned, but only with the files selected. For example:

file_name LIKE '%.png' - returns all files that end with .png.

If you have an interesting query that you'd like to share with your colleagues, you can just share the URL. The query is already added to it automatically.

All available fields

These are all the available fields in the query interface:

dataset - Dataset.
bucket_id - Bucket ID.
bucket_name - Bucket name.
bucket_path - Path of the data item within the bucket.
id - Data item ID.
name - Data item name.
total_file_count - Number of files for the data item.
total_file_size - Total size of all files for the data item.
created - When the data item was created.
metadata->key - Any item listed under 'metadata'.
file_name - Name of a file.
file_names - All filenames in the data item, that you can use in conjunction with CONTAINS. E.g. find all items with file X, but not file Y: file_names CONTAINS 'x' AND not file_names CONTAINS 'y'.
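If you generate filters from scripts, small helpers can keep the quoting consistent. The Python sketch below only composes filter strings from the fields listed above. The helper names are hypothetical, and it assumes that string values are wrapped in single quotes and that clauses can be chained with AND, as in the examples on this page.

```python
def quote(value):
    """Wrap a string value in single quotes (quote escaping is not handled here)."""
    if "'" in value:
        raise ValueError("values containing single quotes are not handled in this sketch")
    return f"'{value}'"


def field_equals(field, value):
    """Build a 'field = value' clause; numbers are left unquoted."""
    rendered = quote(value) if isinstance(value, str) else str(value)
    return f"{field} = {rendered}"


def field_like(field, pattern):
    """Build a 'field like pattern' clause."""
    return f"{field} like {quote(pattern)}"


def all_of(*clauses):
    """Combine clauses so that all of them must match."""
    return " AND ".join(clauses)


query = all_of(
    field_like("dataset", "%AMS Activity Study%"),
    field_equals("metadata->ei_check", 1),
)
print(query)
# dataset like '%AMS Activity Study%' AND metadata->ei_check = 1
```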

Transforming clinical data

Transformation blocks take raw data from your organizational datasets and convert the data into a different dataset or files that can be loaded in an Edge Impulse project. You can use transformation blocks to only include certain parts of individual data files, calculate long-running features like a running mean or derivatives, or efficiently generate features with different window lengths. Transformation blocks can be written in any language, and run on the Edge Impulse infrastructure.

In this tutorial we build a Python-based transformation block that loads Parquet files, calculates features from the Parquet file, and then writes a new file back to your dataset. If you haven't done so, go through synchronizing clinical data with a bucket first.

Only available with Edge Impulse Enterprise Plan

Try our FREE Enterprise Trial today.

1. Prerequisites

You'll need:

The Edge Impulse CLI.
- If you receive any warnings that's fine. Run edge-impulse-blocks afterwards to verify that the CLI was installed correctly.
The gestures.parquet file which you can use to test the transformation block. This contains some data from the Continuous gestures dataset in Parquet format.

Transformation blocks use Docker containers, a virtualization technique that lets developers package up an application with all dependencies in a single package. If you want to test your blocks locally you'll also need (this is not a requirement):

Docker desktop installed on your machine.

1.1 - Parquet schema

This is the Parquet schema for the gestures.parquet file which we'll transform:

message root {
  required binary sampleName (UTF8);
  required int64 timestamp (TIMESTAMP_MILLIS);
  required int64 added (TIMESTAMP_MILLIS);
  required boolean signatureValid;
  required binary device (UTF8);
  required binary label (UTF8);
  required float accX;
  required float accY;
  required float accZ;
}

2. Building your first transformation block

To build a transformation block open a command prompt or terminal window, create a new folder, and run:

$ edge-impulse-blocks init

This will prompt you to log in, and enter the details for your block. E.g.:

Edge Impulse Blocks v1.9.0
? What is your user name or e-mail address (edgeimpulse.com)? jan+demo@edgeimpulse.com
? What is your password? [hidden]
Attaching block to organization 'Demo org Inc.'
? Choose a type of block Transformation block
? Choose an option Create a new block
? Enter the name of your block Demo dataset transformation
? Enter the description of your block Reads a Parquet file, extracts features, and writes the block back to the dataset
Creating block with config: {
  name: 'Demo dataset transformation',
  type: 'transform',
  description: 'Reads a Parquet file and splits it up in labeled data',
  organizationId: 34
}
Your new block 'Demo dataset transformation' has been created in '~/repos/tutorial-processing-block'.
When you have finished building your transformation block, run "edge-impulse-blocks push" to update the block in Edge Impulse.

Then, create the following files in this directory:

2.1 - Dockerfile

We're building a Python based transformation block. The Dockerfile describes our base image (Python 3.7.5), our dependencies (in requirements.txt) and which script to run (transform.py).

FROM python:3.7.5-stretch

WORKDIR /app

# Python dependencies
COPY requirements.txt ./
RUN pip3 --no-cache-dir install -r requirements.txt

COPY . ./

ENTRYPOINT [ "python3",  "transform.py" ]

Note: Do not use a WORKDIR under /home! The /home path will be mounted in by Edge Impulse, making your files inaccessible.

ENTRYPOINT vs RUN / CMD

If you use a different programming language, make sure to use ENTRYPOINT to specify the application to execute, rather than RUN or CMD.

2.2 - requirements.txt

This file describes the dependencies for the block. We'll be using pandas and pyarrow to parse the Parquet file, and numpy to do some calculations.

numpy==1.16.4
pandas==0.23.4
pyarrow==0.16.0

2.3 - transform.py

This file includes the actual application. Transformation blocks are invoked with the following parameters (as command line arguments):

--in-file or --in-directory - A file (if the block operates on a file), or a directory (if the block operates on a data item) from the organizational dataset. In this case the gestures.parquet file.
--out-directory - Directory to write files to.
--hmac-key - You can use this HMAC key to sign the output files. This is not used in this tutorial.
--metadata - Key/value pairs containing the metadata for the data item, plus additional metadata about the data item in the dataItemInfo key. E.g.: { "subject": "AAA001", "ei_check": "1", "dataItemInfo": { "id": 101, "dataset": "Human Activity 2022", "bucketName": "edge-impulse-tutorial", "bucketPath": "janjongboom/human_activity/AAA001/", "created": "2022-03-07T09:20:59.772Z", "totalFileCount": 14, "totalFileSize": 6347421 } }

Add the following content. This takes in the Parquet file, groups data by their label, and then calculates the RMS over the X, Y and Z axes of the accelerometer.

import pyarrow.parquet as pq
import numpy as np
import math, os, sys, argparse, json, hmac, hashlib, time
import pandas as pd

# these are the two arguments that this block uses (other arguments are ignored)
parser = argparse.ArgumentParser(description='Organization transformation block')
parser.add_argument('--in-file', type=str, required=True)
parser.add_argument('--out-directory', type=str, required=True)

args, unknown = parser.parse_known_args()

# verify that the input file exists and create the output directory if needed
if not os.path.exists(args.in_file):
    print('--in-file argument', args.in_file, 'does not exist', flush=True)
    sys.exit(1)

if not os.path.exists(args.out_directory):
    os.makedirs(args.out_directory)

# load and parse the input file
print('Loading parquet file', args.in_file, flush=True)
table = pq.read_table(args.in_file)
data = table.to_pandas()

features = []

# we group by label and then extract some metrics
for label in data.label.unique():
    data_per_label = data[data.label == label]

    # calculate the RMS per axis
    features.append({
        'label': label,
        'rmsX': np.sqrt(np.mean(data_per_label.accX**2)),
        'rmsY': np.sqrt(np.mean(data_per_label.accY**2)),
        'rmsZ': np.sqrt(np.mean(data_per_label.accZ**2))
    })

# and store as new file in the output directory
out_file = os.path.join(args.out_directory, os.path.splitext(os.path.basename(args.in_file))[0] + '_features.parquet')
pd.DataFrame(features).to_parquet(out_file)

print('Written features file', out_file, flush=True)
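The script above does not use the --metadata argument. If your block needs it, a sketch like the following could parse it. It assumes the metadata arrives as a JSON string in the shape of the example in the parameter list, and the function name parse_block_args is hypothetical.

```python
import argparse
import json


def parse_block_args(argv=None):
    """Parse the block arguments and split the metadata from the dataItemInfo entry."""
    parser = argparse.ArgumentParser(description="Organization transformation block")
    parser.add_argument("--in-file", type=str)
    parser.add_argument("--in-directory", type=str)
    parser.add_argument("--out-directory", type=str, required=True)
    parser.add_argument("--hmac-key", type=str)
    parser.add_argument("--metadata", type=str)
    # Ignore any arguments this sketch does not know about
    args, _unknown = parser.parse_known_args(argv)

    # Assumption: --metadata holds a JSON object; fall back to an empty dict
    metadata = json.loads(args.metadata) if args.metadata else {}
    item_info = metadata.pop("dataItemInfo", {})
    return args, metadata, item_info
```

For the example shown earlier, metadata would contain the subject and ei_check keys, and item_info would hold the entries of dataItemInfo such as the dataset name and bucket path.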

2.4 - Building and testing the container

On your local machine

To test the transformation block locally, if you have Python and all dependencies installed, just run:

$ python3 transform.py --in-file gestures.parquet --out-directory out/

Docker

You can also build the container locally via Docker, and test the block. The added benefit is that you don't need any dependencies installed on your local computer, and can thus test that you've included everything that's needed for the block. This requires Docker desktop to be installed.

To build the container and test the block, open a command prompt or terminal window and navigate to the source directory. First, build the container:

$ docker build -t test-org-transform-parquet-dataset .

Then, run the container (make sure gestures.parquet is in the same directory):

$ docker run --rm -v $PWD:/data test-org-transform-parquet-dataset --in-file /data/gestures.parquet --out-directory /data/out

Seeing the output

This process has generated a new Parquet file in the out/ directory containing the RMS of the X, Y and Z axes. If you inspect the content of the file (e.g. using parquet-tools) you'll see the output:

$ parquet-tools head -n5 out/gestures_features.parquet 
label = wave
rmsX = 11.424144744873047
rmsY = 4.73303747177124
rmsZ = 2.944265842437744

label = updown
rmsX = 3.899503231048584
rmsY = 3.9587674140930176
rmsZ = 10.34404468536377

label = circle
rmsX = 6.263721942901611
rmsY = 7.0987162590026855
rmsZ = 6.159618854522705

label = idle
rmsX = 3.714001178741455
rmsY = 3.4940428733825684
rmsZ = 8.6710205078125

label = snake
rmsX = 1.282995581626892
rmsY = 1.8830623626708984
rmsZ = 9.597149848937988

Success!
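To check results like these by hand, recall that the RMS of N samples x_1 ... x_N is sqrt((x_1^2 + ... + x_N^2) / N). The dependency-free sketch below applies that definition per label. The function name rms_per_label and the sample rows are made up for illustration and are not taken from gestures.parquet.

```python
import math
from collections import defaultdict


def rms_per_label(rows):
    """Group rows by label and compute the RMS of accX, accY and accZ per group."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row["label"]].append(row)

    features = []
    for label, items in grouped.items():
        count = len(items)
        features.append({
            "label": label,
            "rmsX": math.sqrt(sum(r["accX"] ** 2 for r in items) / count),
            "rmsY": math.sqrt(sum(r["accY"] ** 2 for r in items) / count),
            "rmsZ": math.sqrt(sum(r["accZ"] ** 2 for r in items) / count),
        })
    return features


# Hand-made samples, not real data
samples = [
    {"label": "idle", "accX": 3.0, "accY": 0.0, "accZ": 4.0},
    {"label": "idle", "accX": -3.0, "accY": 0.0, "accZ": -4.0},
    {"label": "wave", "accX": 1.0, "accY": 2.0, "accZ": 2.0},
]
for feature in rms_per_label(samples):
    print(feature)
```

Because the samples are squared before averaging, the sign of each reading does not matter, which is why the two idle rows give an RMS of 3.0 on X and 4.0 on Z.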

3. Pushing the transformation block to Edge Impulse

With the block ready we can push it to your organization. Open a command prompt or terminal window, navigate to the folder you created earlier, and run:

$ edge-impulse-blocks push

This packages up your folder, sends it to Edge Impulse where it'll be built, and finally is added to your organization.

Edge Impulse Blocks v1.9.0
Archiving 'tutorial-processing-block'...
Archiving 'tutorial-processing-block' OK (2 KB) /var/folders/3r/fds0qzv914ng4t17nhh5xs5c0000gn/T/ei-transform-block-7812190951a6038c2f442ca02d428c59.tar.gz

Uploading block 'Demo dataset transformation' to organization 'Demo org Inc.'...
Uploading block 'Demo dataset transformation' to organization 'Demo org Inc.' OK

Building transformation block 'Demo dataset transformation'...
Job started
...
Building transformation block 'Demo dataset transformation' OK

Your block has been updated, go to https://studio.edgeimpulse.com/organization/34/data to run a new transformation

The transformation block is now available in Edge Impulse under Data transformation > Transformation blocks.

If you make any changes to the block, just re-run edge-impulse-blocks push and the block will be updated.

4. Uploading gestures.parquet to Edge Impulse

Next, upload the gestures.parquet file, by going to Data > Add data... > Add data item, setting name as 'Gestures', dataset to 'Transform tutorial', and selecting the Parquet file.

This makes the gestures.parquet file available from the Data page.

5. Starting the transformation

With the Parquet file in Edge Impulse and the transformation block configured you can now create a new job. Go to Data, and select the Parquet file by setting the filter to dataset = 'Transform tutorial'.

Click the checkbox next to the data item, and select Transform selected (1 file). On the 'Create transformation job' page select 'Import data into Dataset'. Under 'output dataset', select 'Same dataset as source', and under 'Transformation block' select the new transformation block.

Click Start transformation job to start the job. This pulls the data in, starts a transformation job and finally uploads the data back to your dataset. If you have multiple files selected the transformations will also run in parallel.

You can now find the transformed file back in your dataset:

6. Next steps

Transformation blocks are a powerful feature that lets you set up a data pipeline to turn raw data into actionable machine learning features. They also give you a reproducible way of transforming many files at once, and they are programmable through the Edge Impulse API, so you can automatically convert new incoming data. If you're interested in transformation blocks or any of the other enterprise features, let us know!

Appendix: Advanced features

Updating metadata from a transformation block

{
    "version": 1,
    "action": "add",
    "metadata": {
        "some-key": "some-value"
    }
}

Some notes:

If action is set to add, the metadata keys are added to the data item. If action is set to replace, all existing metadata keys are removed.
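As a minimal sketch, a Python block could produce the payload above as follows. The file name ei-metadata.json and its location in the block's output directory are assumptions taken from the Edge Impulse transformation block documentation rather than from this page, and write_metadata_update is a hypothetical helper name.

```python
import json
import os
import tempfile

# Assumption: the metadata update is read from a file with this name in the
# block's output directory; verify the name against the current documentation.
METADATA_FILE_NAME = "ei-metadata.json"


def write_metadata_update(out_directory, metadata, action="add"):
    """Write a metadata update with the structure shown above."""
    if action not in ("add", "replace"):
        raise ValueError("action must be 'add' or 'replace'")
    os.makedirs(out_directory, exist_ok=True)
    path = os.path.join(out_directory, METADATA_FILE_NAME)
    with open(path, "w") as f:
        json.dump({"version": 1, "action": action, "metadata": metadata}, f, indent=4)
    return path


# Demonstration in a temporary directory standing in for --out-directory
with tempfile.TemporaryDirectory() as out_dir:
    written = write_metadata_update(out_dir, {"some-key": "some-value"})
    with open(written) as f:
        print(f.read())
```

In a real block, the first argument would be the value received through --out-directory.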

Environmental variables

EI_API_KEY - an API key with 'member' privileges for the organization.
EI_ORGANIZATION_ID - the organization ID that the block runs in.
EI_API_ENDPOINT - the API endpoint (default: https://studio.edgeimpulse.com/v1).

Custom parameters

You can specify custom arguments or parameters for your block by adding a parameters.json file to the root of your block directory. This file describes all arguments for your block and is used to render custom UI elements for each parameter. For example, this parameters file:

[{
    "name": "Bucket",
    "type": "bucket",
    "param": "bucket-name",
    "value": "",
    "help": "The bucket where you're hosting all data"
},
{
    "name": "Bucket prefix",
    "value": "my-test-prefix/",
    "type": "string",
    "param": "bucket-prefix",
    "help": "The prefix in the bucket, where you're hosting the data"
}]

Renders the following UI when you run the transformation block:

And the options are passed in as command line arguments to your block:

--bucket-name "ei-data-dev" --bucket-prefix "my-test-prefix/"

For more information, and all options see Adding parameters to custom blocks.
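On the receiving side, a Python block could read these two options with argparse. This is a sketch under the assumption that the block is written in Python; parse_block_args is a hypothetical helper name, and the defaults mirror the "value" fields of the parameters file above.

```python
import argparse


def parse_block_args(argv=None):
    """Parse the custom parameters defined in the parameters file above."""
    parser = argparse.ArgumentParser(description="Transformation block with custom parameters")
    # The option names match the "param" fields in the parameters file
    parser.add_argument("--bucket-name", type=str, default="")
    parser.add_argument("--bucket-prefix", type=str, default="my-test-prefix/")
    # parse_known_args ignores any other arguments passed to the block
    args, _unknown = parser.parse_known_args(argv)
    return args


args = parse_block_args(["--bucket-name", "ei-data-dev", "--bucket-prefix", "my-test-prefix/"])
print(args.bucket_name, args.bucket_prefix)
```

Calling parse_block_args() without arguments reads the options from the command line. Note that argparse exposes --bucket-name as args.bucket_name.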

Building data pipelines

Data pipelines let you stack several transformation blocks, similar to the Data sources pipelines. They can be used in standalone mode (executing several transformation jobs in a pipeline), to feed a dataset, or to feed a project.

Only available with Edge Impulse Professional and Enterprise Plans

Try our Professional Plan or FREE Enterprise Trial today.

The examples in the screenshots below show how to create and use a pipeline to create the 'AMS Activity 2022' dataset.

Create a pipeline

To create a new pipeline, click on + Add a new pipeline:

Get the steps from your transformation blocks

In your organization workspace, go to Custom blocks -> Transformation and select Run job on the block you want to add.

Select Copy as pipeline step to copy the step into the pipeline's JSON configuration.

You can then paste the copied step directly into the respective field.

Below that field, you have the option to feed the data to either an organization dataset or an Edge Impulse project.

Schedule and notify

By default, your pipeline will run every day. To schedule your pipeline jobs, click on the ⋮ button and select Edit pipeline.

Once the pipeline has finished successfully, it can send an email notification to the selected users.

Run the pipeline

Once your pipeline is set, you can run it directly from the UI, from external sources or by scheduling the task.

Run the pipeline from the UI

To run your pipeline from Edge Impulse studio, click on the ⋮ button and select Run pipeline now.

Run the pipeline from code

To run your pipeline from code, click on the ⋮ button and select Run pipeline from code. This will display an overlay with curl, Node.js and Python code samples.

You will need to create an API key to run the pipeline from code.

Webhooks

Another useful feature is to create a webhook that calls a URL when the pipeline has run. It sends a POST request containing the following information:

{
    "organizationId":XX,
    "pipelineId":XX,
    "pipelineName":"Fetch, sort, validate and combine",
    "projectId":XXXXX,
    "success":true,
    "newItems":0,
    "newChecklistOK":0,
    "newChecklistFail":0
}
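A minimal sketch of a receiver for this webhook, using only the Python standard library. The handler class, the summarize_pipeline_event helper, and port 8080 are illustrative choices; the payload fields are the ones listed above, and the summary echoes them without interpreting them.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def summarize_pipeline_event(payload):
    """Build a one-line summary from the webhook payload shown above."""
    status = "succeeded" if payload.get("success") else "failed"
    return (
        f"Pipeline '{payload.get('pipelineName')}' (id {payload.get('pipelineId')}) {status}: "
        f"newItems={payload.get('newItems')}, "
        f"newChecklistOK={payload.get('newChecklistOK')}, "
        f"newChecklistFail={payload.get('newChecklistFail')}"
    )


class PipelineWebhookHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Read and decode the JSON body of the POST request
        length = int(self.headers.get("Content-Length", 0))
        try:
            payload = json.loads(self.rfile.read(length) or b"{}")
        except json.JSONDecodeError:
            self.send_response(400)
            self.end_headers()
            return
        print(summarize_pipeline_event(payload), flush=True)
        self.send_response(204)
        self.end_headers()


def serve(port=8080):
    """Listen for webhook calls until interrupted."""
    HTTPServer(("", port), PipelineWebhookHandler).serve_forever()
```

Calling serve() starts the listener. The URL you register for the webhook must be reachable from Edge Impulse.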

Transformation blocks

Transformation blocks are very flexible and can be used for most advanced use cases.

Transformation blocks are available in your organization pipelines and in your project pipelines so you can automate your processes.

Transformation blocks can be written in any language, and run on Edge Impulse infrastructure.

Only available with Edge Impulse Enterprise Plan

Try our FREE Enterprise Trial today.

Run transformation blocks

You can run your transformation blocks as transformation jobs. They can be triggered:

From your organization:

The Custom blocks -> Transformation view
The Data transformation page
The Data pipelines page

From your projects:

Data sources (Standalone transformation blocks only)
Synthetic data (Synthetic data blocks only)

Public blocks

By default, we provide several pre-built transformation blocks that you can use directly in your organization or your organization's projects.

We will add more over time when we see a recurring need or interest. The current ones are the following:

Understanding the transformation blocks

A transformation block consists of a Docker image that contains one or several scripts. The Docker image is encapsulated in the transformation block with additional parameters.

Here is a minimal configuration for the transformation blocks:

This documentation page explains how to set up a transformation block and describes the different options.

Import existing transformation blocks

You can directly create your transformation block within Edge Impulse Studio from a public Docker image or import existing transformation blocks:

Example repository

To run the data transformation jobs, see the Data transformation documentation page.

Setting up transformation blocks

To set up your block, the easiest method is to run the Edge Impulse CLI command edge-impulse-blocks init:

$ edge-impulse-blocks init

Edge Impulse Blocks v1.21.1
? In which organization do you want to create this block? 
❯ Developer Relations 
  Medical Laboratories inc.
  Demo Team 
Attaching block to organization 'Developer Relations'
? Choose a type of block
❯ Transformation block 
  Deployment block 
  DSP block 
  Machine learning block
? Choose an option: 
❯ Create a new block 
  Update an existing block 
? Enter the name of your block: Generate helper graphs from sensor CSV
? Enter the description of your block: Transformation block to help you visualize what how your sensor time series data look like by creating a graph from the CSV files
? What type of data does this block operate on? 
  File (--in-file passed into the block) 
  Data item (--in-directory passed into the block) 
❯ Standalone (runs the container, but no files / data items passed in)
? Which buckets do you want to mount into this block (will be mounted under /mnt/s3fs/BUCKET_NAME, you can change these mount points in the Studio)?
(Press <space> to select, <a> to toggle all, <i> to invert selection)
❯ ◉ edge-impulse-devrel-team
  ◯ ei-datasets
❯ yes 
  no

Tip: If you want to access your bucket, make sure to press <space> to select the bucket attached to your organization.

The step above will create the following .parameters.json file in your project directory:

{
    "version": 1,
    "type": "transform",
    "info": {
        "name": "Generate helper graphs from sensor CSV",
        "description": "Transformation block to help you visualize what how your sensor time series data look like by creating a graph from the CSV files",
        "operatesOn": "standalone",
        "transformMountpoints": [
            {
                "bucketId": 3096,
                "mountPoint": "/mnt/s3fs/edge-impulse-devrel-team"
            }
        ]
    },
    "parameters": []
}

To push your transformation block, simply run edge-impulse-blocks push.

Dockerfile

At Edge Impulse, we mostly use Python, JavaScript/TypeScript and Bash scripts, but you can write your transformation blocks in any language.

Dockerfile example to trigger a Bash script:

FROM ubuntu:20.04

WORKDIR /app

# Copy the bash script into the container
COPY hello.sh /hello.sh

# Make the bash script executable
RUN chmod +x /hello.sh

# Set the entrypoint command to run the script with the provided --name argument
ENTRYPOINT ["/hello.sh"]

Dockerfile example to trigger a Python script and install the required dependencies:

FROM python:3.7.5-stretch

WORKDIR /app

# Python dependencies
COPY requirements.txt ./
RUN pip3 --no-cache-dir install -r requirements.txt

COPY . ./

ENTRYPOINT [ "python3",  "transform.py" ]

The Dockerfile above describes a base image (Python 3.7.5), the Python dependencies (in requirements.txt) and which script to run (transform.py).

Note: Do not use a WORKDIR under /home! The /home path will be mounted in by Edge Impulse, making your files inaccessible.

ENTRYPOINT vs RUN / CMD

If you create a custom Dockerfile, make sure to use ENTRYPOINT to specify the application to execute, rather than RUN or CMD.

If you want to host your Docker image on an external registry, you can use Docker Hub and enter username/image:tag in the Docker container field.

Operation modes

We provide three modes to access your data:

In Standalone mode, no data is passed to the container, but you can still access data by mounting your bucket onto the container.
In Data item mode, we pass the --in-directory and --out-directory arguments. A transformation job runs for each directory present in your selected path. These jobs can run in parallel.
In File mode, we pass the --in-file and --out-directory arguments. A transformation job runs for each file present in your selected path. These jobs can run in parallel.

Note that for the last two operation modes, you can use query filters to include only certain data items and certain files.
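The three modes differ only in the arguments the container receives, so a single script can detect which mode it was started in. The sketch below assumes a Python block; detect_mode is a hypothetical helper name, and the returned labels are illustrative.

```python
import argparse


def detect_mode(argv=None):
    """Return the operation mode implied by the arguments, plus the parsed arguments."""
    parser = argparse.ArgumentParser(description="Transformation block")
    parser.add_argument("--in-file", type=str)
    parser.add_argument("--in-directory", type=str)
    parser.add_argument("--out-directory", type=str)
    # Ignore custom parameters and any other arguments passed to the block
    args, _unknown = parser.parse_known_args(argv)

    if args.in_file:
        return "file", args
    if args.in_directory:
        return "directory", args
    # Neither --in-file nor --in-directory was passed
    return "standalone", args


mode, args = detect_mode(["--in-directory", "/data/item-1", "--out-directory", "/tmp/out"])
print(mode, args.in_directory, args.out_directory)
```

Calling detect_mode() without arguments reads the real command line of the container.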

Standalone

Please note that this mode does not support running jobs in parallel, as it is unknown in advance how many files or how many directories are present in your dataset.

You can use custom block parameters to retrieve the bucket name and the required directory to access your files programmatically.

Examples

Python script to create graphs from your CSV sensor data

The following Python script creates graphs from CSV sensor data:

import os, sys, argparse
import pandas as pd
import matplotlib.pyplot as plt

# Set the arguments
parser = argparse.ArgumentParser(description='Organization transformation block')
parser.add_argument('--sensor_name', type=str, required=True, help="Sensor data to extract to create the graph")
parser.add_argument('--bucket_name', type=str, required=True, help="Bucket where your dataset is hosted")
parser.add_argument('--bucket_directory', type=str, required=False, help="Directory in your bucket where your dataset is hosted")

args, unknown = parser.parse_known_args()

sensor_name = args.sensor_name
sensor_file = sensor_name + '.csv'

bucket_name = args.bucket_name
bucket_prefix = args.bucket_directory
mount_prefix = os.getenv('MOUNT_PREFIX', '/mnt/s3fs/')

folder = os.path.join(mount_prefix, bucket_name, bucket_prefix) if bucket_prefix else os.path.join(mount_prefix, bucket_name)

# Check if folder exists
if os.path.exists(folder):
    print('path exist', folder)
    for dirpath, dirnames, filenames in os.walk(folder):
        print("dirpath:",dirpath)
        if os.path.exists(os.path.join(dirpath, sensor_file)):
            print("File exist: ", os.path.join(dirpath, sensor_file))
            
            df = pd.read_csv(os.path.join(dirpath, sensor_file))
            df.index = pd.to_datetime(df['time'], unit='ns')

            # Get a list of all columns except 'time' and 'seconds_elapsed'
            columns_to_plot = [col for col in df.columns if col not in ['time', 'seconds_elapsed']]

            # Create one subplot per selected column. squeeze=False keeps axes a 2D
            # array even when only a single column is plotted.
            fig, axes = plt.subplots(nrows=len(columns_to_plot), ncols=1, figsize=(12, 6 * len(columns_to_plot)), squeeze=False)

            for i, col in enumerate(columns_to_plot):
                ax = axes[i, 0]
                ax.plot(df.index, df[col])
                ax.set_title(f'{col} over time')
                ax.set_xlabel('time')
                ax.set_ylabel(col)
                ax.grid(True)

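            # Save the figure with all subplots as a PNG in the same directory
            plt.tight_layout()
            plt.savefig(os.path.join(dirpath, sensor_name + '.png'))
            print("Graph created")
            # Close the figure to free memory before processing the next directory
            plt.close(fig)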

            # Display the plots (optional)
            # plt.show()
        
        else:
            print("file is missing in directory ", dirpath)
     
else:
    print('Path does not exist')
    sys.exit(1)


print("Finished")
sys.exit(0)

Bash script to print "Hello +name" in the log console

The following Bash script prints "Hello" followed by a name. The name is passed as an argument to the transformation block using custom block parameters:

#!/bin/bash

while [[ $# -gt 0 ]]; do
  key="$1"

  case $key in
    --name)
      NAME="$2"
      shift # past argument
      shift # past value
      ;;
    *)
      # Unknown option
      echo "Unknown option: $1"
      exit 1
      ;;
  esac
done

echo "Hello $NAME"

Data item (`--in-directory`)

When selecting the Data item operation mode, two parameters will be passed to the container:

--in-directory
--out-directory

The transformation jobs will run on each "Data item" (directory) present in your selected path or dataset.

Example

For example, consider a clinical dataset like the following, where each data item has several files:

Set up the transformation job in the Studio:

You will be able to see logs for each data item that look like the following:

--in-directory:  /data/edge-impulse-devrel-team/datasets/activity-detection/Cycling-2023-09-14_06-47-00/
--out-directory:  /home/transform/26773541
--in-file:  None
in-directory has ['Accelerometer.csv', 'Accelerometer.png', 'Annotation.csv', 'Gravity.csv', 'Gyroscope.csv', 'Location.csv', 'LocationGps.csv', 'LocationNetwork.csv', 'Magnetometer.csv', 'Metadata.csv', 'Orientation.csv', 'Pedometer.csv', 'TotalAcceleration.csv']
out-directory has []
Copying file from  /data/edge-impulse-devrel-team/datasets/activity-detection/Cycling-2023-09-14_06-47-00/ to  /home/transform/26773541
out-directory has now ['Accelerometer.csv']

The copied file will be placed in a temporary directory and then be copied to the desired output dataset respecting the folder structure.
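A minimal sketch of a Data item block that produces logs similar to the ones above: it lists the input directory, copies one file to the output directory, and lists the result. The function name and the choice of Accelerometer.csv are illustrative, and the demonstration builds a temporary directory in place of a real data item.

```python
import os
import shutil
import tempfile


def copy_from_data_item(in_directory, out_directory, file_name):
    """Copy one file of a data item to the output directory, logging each step."""
    print("in-directory has", sorted(os.listdir(in_directory)), flush=True)
    os.makedirs(out_directory, exist_ok=True)
    print("out-directory has", sorted(os.listdir(out_directory)), flush=True)

    source = os.path.join(in_directory, file_name)
    if not os.path.isfile(source):
        print(file_name, "is missing in", in_directory, flush=True)
        return None

    print("Copying file from", source, "to", out_directory, flush=True)
    target = shutil.copy(source, out_directory)
    print("out-directory has now", sorted(os.listdir(out_directory)), flush=True)
    return target


# Demonstration with a temporary directory standing in for a real data item
with tempfile.TemporaryDirectory() as root:
    data_item = os.path.join(root, "Cycling-2023-09-14_06-47-00")
    os.makedirs(data_item)
    for name in ("Accelerometer.csv", "Gyroscope.csv"):
        with open(os.path.join(data_item, name), "w") as f:
            f.write("time,x,y,z\n")
    copy_from_data_item(data_item, os.path.join(root, "out"), "Accelerometer.csv")
```

In a real block, the two directories would come from the --in-directory and --out-directory arguments.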

File (`--in-file`)

When selecting the File operation mode, two parameters will be passed to the container:

--in-file
--out-directory

The transformation jobs will run on each file present in your selected path.

Example

We can reuse the same transformation block as above. First, make sure to set the operation mode to File by editing your transformation block:

Then set up the transformation job.

We have used the following filter: dataset = 'Activity Detection (Clinical view)' and file_name like '%Gyro%'

Run the jobs and the logs for one file should look like the following:

--in-directory:  None
--out-directory:  /home/transform/26808538
--in-file:  /data/edge-impulse-devrel-team/datasets/activity-detection/Sitting-2023-09-14_09-11-15/Gyroscope.csv
out-directory has []
--in-file path /data/edge-impulse-devrel-team/datasets/activity-detection/Sitting-2023-09-14_09-11-15/Gyroscope.csv exist
coping Gyroscope.csv to /home/transform/26773541
out-directory has now ['Gyroscope.csv']

Compute requests & limits

When editing your block in Edge Impulse Studio, you can set the number of CPUs and the amount of memory your container requests to run properly. Likewise, you can set limits for the same resources.

Metadata (Data item and file operation modes)

{
    "version": 1,
    "action": "add",
    "metadata": {
        "some-key": "some-value"
    }
}

Some notes:

If action is set to add, the metadata keys are added to the data item. If action is set to replace, all existing metadata keys are removed.
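The difference between the two actions can be illustrated with a small simulation. This is only a sketch of the semantics described in the notes, not Edge Impulse's implementation. It assumes that with add a new value overwrites an existing key of the same name, and that with replace the new keys are stored after the existing ones are removed. apply_metadata_update is a hypothetical function name.

```python
def apply_metadata_update(existing, update):
    """Return the metadata a data item would have after the update (simulation)."""
    if update.get("version") != 1:
        raise ValueError("unsupported version")
    action = update.get("action")
    new_keys = update.get("metadata", {})

    if action == "add":
        # Keep existing keys and add the new ones (assumption: new values win)
        merged = dict(existing)
        merged.update(new_keys)
        return merged
    if action == "replace":
        # Drop all existing keys (assumption: the new keys are then stored)
        return dict(new_keys)
    raise ValueError("action must be 'add' or 'replace'")


current = {"subject": "AAA001", "ei_check": "1"}
update = {"version": 1, "action": "add", "metadata": {"some-key": "some-value"}}
print(apply_metadata_update(current, update))
print(apply_metadata_update(current, {**update, "action": "replace"}))
```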

Mounting points

When using the CLI to set up your block, by default we mount your bucket at the following mount point:

/mnt/s3fs/your-bucket

You can change this value if you want your transformation block to behave differently.

Custom parameters

See the dedicated documentation page, Adding parameters to custom blocks.

Environmental variables

EI_API_KEY - an API key with 'member' privileges for the organization.
EI_ORGANIZATION_ID - the organization ID that the block runs in.
EI_API_ENDPOINT - the API endpoint (default: https://studio.edgeimpulse.com/v1).
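A sketch of how a Python block might collect these variables before calling the Edge Impulse API. The api_settings helper is hypothetical, and the x-api-key header name is an assumption based on how the Edge Impulse API authenticates API keys; it is not stated on this page.

```python
import os

DEFAULT_ENDPOINT = "https://studio.edgeimpulse.com/v1"


def api_settings(env=None):
    """Collect the API endpoint, organization ID and auth header from the environment."""
    env = os.environ if env is None else env
    missing = [name for name in ("EI_API_KEY", "EI_ORGANIZATION_ID") if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {
        "endpoint": env.get("EI_API_ENDPOINT", DEFAULT_ENDPOINT).rstrip("/"),
        "organization_id": int(env["EI_ORGANIZATION_ID"]),
        # Assumption: API keys are sent in the x-api-key request header
        "headers": {"x-api-key": env["EI_API_KEY"]},
    }


# Demonstration with placeholder values instead of the real environment
settings = api_settings({"EI_API_KEY": "ei_placeholder", "EI_ORGANIZATION_ID": "34"})
print(settings["endpoint"], settings["organization_id"])
```

Inside a running block, calling api_settings() without arguments reads the real environment.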

Examples & resources

Standalone

Label image data using GPT-4o: Label image data using GPT-4o block
Dall-E Image Generation (Python): Tutorial / GitHub
Text to speech transform block (JavaScript): GitHub
Fetch a dataset hosted on Kaggle (Python): GitHub
Generate graph from sensor CSV data (Python): GitHub
Hello Edge (Bash): GitHub

File (`--in-file`)

Mix background noise into audio files (Bash script): GitHub
Access your data - Helper transformation block (Python): GitHub
Resample CSV (Python): GitHub

Data Item (`--in-directory`)

Access your data - Helper transformation block (Python): GitHub
Check file existence - Add ei_check metadata on file existence (Python): GitHub
Merge CSV files - Merge CSV files on a given key (Python): GitHub
Merge audio and CSV - Merge audio file and time-series CSV (Python): GitHub

Recap

Now that you have a better idea of what transformation blocks are, here is a graphical recap of how they work:

Parameters.json format

Transformation blocks

This is the specification for the parameters.json file:

type TransformBlockParametersJson = {
    version: 1,
    type: 'transform',
    info: {
        name: string,
        description: string,
        operatesOn: 'file' | 'directory' | 'standalone' | undefined;
        transformMountpoints: {
            bucketId: number;
            mountPoint: string;
        }[] | undefined;
        indMetadata: boolean | undefined;
        cliArguments: string | undefined;
        allowExtraCliArguments: boolean | undefined;
        showInDataSources: boolean | undefined;
        showInCreateTransformationJob: boolean | undefined;
    },
    // see spec in https://docs.edgeimpulse.com/docs/tips-and-tricks/adding-parameters-to-custom-blocks
    parameters: DSPParameterItem[];
};
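As an illustration of the specification above, the following sketch checks a parsed parameters.json against the fields it lists. It is not an official validator: check_transform_parameters is a hypothetical helper, it covers only part of the specification, and it does not inspect the individual parameter items.

```python
ALLOWED_OPERATES_ON = {"file", "directory", "standalone"}


def check_transform_parameters(config):
    """Return a list of problems found in a transformation block parameters.json."""
    problems = []
    if config.get("version") != 1:
        problems.append("version must be 1")
    if config.get("type") != "transform":
        problems.append("type must be 'transform'")

    info = config.get("info")
    if not isinstance(info, dict):
        problems.append("info must be an object")
        return problems

    for key in ("name", "description"):
        if not isinstance(info.get(key), str):
            problems.append(f"info.{key} must be a string")

    # operatesOn and transformMountpoints are optional in the specification
    operates_on = info.get("operatesOn")
    if operates_on is not None and operates_on not in ALLOWED_OPERATES_ON:
        problems.append("info.operatesOn must be 'file', 'directory' or 'standalone'")
    for i, mount in enumerate(info.get("transformMountpoints") or []):
        valid = (
            isinstance(mount, dict)
            and isinstance(mount.get("bucketId"), int)
            and isinstance(mount.get("mountPoint"), str)
        )
        if not valid:
            problems.append(f"info.transformMountpoints[{i}] needs bucketId and mountPoint")

    if not isinstance(config.get("parameters"), list):
        problems.append("parameters must be an array")
    return problems


example = {
    "version": 1,
    "type": "transform",
    "info": {"name": "Demo", "description": "Demo block", "operatesOn": "standalone"},
    "parameters": [],
}
print(check_transform_parameters(example))
```

An empty list means that no problems were found in the fields that are checked.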

Synthetic data blocks

This is the specification for the parameters.json file:

type SyntheticDataBlockParametersJson = {
    version: 1,
    type: 'synthetic-data',
    info: {
        name: string,
        description: string,
    },
    // see spec in https://docs.edgeimpulse.com/docs/tips-and-tricks/adding-parameters-to-custom-blocks
    parameters: DSPParameterItem[];
};

Troubleshooting

The job runs indefinitely

Cannot access files in bucket

If you cannot access your files in your bucket, make sure that the mount point is properly configured.

When using the CLI, a common mistake is forgetting to press the <space> key to select the bucket attached to your organization.

Job failed without logs (only Job failed)

This probably means that there was an issue when triggering the container. In many cases it is related to the issue above: the mount point is not properly configured.

I cannot access the logs

We are still investigating why not all logs are displayed properly. If you are using Python, you can flush stdout on each print call, for example print("hello", flush=True).
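To avoid repeating flush=True on every call, one option is a small wrapper around print. This is a sketch; the name log is arbitrary and the logged values are illustrative.

```python
import functools

# print() variant that flushes stdout after every call
log = functools.partial(print, flush=True)

log("Loading input file")
log("Rows processed:", 128)
```

Running the interpreter with python3 -u, or setting the PYTHONUNBUFFERED=1 environment variable, are standard Python alternatives that disable output buffering for the whole script.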

Can I host my Docker image on Docker Hub?

Yes, you can. You can test this Standalone transformation block if you'd like: luisomoreau/hello_edge:latest

Also, make sure to configure the additional block parameters with this config:

[
    {
        "name": "Name",
        "type": "string",
        "param": "name",
        "value": "",
        "help": "Person to greet"
    }
]

The transformation job logs will then show "Hello" followed by the name you entered.

