1 of 10

Organizations

Your Edge Impulse organization helps your team with the full lifecycle of your TinyML deployment. It contains tools to collect and maintain large datasets, allows your data scientists to quickly access relevant data through their familiar tools, adds versioning and traceability to your machine learning models, and lets you quickly create new Edge Impulse projects for on-device deployment.

Only available for enterprise customers

Organizational features are only available for enterprise customers. View our pricing for more information.

To get started, follow these tutorials:

User management - to add collaborators with different access rights.
Upload portals - to allow external parties to securely contribute data to your datasets.
Custom blocks - to match any specific use cases using dedicated cloud jobs.
Research data - to explain how to deal with such complex data infrastructure.

Users

Within an organization you can work on one or more projects with multiple people. These can be colleagues, outside researchers, or even members of the community. They will only get access to the specific data in the project, and not to any of the raw data in your organizational datasets.

Only available for enterprise customers

Organizational features are only available for enterprise customers. View our pricing for more information.

To invite a user in an organization, click on the "Add user button, enter the email address and select the role:

Each one of the users can have different roles:

Admins have full rights on the organization
Members have full access on the datasets, custom blocks but cannot join a project without being invited
Guests have only limited access to the selected datasets

To give someone access, go to your project's dashboard, and find the "Collaborators" widget. Click the '+' icon, and type the username or e-mail address of the other user. This user needs to have an Edge Impulse account already.

Upload portals

Upload portals are a secure way to let external parties upload data to your datasets. Through an upload portal they get an easy user interface to add data, but they have no access to the content of the dataset, nor can they delete any files. Data that is uploaded through the portal can be stored on-premise or in your own cloud infrastructure.

In this tutorial we'll set up an upload portal, show you how to add new data, and how to show this data in Edge Impulse for further processing.

Only available for enterprise customers

Organizational features are only available for enterprise customers. View our pricing for more information.

1. Configuring a storage bucket

Data is stored in storage buckets, which can either be hosted by Edge Impulse, or in your own infrastructure.

2. Creating an upload portal

With your storage bucket configured you're ready to set up your first upload portal. In your organization go to Data > Upload portals and choose Create new upload portal. Here, select a name, a description, the storage bucket, and a path in the storage bucket.

Note: You'll need to enable CORS headers on the bucket. If these are not configured you'll get prompted with instructions. Talk to your user success engineer (when your data is hosted by Edge Impulse), or your system administrator to configure this.

After your portal is created a link is shown. This link contains an authentication token, and can be shared directly with the third party.

Click the link to open the portal. If you ever forget the link: no worries. Click the ⋮ next to your portal, and choose View portal.

3. Uploading data to the portal

To upload data you can now drag & drop files or folders to the drop zone on the right, or use Create new folder to first create a folder structure. There's no limit to the amount of files you can upload here, and all files are hashed, so if you upload a file that's already present the file will be skipped.

Note: Files with the same name but with a different hash are overwritten.

4. Using your portal in transformation blocks / research data

If you want to process data in a portal as part of a Research Pipeline you can either:

Mount the portal directly into a transformation block via Custom blocks > Transformation blocks > Edit block, and select the portal under mount points.
Mount the bucket that the portal is in, as a transformation block. This will also give you access to all other data in the bucket, very useful if you need to sync other data (see Synchronizing research data with a bucket).

5. Adding the data to your project

If the data in your portal is already in the right format you can also directly import the uploaded data to your project. In your project view, go to Data Sources, **** select 'Upload portal' and follow the steps of the wizard:

6. Recap

Any questions, or interested in the enterprise version of Edge Impulse? Contact us for more information.

Appendix A: Programmatic access to portals

Here's a Python script which uploads, lists and downloads data to a portal. To upload data you'll need to authenticate with a JWT token, see below this script for more info.

# portal_api.py

import requests
import json
import os
import hashlib

PORTAL_TOKEN = os.environ.get('EI_PORTAL_TOKEN')
PORTAL_ID = os.environ.get('EI_PORTAL_ID')
JWT_TOKEN = os.environ.get('EI_JWT_TOKEN')

if not PORTAL_TOKEN:
    print('Missing EI_PORTAL_TOKEN environmental variable.')
    print('Go to a portal, and copy the part after "?token=" .')
    print('Then run:')
    print('    export EI_PORTAL_TOKEN=ec61e...')
    print('')
    print('You can add the line above to your ~/.bashrc or ~/.zshrc file to automatically load the token in the future.')
    exit(1)

if not PORTAL_ID:
    print('Missing EI_PORTAL_ID environmental variable.')
    print('Go to a portal, open the browser console, and look for "portalId" in the "Hello world from Edge Impulse" object to find it.')
    print('Then run:')
    print('    export EI_PORTAL_ID=122')
    print('')
    print('You can add the line above to your ~/.bashrc or ~/.zshrc file to automatically load the token in the future.')
    exit(1)

if not JWT_TOKEN:
    print('WARN: Missing EI_JWT_TOKEN environmental variable, you will only have write-only access to the portal')
    print('Run `python3 get_jwt_token.py` for instructions on how to set the token')
    print('(this requires access to the organization that owns the portal)')

def get_file_hash(path):
    with open(path, 'rb') as f:
        return hashlib.md5(f.read()).hexdigest()

def create_upload_link(file_name_in_portal, path):
    url = "https://studio.edgeimpulse.com/v1/api/portals/" + PORTAL_ID + "/upload-link"

    payload = json.dumps({
        'fileName': file_name_in_portal,
        "fileSize": os.path.getsize(path),
        "fileHash": get_file_hash(path)
    })
    headers = {
        'accept': "application/json",
        'content-type': "application/json",
        'x-token': PORTAL_TOKEN
    }

    response = requests.request("POST", url, data=payload, headers=headers)

    if (response.status_code != 200):
        raise Exception('status code was not 200, but ' + str(response.status_code) + ' - ' + response.text)

    j = response.json()
    if (not j['success']):
        raise Exception('api request did not succeed ' + str(response.status_code) + ' - ' + response.text)

    return j['url']

def upload_file_to_s3(signed_url, path):
    with open(path, 'rb') as f:
        response = requests.request("PUT", signed_url, data=f, headers={})

        if (response.status_code != 200):
            raise Exception('status code was not 200, but ' + str(response.status_code) + ' - ' + response.text)

def upload_file_to_portal(file_name_in_portal, path):
    print('Uploading', file_name_in_portal + '...')
    link = create_upload_link(file_name_in_portal, path)
    upload_file_to_s3(link, path)
    print('Uploading', file_name_in_portal, 'OK')
    print('')


def create_download_link(file_name_in_portal):
    url = "https://studio.edgeimpulse.com/v1/api/portals/" + PORTAL_ID + "/files/download"

    payload = json.dumps({
        'path': file_name_in_portal,
    })
    headers = {
        'accept': "application/json",
        'content-type': "application/json",
        'x-token': PORTAL_TOKEN,
        'x-jwt-token': JWT_TOKEN
    }

    response = requests.request("POST", url, data=payload, headers=headers)

    if (response.status_code != 200):
        raise Exception('status code was not 200, but ' + str(response.status_code) + ' - ' + response.text)

    j = response.json()
    if (not j['success']):
        raise Exception('api request did not succeed ' + str(response.status_code) + ' - ' + response.text)

    return j['url']

def download_file_from_s3(signed_url):
    response = requests.request("GET", signed_url, headers={})

    if (response.status_code != 200):
        raise Exception('status code was not 200, but ' + str(response.status_code) + ' - ' + response.text)

    return response.content

def download_file_from_portal(file_name_in_portal):
    print('Downloading', file_name_in_portal + '...')
    link = create_download_link(file_name_in_portal)
    f = download_file_from_s3(link)
    print('Downloading', file_name_in_portal, 'OK')
    print('')
    return f


def list_files_in_portal(prefix):
    url = "https://studio.edgeimpulse.com/v1/api/portals/" + PORTAL_ID + "/files"

    payload = json.dumps({
        'prefix': prefix,
    })
    headers = {
        'accept': "application/json",
        'content-type': "application/json",
        'x-token': PORTAL_TOKEN
    }

    response = requests.request("POST", url, data=payload, headers=headers)

    if (response.status_code != 200):
        raise Exception('status code was not 200, but ' + str(response.status_code) + ' - ' + response.text)

    j = response.json()
    if (not j['success']):
        raise Exception('api request did not succeed ' + str(response.status_code) + ' - ' + response.text)

    return j['files']

# this is how you upload files to a portal using the Edge Impulse API
# first argument is the path in the portal, second is the location of the file
upload_file_to_portal('test.jpg', '/Users/janjongboom/Downloads/test.jpg')

# uploading to a subdirectory
upload_file_to_portal('flowers/daisy.jpg', '/Users/janjongboom/Downloads/daisy-resized.jpg')

# listing files
print('files in root folder', list_files_in_portal(''))
print('files in "flowers/"', list_files_in_portal('flowers/'))

# downloading a file
if JWT_TOKEN:
    buffer = download_file_from_portal('flowers/daisy.jpg')
    with open('output.jpg', 'wb') as f:
        f.write(buffer)
else:
    print('Not downloading files, EI_JWT_TOKEN not set')

print('Done!')

And here's a script to generate JWT tokens:

# get_jwt_token.py

import requests
import json
import argparse

def get_token(username, password):
    url = "https://studio.edgeimpulse.com/v1/api-login"

    payload = json.dumps({
        'username': username,
        "password": password,
    })
    headers = {
        'accept': "application/json",
        'content-type': "application/json",
    }

    response = requests.request("POST", url, data=payload, headers=headers)

    if (response.status_code != 200):
        raise Exception('status code was not 200, but ' + str(response.status_code) + ' - ' + response.text)

    j = response.json()
    if (not j['success']):
        raise Exception('api request did not succeed ' + str(response.status_code) + ' - ' + response.text)

    return j['token']


if __name__ == "__main__":
    parser = argparse.ArgumentParser(description='Get Edge Impulse JWT token')
    parser.add_argument('--username', type=str, required=True, help="Username or email address")
    parser.add_argument('--password', type=str, required=True)

    args, unknown = parser.parse_known_args()

    token = get_token(args.username, args.password)

    print('JWT token is:', token)
    print('')
    print('Use this in portal_api.py via:')
    print('    export EI_JWT_TOKEN=' + token)
    print('')
    print('You can add the line above to your ~/.bashrc or ~/.zshrc file to automatically load the token in the future.')
    print('Note: This token is valid for a limited time!')

Custom blocks

Custom blocks are cloud jobs that can be hosted and used on Edge Impulse. They serve a dedicated task, are extremely flexible, let you customize your experience and fasten your time-to-market.

Creating a transformation block - to fetch, sort, validate, combine and transform existing data into robust datasets that can be imported into your projects.
Building deployment blocks - to create custom deployment targets for your products.
Building and hosting custom DSP blocks - to create and host your custom signal processing techniques and use them directly in your projects.
Create a custom learning block - to use your custom models and load pre-trained weights with PyTorch, Keras or scikit-learn.

Research data

Since the creation of Edge Impulse, we have been helping customers to deal with complex data pipelines, complex data transformation methods and complex clinical validation studies.

In most cases, before even thinking about machine learning algorithms, researchers need to build quality datasets from real-world data. These data come from various devices (prototype devices being developed vs clinical/industrial-grade reference devices), have different formats (excel sheets, images, csv, json, etc...), and be stored in various places (researcher computer, Dropbox folder, Google Drive, S3 buckets, etc...).

Dealing with such complex data infrastructure is time-consuming and expensive to develop and maintain. With this Research data section, we want to help you understand how to create a full research data pipeline by:

We have built a health reference design that describes an end-to-end ML workflow for building a wearable health product using Edge Impulse. It covers an activity study in a research lab, where data is recorded from the wearable end device (PPG + accelerometer), a reference device (Polar H10 HR monitor), plus labels (e.g. sitting, running, biking). The data is collected and validated, then written to a research dataset in an Edge Impulse organization, and finally imported into an Edge Impulse project where we train a classifier.

It handles data coming from multiple sources, data alignment, and a multi-stage pipeline before the data is imported into an Edge Impulse project. We won't cover in detail all the code snippets, our solution engineers can help you set this end-to-end ML workflow.

Synchronizing research data with a bucket

In this section, we will show how to synchronize research data with a bucket in your organizational dataset. The goal of this step is to gather data from different sources and sort them to obtain a sorted dataset (that we will then validate in the next section).

Only available for enterprise customers

Organizational features are only available for enterprise customers. View our pricing for more information.

The reference design described in the research data consists of 10 subjects performing 1.5 - 2 hours of activities in a research lab. Participants have a study ID (e.g. AMS_001) that is used to refer to the participant. For each participant we have 4 CSV files:

accelerometer.csv - data from the wearable end device.
ppg.csv - data from the wearable end device.
polar_h10.csv - reference data from a commercial reference device (Polar H10).
labels.csv - labels of the activity, as recorded by the research lab.

We've mimicked a proper research study, and have split the data up into two locations.

accelerometer.csv / ppg.csv - live in the company data lake in S3. The data lake uses an internal structure with non-human readable IDs for each participant (e.g. 2E93ZX for anonymized data):
```
7HAIGO
|_ accelerometer.csv
|_ ppg.csv
Z0ZPJW
|_ accelerometer.csv
|_ ppg.csv
```
polar_h10.csv / labels.csv are uploaded by the research partner to an upload portal. The files are prefixed with the study ID:

To create the mapping between the study ID and the internal data lake ID we use a study master sheet. It contains information about all participants, ID mapping, and metadata. E.g.:

Subject	    Internal ID	    Study date	    Age	    BMI
AMS_001	    7HAIGO      	2022-03-10	    24	    18
AMS_002	    Z0ZPJW      	2022-01-27	    35	    31

Notes: This master sheet was made using a Google Sheet but can be anything. All data (data lake, portal, output) are hosted in an Edge Impulse S3 bucket but can be stored anywhere (see below).

Configuring a storage bucket for your dataset

Data is stored in storage buckets, which can either be hosted by Edge Impulse, or in your own infrastructure. If you choose to host the data yourself your infrastructure should be available through the S3 API, and you are responsible for setting up proper backups. To configure a new storage bucket, head to your organization, choose Data > Buckets, click Add new bucket, and fill in your access credentials. Our solution engineers are also here to help you set up the buckets for you.

About datasets

With the storage bucket in place you can create your first dataset. Datasets in Edge Impulse have three layers:

The dataset, a larger set of data items, grouped together.
Data item, an item with metadata and files attached.
Data file, the actual files.

No required format for data files

There is no required format for data files. You can upload data in any format, whether it's CSV, Parquet, or a proprietary data format.

Adding research data to your organization

There are three ways of uploading data into your organization. You can either:

Upload data directly to the storage bucket (recommended method). In this case use Add data... > Add dataset from bucket and the data will be discovered automatically.
Upload data through the Edge Impulse API.
Upload the files through the Upload Portals.

Sorter and combiner

Sorter

The sorter is the first step of the research pipeline. It's job is to fetch the data from all locations (here: internal data lake, portal, metadata from study master sheet) and create a research dataset in Edge Impulse. It does this by:

Creating a new structure in S3 like this:

AMS_001
|_ AMS_001_labels.csv
|_ AMS_001_polar_h10.csv
|_ accelerometer.csv
|_ ppg.csv
AMS_002
|_ AMS_002_labels.csv
|_ AMS_002_polar_h10.csv
|_ accelerometer.csv
|_ ppg.csv

Syncing the S3 folder with a research dataset in your Edge Impulse organization (like AMS Activity Study 2022).
Updating the metadata with the metadata from the master sheet (Age, BMI, etc...).

Combiner

With the data sorted we then:

Need to verify that the data is correct (see validate your research data)
Combine the data into a single Parquet file. This is essentially the contract we have for our dataset. By settling on a standard format (strong typed, same column names everywhere) this data is now ready to be used for ML, new algorithm development, etc. Because we also add metadata for each file here we're very quickly building up a valuable R&D datastore.

All these steps can be run through different transformation blocks and executed one after the other using data pipelines.

Validating research data

Only available for enterprise customers

Using Checklists

You can optionally show a check mark in the list of data items, and show a check list for data items. This can be used to quickly view which data items are complete (if you need to capture data from multiple sources) or whether items are in the right format.

Checklists look trivial, but are actually very powerful as they give quick insights in dataset issues. Missing these issues until after the study is done can be super expensive.

Checklists are written to ei-metadata.json and are automatically being picked up by the UI.

Checklists are driven by the metadata for a data item. Set the ei_check metadata item to either 0 or 1 to show a check mark in the list. Set an ei_check_KEYNAME metadata item to 0 or 1 to show the item in the check list.

To query for items with or without a check mark, use a filter in the form of:

Example

For the reference design described and used in the previous pages, the combiner takes in a data item, and writes out:

A checklist, e.g.:
- ✔ - PPG file present
- ✔ - Accelerometer file present
- ✘ - Correlation between Polar/PPG HR is at least 0.5
If the checklist is OK, a combined.parquet file.
A hr.png file with the correlation between HR found from PPG, and HR from the reference device. This is useful for two reasons:
- If the correlation is too low we're looking at the wrong file, or data is missing.
- Verify if the PPG => HR algorithm actually works.

Querying research data

Organizational datasets contain a powerful query system which lets you explore and slice data. You control the query system through the 'Filter' text box, and you use a language which is very similar to SQL (documentation).

Only available for enterprise customers

Organizational features are only available for enterprise customers. View our pricing for more information.

For example, here are some queries that you can make:

dataset like '%AMS Activity Study%' - returns all items and files from the study.
bucket_name = 'edge-impulse-health-reference-design' AND --labels sitting,walking - returns data whose label is 'sitting' and 'walking, and that is stored in the 'edge-impulse-health-reference-design' bucket.
metadata->ei_check = 0 - return data that have a metadata field 'ei_check' which is '0'.
created > DATE('2022-08-01') - returns all data that was created after Aug 1, 2022.

After you've created a filter, you can select one or more data items, and select Actions...>Download selected to create a ZIP file with the data files. The file count reflects the number of files returned by the filter.

The previous queries all returned all files for a data item. But you can also query files through the same filter. In that case the data item will be returned, but only with the files selected. For example:

file_name LIKE '%.png' - returns all files that end with .png.

If you have an interesting query that you'd like to share with your colleagues, you can just share the URL. The query is already added to it automatically.

All available fields

These are all the available fields in the query interface:

dataset - Dataset.
bucket_id - Bucket ID.
bucket_name - Bucket name.
bucket_path - Path of the data item within the bucket.
id - Data item ID.
name - Data item name.
total_file_count - Number of files for the data item.
total_file_size - Total size of all files for the data item.
created - When the data item was created.
metadata->key - Any item listed under 'metadata'.
file_name - Name of a file.
file_names - All filenames in the data item, that you can use in conjunction with CONTAINS. E.g. find all items with file X, but not file Y: file_names CONTAINS 'x' AND not file_names CONTAINS 'y'.

Transforming research data

Only available for enterprise customers

1. Prerequisites

You'll need:

- If you receive any warnings that's fine. Run edge-impulse-blocks afterwards to verify that the CLI was installed correctly.

Transformation blocks use Docker containers, a virtualization technique which lets developers package up an application with all dependencies in a single package. If you want to test your blocks locally you'll also need (this is not a requirement):

1.1 - Parquet schema

This is the Parquet schema for the gestures.parquet file which we'll transform:

2. Building your first transformation block

To build a transformation block open a command prompt or terminal window, create a new folder, and run:

This will prompt you to log in, and enter the details for your block. E.g.:

Then, create the following files in this directory:

2.1 - Dockerfile

We're building a Python based transformation block. The Dockerfile describes our base image (Python 3.7.5), our dependencies (in requirements.txt) and which script to run (transform.py).

Note: Do not use a WORKDIR under /home! The /home path will be mounted in by Edge Impulse, making your files inaccessible.

ENTRYPOINT vs RUN / CMD

If you use a different programming language, make sure to use ENTRYPOINT to specify the application to execute, rather than RUN or CMD.

2.2 - requirements.txt

This file describes the dependencies for the block. We'll be using pandas and pyarrow to parse the Parquet file, and numpy to do some calculations.

2.3 - transform.py

This file includes the actual application. Transformation blocks are invoked with three parameters (as command line arguments):

--in-file or --in-directory - A file (if the block operates on a file), or a directory (if the block operates on a data item) from the organizational dataset. In this case the gestures.parquet file.
--out-directory - Directory to write files to.
--hmac-key - You can use this HMAC key to sign the output files. This is not used in this tutorial.
--metadata - Key/value pairs containing the metadata for the data item, plus additional metadata about the data item in the dataItemInfo key. E.g.: { "subject": "AAA001", "ei_check": "1", "dataItemInfo": { "id": 101, "dataset": "Human Activity 2022", "bucketName": "edge-impulse-tutorial", "bucketPath": "janjongboom/human_activity/AAA001/", "created": "2022-03-07T09:20:59.772Z", "totalFileCount": 14, "totalFileSize": 6347421 } }

Add the following content. This takes in the Parquet file, groups data by their label, and then calculates the RMS over the X, Y and Z axes of the accelerometer.

2.4 - Building and testing the container

On your local machine

To test the transformation block locally, if you have Python and all dependencies installed, just run:

Docker

You can also build the container locally via Docker, and test the block. The added benefit is that you don't need any dependencies installed on your local computer, and can thus test that you've included everything that's needed for the block. This requires Docker desktop to be installed.

To build the container and test the block, open a command prompt or terminal window and navigate to the source directory. First, build the container:

Then, run the container (make sure gestures.parquet is in the same directory):

Seeing the output

This process has generated a new Parquet file in the out/ directory containing the RMS of the X, Y and Z axes. If you inspect the content of the file (e.g. using parquet-tools) you'll see the output:

Success!

3. Pushing the transformation block to Edge Impulse

With the block ready we can push it to your organization. Open a command prompt or terminal window, navigate to the folder you created earlier, and run:

This packages up your folder, sends it to Edge Impulse where it'll be built, and finally is added to your organization.

The transformation block is now available in Edge Impulse under Data transformation > Transformation blocks.

If you make any changes to the block, just re-run edge-impulse-blocks push and the block will be updated.

4. Uploading gestures.parquet to Edge Impulse

Next, upload the gestures.parquet file, by going to Data > Add data... > Add data item, setting name as 'Gestures', dataset to 'Transform tutorial', and selecting the Parquet file.

This makes the gestures.parquet file available from the Data page.

5. Starting the transformation

With the Parquet file in Edge Impulse and the transformation block configured you can now create a new job. Go to Data, and select the Parquet file by setting the filter to dataset = 'Transform tutorial'.

Click the checkbox next to the data item, and select Transform selected (1 file). On the 'Create transformation job' page select 'Import data into Dataset'. Under 'output dataset', select 'Same dataset as source', and under 'Transformation block' select the new transformation block.

Click Start transformation job to start the job. This pulls the data in, starts a transformation job and finally uploads the data back to your dataset. If you have multiple files selected the transformations will also run in parallel.

You can now find the transformed file back in your dataset:

6. Next steps

Appendix: Advanced features

Updating metadata from a transformation block

You can update the metadata of blocks directly from a transformation block by creating a ei-metadata.json file in the output directory. The metadata is then applied to the new data item automatically when the transform job finishes. The ei-metadata.json file has the following structure:

Some notes:

If action is set to add the metadata keys are added to the data item. If action is set to replace all existing metadata keys are removed.

Environmental variables

Transformation blocks get access to the following environmental variables, which let you authenticate with the Edge Impulse API. This way you don't have to inject these credentials into the block. The variables are:

EI_API_KEY - an API key with 'member' privileges for the organization.
EI_ORGANIZATION_ID - the organization ID that the block runs in.
EI_API_ENDPOINT - the API endpoint (default: https://studio.edgeimpulse.com/v1).

Buildling data pipelines

Only available for enterprise customers

The examples in the screenshots below shows how to create and use a pipeline to create the 'AMS Activity 2022' dataset.

Create a pipeline

To create a new pipeline, click on '+Add a new pipeline:

Get the steps from your transformation blocks

In your organization workspace, go to Custom blocks -> Transformation and select Run job on the job you want to add.

Select Copy as pipeline step and paste it to the configuration json file.

You can then paste the copied step directly to the respected field.

Below, you have an option to feed the data to either a organisation dataset or an Edge Impulse project

Schedule and notify

By default, your pipeline will run every day. To schedule your pipeline jobs, click on the ⋮ button and select Edit pipeline.

Once the pipeline has successfully finished, it can send an email to the Users to notify.

Run the pipeline

Once your pipeline is set, you can run it directly from the UI, from external sources or by scheduling the task.

Run the pipeline from the UI

To run your pipeline from Edge Impulse studio, click on the ⋮ button and select Run pipeline now.

Run the pipeline from code

To run your pipeline from Edge Impulse studio, click on the ⋮ button and select Run pipeline from code. This will display an overlay with curl, Node.js and Python code samples.

You will need to create an API key to run the pipeline from code.

Webhooks

Another useful feature is to create a webhook to call a URL when the pipeline has ran. It will run a POST request containing the following information:

Upload portals

In this tutorial we'll set up an upload portal, show you how to add new data, and how to show this data in Edge Impulse for further processing.

Only available for enterprise customers

Organizational features are only available for enterprise customers. View our pricing for more information.

1. Configuring a storage bucket

Data is stored in storage buckets, which can either be hosted by Edge Impulse, or in your own infrastructure.

2. Creating an upload portal

After your portal is created a link is shown. This link contains an authentication token, and can be shared directly with the third party.

Click the link to open the portal. If you ever forget the link: no worries. Click the ⋮ next to your portal, and choose View portal.

3. Uploading data to the portal

Note: Files with the same name but with a different hash are overwritten.

4. Using your portal in transformation blocks / research data

If you want to process data in a portal as part of a Research Pipeline you can either:

Mount the portal directly into a transformation block via Custom blocks > Transformation blocks > Edit block, and select the portal under mount points.
Mount the bucket that the portal is in, as a transformation block. This will also give you access to all other data in the bucket, very useful if you need to sync other data (see Synchronizing research data with a bucket).

5. Adding the data to your project

6. Recap

If you need a secure way for external parties to contribute data to your datasets then upload portals are the way to go. They offer a friendly user interface, upload data directly into your storage buckets, and give you an easy way to use the data directly in Edge Impulse.

Any questions, or interested in the enterprise version of Edge Impulse? Contact us for more information.

Appendix A: Programmatic access to portals

Here's a Python script which uploads, lists and downloads data to a portal. To upload data you'll need to authenticate with a JWT token, see below this script for more info.

# portal_api.py

import requests
import json
import os
import hashlib

PORTAL_TOKEN = os.environ.get('EI_PORTAL_TOKEN')
PORTAL_ID = os.environ.get('EI_PORTAL_ID')
JWT_TOKEN = os.environ.get('EI_JWT_TOKEN')

if not PORTAL_TOKEN:
    print('Missing EI_PORTAL_TOKEN environmental variable.')
    print('Go to a portal, and copy the part after "?token=" .')
    print('Then run:')
    print('    export EI_PORTAL_TOKEN=ec61e...')
    print('')
    print('You can add the line above to your ~/.bashrc or ~/.zshrc file to automatically load the token in the future.')
    exit(1)

if not PORTAL_ID:
    print('Missing EI_PORTAL_ID environmental variable.')
    print('Go to a portal, open the browser console, and look for "portalId" in the "Hello world from Edge Impulse" object to find it.')
    print('Then run:')
    print('    export EI_PORTAL_ID=122')
    print('')
    print('You can add the line above to your ~/.bashrc or ~/.zshrc file to automatically load the token in the future.')
    exit(1)

if not JWT_TOKEN:
    print('WARN: Missing EI_JWT_TOKEN environmental variable, you will only have write-only access to the portal')
    print('Run `python3 get_jwt_token.py` for instructions on how to set the token')
    print('(this requires access to the organization that owns the portal)')

def get_file_hash(path):
    with open(path, 'rb') as f:
        return hashlib.md5(f.read()).hexdigest()

def create_upload_link(file_name_in_portal, path):
    url = "https://studio.edgeimpulse.com/v1/api/portals/" + PORTAL_ID + "/upload-link"

    payload = json.dumps({
        'fileName': file_name_in_portal,
        "fileSize": os.path.getsize(path),
        "fileHash": get_file_hash(path)
    })
    headers = {
        'accept': "application/json",
        'content-type': "application/json",
        'x-token': PORTAL_TOKEN
    }

    response = requests.request("POST", url, data=payload, headers=headers)

    if (response.status_code != 200):
        raise Exception('status code was not 200, but ' + str(response.status_code) + ' - ' + response.text)

    j = response.json()
    if (not j['success']):
        raise Exception('api request did not succeed ' + str(response.status_code) + ' - ' + response.text)

    return j['url']

def upload_file_to_s3(signed_url, path):
    with open(path, 'rb') as f:
        response = requests.request("PUT", signed_url, data=f, headers={})

        if (response.status_code != 200):
            raise Exception('status code was not 200, but ' + str(response.status_code) + ' - ' + response.text)

def upload_file_to_portal(file_name_in_portal, path):
    print('Uploading', file_name_in_portal + '...')
    link = create_upload_link(file_name_in_portal, path)
    upload_file_to_s3(link, path)
    print('Uploading', file_name_in_portal, 'OK')
    print('')


def create_download_link(file_name_in_portal):
    url = "https://studio.edgeimpulse.com/v1/api/portals/" + PORTAL_ID + "/files/download"

    payload = json.dumps({
        'path': file_name_in_portal,
    })
    headers = {
        'accept': "application/json",
        'content-type': "application/json",
        'x-token': PORTAL_TOKEN,
        'x-jwt-token': JWT_TOKEN
    }

    response = requests.request("POST", url, data=payload, headers=headers)

    if (response.status_code != 200):
        raise Exception('status code was not 200, but ' + str(response.status_code) + ' - ' + response.text)

    j = response.json()
    if (not j['success']):
        raise Exception('api request did not succeed ' + str(response.status_code) + ' - ' + response.text)

    return j['url']

def download_file_from_s3(signed_url):
    response = requests.request("GET", signed_url, headers={})

    if (response.status_code != 200):
        raise Exception('status code was not 200, but ' + str(response.status_code) + ' - ' + response.text)

    return response.content

def download_file_from_portal(file_name_in_portal):
    print('Downloading', file_name_in_portal + '...')
    link = create_download_link(file_name_in_portal)
    f = download_file_from_s3(link)
    print('Downloading', file_name_in_portal, 'OK')
    print('')
    return f


def list_files_in_portal(prefix):
    url = "https://studio.edgeimpulse.com/v1/api/portals/" + PORTAL_ID + "/files"

    payload = json.dumps({
        'prefix': prefix,
    })
    headers = {
        'accept': "application/json",
        'content-type': "application/json",
        'x-token': PORTAL_TOKEN
    }

    response = requests.request("POST", url, data=payload, headers=headers)

    if (response.status_code != 200):
        raise Exception('status code was not 200, but ' + str(response.status_code) + ' - ' + response.text)

    j = response.json()
    if (not j['success']):
        raise Exception('api request did not succeed ' + str(response.status_code) + ' - ' + response.text)

    return j['files']

# this is how you upload files to a portal using the Edge Impulse API
# first argument is the path in the portal, second is the location of the file
upload_file_to_portal('test.jpg', '/Users/janjongboom/Downloads/test.jpg')

# uploading to a subdirectory
upload_file_to_portal('flowers/daisy.jpg', '/Users/janjongboom/Downloads/daisy-resized.jpg')

# listing files
print('files in root folder', list_files_in_portal(''))
print('files in "flowers/"', list_files_in_portal('flowers/'))

# downloading a file
if JWT_TOKEN:
    buffer = download_file_from_portal('flowers/daisy.jpg')
    with open('output.jpg', 'wb') as f:
        f.write(buffer)
else:
    print('Not downloading files, EI_JWT_TOKEN not set')

print('Done!')

And here's a script to generate JWT tokens:

# get_jwt_token.py

import requests
import json
import argparse

def get_token(username, password):
    url = "https://studio.edgeimpulse.com/v1/api-login"

    payload = json.dumps({
        'username': username,
        "password": password,
    })
    headers = {
        'accept': "application/json",
        'content-type': "application/json",
    }

    response = requests.request("POST", url, data=payload, headers=headers)

    if (response.status_code != 200):
        raise Exception('status code was not 200, but ' + str(response.status_code) + ' - ' + response.text)

    j = response.json()
    if (not j['success']):
        raise Exception('api request did not succeed ' + str(response.status_code) + ' - ' + response.text)

    return j['token']


if __name__ == "__main__":
    parser = argparse.ArgumentParser(description='Get Edge Impulse JWT token')
    parser.add_argument('--username', type=str, required=True, help="Username or email address")
    parser.add_argument('--password', type=str, required=True)

    args, unknown = parser.parse_known_args()

    token = get_token(args.username, args.password)

    print('JWT token is:', token)
    print('')
    print('Use this in portal_api.py via:')
    print('    export EI_JWT_TOKEN=' + token)
    print('')
    print('You can add the line above to your ~/.bashrc or ~/.zshrc file to automatically load the token in the future.')
    print('Note: This token is valid for a limited time!')

Synchronizing research data with a bucket

Only available for enterprise customers

Organizational features are only available for enterprise customers. View our pricing for more information.

accelerometer.csv - data from the wearable end device.
ppg.csv - data from the wearable end device.
polar_h10.csv - reference data from a commercial reference device (Polar H10).
labels.csv - labels of the activity, as recorded by the research lab.

We've mimicked a proper research study, and have split the data up into two locations.

accelerometer.csv / ppg.csv - live in the company data lake in S3. The data lake uses an internal structure with non-human readable IDs for each participant (e.g. 2E93ZX for anonymized data):
```
7HAIGO
|_ accelerometer.csv
|_ ppg.csv
Z0ZPJW
|_ accelerometer.csv
|_ ppg.csv
```
polar_h10.csv / labels.csv are uploaded by the research partner to an upload portal. The files are prefixed with the study ID:

To create the mapping between the study ID and the internal data lake ID we use a study master sheet. It contains information about all participants, ID mapping, and metadata. E.g.:

Subject	    Internal ID	    Study date	    Age	    BMI
AMS_001	    7HAIGO      	2022-03-10	    24	    18
AMS_002	    Z0ZPJW      	2022-01-27	    35	    31

Notes: This master sheet was made using a Google Sheet but can be anything. All data (data lake, portal, output) are hosted in an Edge Impulse S3 bucket but can be stored anywhere (see below).

Configuring a storage bucket for your dataset

About datasets

With the storage bucket in place you can create your first dataset. Datasets in Edge Impulse have three layers:

The dataset, a larger set of data items, grouped together.
Data item, an item with metadata and files attached.
Data file, the actual files.

No required format for data files

There is no required format for data files. You can upload data in any format, whether it's CSV, Parquet, or a proprietary data format.

Adding research data to your organization

There are three ways of uploading data into your organization. You can either:

Upload data directly to the storage bucket (recommended method). In this case use Add data... > Add dataset from bucket and the data will be discovered automatically.
Upload data through the Edge Impulse API.
Upload the files through the Upload Portals.

Sorter and combiner

Sorter

Creating a new structure in S3 like this:

AMS_001
|_ AMS_001_labels.csv
|_ AMS_001_polar_h10.csv
|_ accelerometer.csv
|_ ppg.csv
AMS_002
|_ AMS_002_labels.csv
|_ AMS_002_polar_h10.csv
|_ accelerometer.csv
|_ ppg.csv

Syncing the S3 folder with a research dataset in your Edge Impulse organization (like AMS Activity Study 2022).
Updating the metadata with the metadata from the master sheet (Age, BMI, etc...).

Combiner

With the data sorted we then:

Need to verify that the data is correct (see validate your research data)
Combine the data into a single Parquet file. This is essentially the contract we have for our dataset. By settling on a standard format (strong typed, same column names everywhere) this data is now ready to be used for ML, new algorithm development, etc. Because we also add metadata for each file here we're very quickly building up a valuable R&D datastore.

All these steps can be run through different transformation blocks and executed one after the other using data pipelines.

Transforming research data

Transformation blocks take raw data from your and convert the data into a different dataset or files that can be loaded in an Edge Impulse project. You can use transformation blocks to only include certain parts of individual data files, calculate long-running features like a running mean or derivatives, or efficiently generate features with different window lengths. Transformation blocks can be written in any language, and run on the Edge Impulse infrastructure.

In this tutorial we build a Python-based transformation block that loads Parquet files, calculates features from the Parquet file, and then writes a new file back to your dataset. If you haven't done so, go through first.

Only available for enterprise customers

Organizational features are only available for enterprise customers. for more information.

1. Prerequisites

You'll need:

The .
- If you receive any warnings that's fine. Run edge-impulse-blocks afterwards to verify that the CLI was installed correctly.
The file which you can use to test the transformation block. This contains some data from the dataset in Parquet format.

installed on your machine.

1.1 - Parquet schema

This is the Parquet schema for the gestures.parquet file which we'll transform:

message root {
  required binary sampleName (UTF8);
  required int64 timestamp (TIMESTAMP_MILLIS);
  required int64 added (TIMESTAMP_MILLIS);
  required boolean signatureValid;
  required binary device (UTF8);
  required binary label (UTF8);
  required float accX;
  required float accY;
  required float accZ;
}

2. Building your first transformation block

To build a transformation block open a command prompt or terminal window, create a new folder, and run:

$ edge-impulse-blocks init

This will prompt you to log in, and enter the details for your block. E.g.:

Edge Impulse Blocks v1.9.0
? What is your user name or e-mail address (edgeimpulse.com)? jan+demo@edgeimpulse.com
? What is your password? [hidden]
Attaching block to organization 'Demo org Inc.'
? Choose a type of block Transformation block
? Choose an option Create a new block
? Enter the name of your block Demo dataset transformation
? Enter the description of your block Reads a Parquet file, extracts features, and writes the block back to the dataset
Creating block with config: {
  name: 'Demo dataset transformation',
  type: 'transform',
  description: 'Reads a Parquet file and splits it up in labeled data',
  organizationId: 34
}
Your new block 'Demo dataset transformation' has been created in '~/repos/tutorial-processing-block'.
When you have finished building your transformation block, run "edge-impulse-blocks push" to update the block in Edge Impulse.

Then, create the following files in this directory:

2.1 - Dockerfile

We're building a Python based transformation block. The Dockerfile describes our base image (Python 3.7.5), our dependencies (in requirements.txt) and which script to run (transform.py).

FROM python:3.7.5-stretch

WORKDIR /app

# Python dependencies
COPY requirements.txt ./
RUN pip3 --no-cache-dir install -r requirements.txt

COPY . ./

ENTRYPOINT [ "python3",  "transform.py" ]

Note: Do not use a WORKDIR under /home! The /home path will be mounted in by Edge Impulse, making your files inaccessible.

ENTRYPOINT vs RUN / CMD

If you use a different programming language, make sure to use ENTRYPOINT to specify the application to execute, rather than RUN or CMD.

2.2 - requirements.txt

This file describes the dependencies for the block. We'll be using pandas and pyarrow to parse the Parquet file, and numpy to do some calculations.

numpy==1.16.4
pandas==0.23.4
pyarrow==0.16.0

2.3 - transform.py

This file includes the actual application. Transformation blocks are invoked with three parameters (as command line arguments):

--in-file or --in-directory - A file (if the block operates on a file), or a directory (if the block operates on a data item) from the organizational dataset. In this case the gestures.parquet file.
--out-directory - Directory to write files to.
--hmac-key - You can use this HMAC key to sign the output files. This is not used in this tutorial.
--metadata - Key/value pairs containing the metadata for the data item, plus additional metadata about the data item in the dataItemInfo key. E.g.: { "subject": "AAA001", "ei_check": "1", "dataItemInfo": { "id": 101, "dataset": "Human Activity 2022", "bucketName": "edge-impulse-tutorial", "bucketPath": "janjongboom/human_activity/AAA001/", "created": "2022-03-07T09:20:59.772Z", "totalFileCount": 14, "totalFileSize": 6347421 } }

Add the following content. This takes in the Parquet file, groups data by their label, and then calculates the RMS over the X, Y and Z axes of the accelerometer.

import pyarrow.parquet as pq
import numpy as np
import math, os, sys, argparse, json, hmac, hashlib, time
import pandas as pd

# these are the three arguments that we get in
parser = argparse.ArgumentParser(description='Organization transformation block')
parser.add_argument('--in-file', type=str, required=True)
parser.add_argument('--out-directory', type=str, required=True)

args, unknown = parser.parse_known_args()

# verify that the input file exists and create the output directory if needed
if not os.path.exists(args.in_file):
    print('--in-file argument', args.in_file, 'does not exist', flush=True)
    exit(1)

if not os.path.exists(args.out_directory):
    os.makedirs(args.out_directory)

# load and parse the input file
print('Loading parquet file', args.in_file, flush=True)
table = pq.read_table(args.in_file)
data = table.to_pandas()

features = []

# we group by label and then extract some metrics
for label in data.label.unique():
    data_per_label = data[data.label == label]

    # calculate the RMS per axis
    features.append({
        'label': label,
        'rmsX': np.sqrt(np.mean(data_per_label.accX**2)),
        'rmsY': np.sqrt(np.mean(data_per_label.accY**2)),
        'rmsZ': np.sqrt(np.mean(data_per_label.accZ**2))
    })

# and store as new file in the output directory
out_file = os.path.join(args.out_directory, os.path.splitext(os.path.basename(args.in_file))[0] + '_features.parquet')
pd.DataFrame(features).to_parquet(out_file)

print('Written features file', out_file, flush=True)

2.4 - Building and testing the container

On your local machine

To test the transformation block locally, if you have Python and all dependencies installed, just run:

$ python3 transform.py --in-file gestures.parquet --out-directory out/

Docker

To build the container and test the block, open a command prompt or terminal window and navigate to the source directory. First, build the container:

$ docker build -t test-org-transform-parquet-dataset .

Then, run the container (make sure gestures.parquet is in the same directory):

$ docker run --rm -v $PWD:/data test-org-transform-parquet-dataset --in-file /data/gestures.parquet --out-directory /data/out

Seeing the output

$ parquet-tools head -n5 out/gestures_features.parquet 
label = wave
rmsX = 11.424144744873047
rmsY = 4.73303747177124
rmsZ = 2.944265842437744

label = updown
rmsX = 3.899503231048584
rmsY = 3.9587674140930176
rmsZ = 10.34404468536377

label = circle
rmsX = 6.263721942901611
rmsY = 7.0987162590026855
rmsZ = 6.159618854522705

label = idle
rmsX = 3.714001178741455
rmsY = 3.4940428733825684
rmsZ = 8.6710205078125

label = snake
rmsX = 1.282995581626892
rmsY = 1.8830623626708984
rmsZ = 9.597149848937988

Success!

3. Pushing the transformation block to Edge Impulse

With the block ready we can push it to your organization. Open a command prompt or terminal window, navigate to the folder you created earlier, and run:

$ edge-impulse-blocks push

This packages up your folder, sends it to Edge Impulse where it'll be built, and finally is added to your organization.

Edge Impulse Blocks v1.9.0
Archiving 'tutorial-processing-block'...
Archiving 'tutorial-processing-block' OK (2 KB) /var/folders/3r/fds0qzv914ng4t17nhh5xs5c0000gn/T/ei-transform-block-7812190951a6038c2f442ca02d428c59.tar.gz

Uploading block 'Demo dataset transformation' to organization 'Demo org Inc.'...
Uploading block 'Demo dataset transformation' to organization 'Demo org Inc.' OK

Building transformation block 'Demo dataset transformation'...
Job started
...
Building transformation block 'Demo dataset transformation' OK

Your block has been updated, go to https://studio.edgeimpulse.com/organization/34/data to run a new transformation

The transformation block is now available in Edge Impulse under Data transformation > Transformation blocks.

If you make any changes to the block, just re-run edge-impulse-blocks push and the block will be updated.

4. Uploading gestures.parquet to Edge Impulse

Next, upload the gestures.parquet file, by going to Data > Add data... > Add data item, setting name as 'Gestures', dataset to 'Transform tutorial', and selecting the Parquet file.

This makes the gestures.parquet file available from the Data page.

5. Starting the transformation

You can now find the transformed file back in your dataset:

6. Next steps

Transformation blocks are a powerful feature which let you set up a data pipeline to turn raw data into actionable machine learning features. It also gives you a reproducible way of transforming many files at once, and is programmable through the so you can automatically convert new incoming data. If you're interested in transformation blocks or any of the other enterprise features,

Appendix: Advanced features

Updating metadata from a transformation block

{
    "version": 1,
    "action": "add",
    "metadata": {
        "some-key": "some-value"
    }
}

Some notes:

If action is set to add the metadata keys are added to the data item. If action is set to replace all existing metadata keys are removed.

Environmental variables

EI_API_KEY - an API key with 'member' privileges for the organization.
EI_ORGANIZATION_ID - the organization ID that the block runs in.
EI_API_ENDPOINT - the API endpoint (default: https://studio.edgeimpulse.com/v1).