Building your first dataset

Organizational datasets allow you to build a large collection of organized sensor data that is internal to your organization. This data can then be used to create new Edge Impulse projects, imported into pandas or MATLAB for internal exploration by your data scientists, or processed and shared with partners. Data files within the datasets can be stored on-premises or in your own cloud infrastructure.

In this tutorial we'll set up a first dataset, explore the powerful query tool, and show how to create new Edge Impulse projects from raw data.

📘

Only available for enterprise customers

Organizational features are only available for enterprise customers. Contact us for more information.

Prerequisites

To follow this tutorial you'll need:

  • cURL - a software package to interact with the Edge Impulse API.
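Before continuing, it can help to confirm that cURL is actually on your PATH. The following is a minimal sketch; the helper name `require_tool` is made up for this example and is not part of any Edge Impulse tooling.

```shell
# Hypothetical helper: succeed if the given command is available on PATH.
require_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        return 0
    fi
    echo "missing required tool: $1" >&2
    return 1
}

# Usage (uncomment to run):
# require_tool curl && curl --version
```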

1. Getting an API key

For this tutorial you'll need your organization ID and an API key. First, click on your own name, and select your organization.

Selecting your organization from the drop down menu.

This redirects you to your organizational dashboard. Your organization ID is the last part of the URL.

Finding your organization ID from the URL.
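If you copy the dashboard URL from your browser, the ID can be pulled out with plain shell parameter expansion. This is a sketch only: the function name `org_id_from_url` and the example URL are hypothetical, and the only thing taken from the tutorial is that the ID is the last part of the URL.

```shell
# Hypothetical helper: print the last path segment of a URL.
org_id_from_url() {
    url=${1%%[?#]*}   # drop any query string or fragment
    url=${url%/}      # drop one trailing slash, if present
    printf '%s\n' "${url##*/}"
}

# Usage (the URL below is a made-up example):
# ORGANIZATION_ID=$(org_id_from_url "https://studio.edgeimpulse.com/organization/1234")
```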

Next, you can create a new API key. Click Keys, and then click Add new API key. Give the key a name, and select 'Member' as role.

Creating a new organizational API key

Click Create API key and write down the key somewhere. It will only be shown once.
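Since the key is shown only once, one option is to keep it in an environment variable rather than pasting it into every command. The sketch below assumes that keys start with `ei_`, which is inferred from the `ei_YOUR_API_KEY` placeholder used later in this tutorial; the variable name `EI_API_KEY` and the function `check_api_key` are made up for this example.

```shell
# Hypothetical helper: basic sanity check on the shape of an API key.
# Assumption: keys begin with "ei_" (inferred from the tutorial's placeholder).
check_api_key() {
    case "$1" in
        ei_?*) return 0 ;;
        *) echo "API key should start with ei_" >&2; return 1 ;;
    esac
}

# Usage (uncomment and fill in your own key):
# export EI_API_KEY="ei_YOUR_API_KEY"
# check_api_key "$EI_API_KEY"
```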

2. Configuring a storage bucket

Data is stored in storage buckets, which can be hosted either by Edge Impulse or in your own infrastructure. If you choose to host the data yourself, your infrastructure should be available through the S3 API, and you are responsible for setting up proper backups. To configure a new storage bucket, head to your organization, choose Data > Buckets, click Add new bucket, and fill in your access credentials. Make sure to name your storage bucket 'Internal datasets', as we'll need that name to upload data later.

Storage buckets overview with a single bucket configured.

3. Uploading your first dataset

3.1 About datasets

With the storage bucket in place you can upload your first dataset. Datasets in Edge Impulse have three layers: 1) the dataset, a larger set of data items grouped together; 2) the data item, an item with metadata and files attached; 3) the data file, the actual file. For example, if we're collecting data on physical activities from many subjects, we can have:

  • Dataset: 'Activities Field Study September 1994'.
    • Data item: 'Forrest Gump Running', with metadata fields "name=Forrest Gump" and "activity=running".
      • Data file: 'running01.parquet', with raw sensor data.
      • Data file: 'running02.parquet', with raw sensor data.

From here you can query and group the data. For example, you can retrieve all data from the 'Activities Field Study September 1994' dataset that was tagged with the 'running' activity. Or, you can select all the files that are smaller than 1MB and were generated by 'Forrest Gump' over all datasets.
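The metadata fields in the example above are passed to the API as a JSON object (see the `metadata` form field in the upload commands in section 3.2). As a rough sketch, the helper below turns `key=value` pairs into such an object. The name `metadata_json` is made up for this example, and it does no escaping, so it assumes keys and values contain no double quotes or backslashes.

```shell
# Hypothetical helper: build a flat JSON object from key=value arguments.
# Assumption: keys and values contain no double quotes or backslashes.
metadata_json() {
    json=""
    for pair in "$@"; do
        key=${pair%%=*}      # text before the first "="
        value=${pair#*=}     # text after the first "="
        json="$json${json:+,}\"$key\":\"$value\""
    done
    printf '{%s}\n' "$json"
}

metadata_json "name=Forrest Gump" "activity=running"
# prints {"name":"Forrest Gump","activity":"running"}
```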

3.2 Importing the continuous gestures dataset

For this tutorial we'll use a dataset containing 9 minutes of accelerometer data for a gesture recognition system. Download the dataset and unzip it in a convenient location.

📘

No required format for data files

There is no required format for data files. You can upload data in any format, whether it's CSV, Parquet, or a proprietary data format.

There are two ways of uploading data to your dataset. You can either upload the files through the UI or the Edge Impulse API, or you can place the data directly in the storage bucket (recommended for large datasets). In the latter case, you'll need to make a call to Add new data to specify the location of your data and any metadata. In this tutorial we'll upload the files through the API.

Open a terminal or a command prompt, and navigate to the folder where you extracted the dataset. Then execute the following commands (replace ei_YOUR_API_KEY with your API key, and replace ORGANIZATION_ID with your organization ID):

curl -X POST -H "x-api-key: ei_YOUR_API_KEY" \
    https://studio.edgeimpulse.com/v1/api/organizations/ORGANIZATION_ID/data/add \
    -F bucketName="Internal datasets" \
    -F dataset="Gestures study" \
    -F name="Wave" \
    -F metadata='{"gesture":"wave"}' \
    -F files[][email protected] -F files[][email protected] -F files[][email protected] -F files[][email protected] \
    -F files[][email protected] -F files[][email protected] -F files[][email protected] -F files[][email protected] \
    -F files[][email protected] -F files[][email protected] -F files[][email protected] -F files[][email protected] \
    -F files[][email protected] -F files[][email protected] -F files[][email protected] -F files[][email protected] \
    -F files[][email protected] -F files[][email protected] -F files[][email protected] -F files[][email protected]

curl -X POST -H "x-api-key: ei_YOUR_API_KEY" \
    https://studio.edgeimpulse.com/v1/api/organizations/ORGANIZATION_ID/data/add \
    -F bucketName="Internal datasets" \
    -F dataset="Gestures study" \
    -F name="Idle" \
    -F metadata='{"gesture":"idle"}' \
    -F files[][email protected] -F files[][email protected] -F files[][email protected] -F files[][email protected] \
    -F files[][email protected] -F files[][email protected] -F files[][email protected] -F files[][email protected] \
    -F files[][email protected] -F files[][email protected] -F files[][email protected] -F files[][email protected] \
    -F files[][email protected] -F files[][email protected] -F files[][email protected] -F files[][email protected] \
    -F files[][email protected] -F files[][email protected] -F files[][email protected]

curl -X POST -H "x-api-key: ei_YOUR_API_KEY" \
    https://studio.edgeimpulse.com/v1/api/organizations/ORGANIZATION_ID/data/add \
    -F bucketName="Internal datasets" \
    -F dataset="Gestures study" \
    -F name="Snake" \
    -F metadata='{"gesture":"snake"}' \
    -F files[][email protected] -F files[][email protected] -F files[][email protected] -F files[][email protected] \
    -F files[][email protected] -F files[][email protected] -F files[][email protected] -F files[][email protected] \
    -F files[][email protected] -F files[][email protected] -F files[][email protected] -F files[][email protected] \
    -F files[][email protected] -F files[][email protected] -F files[][email protected] -F files[][email protected] 

curl -X POST -H "x-api-key: ei_YOUR_API_KEY" \
    https://studio.edgeimpulse.com/v1/api/organizations/ORGANIZATION_ID/data/add \
    -F bucketName="Internal datasets" \
    -F dataset="Gestures study" \
    -F name="Updown" \
    -F metadata='{"gesture":"updown"}' \
    -F files[][email protected] -F files[][email protected] -F files[][email protected] -F files[][email protected] \
    -F files[][email protected] -F files[][email protected] -F files[][email protected] -F files[][email protected] \
    -F files[][email protected] -F files[][email protected] -F files[][email protected] -F files[][email protected] \
    -F files[][email protected] -F files[][email protected] -F files[][email protected]

The above script takes the data in the folder, groups it by gesture, and then creates 4 data items. The dataset ('Gestures study') is automatically created. If you go to the Data page in your organization you should now see 4 items with 70 files in total.
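Listing every file by hand gets tedious, so the same request can be assembled from a glob. The sketch below is an alternative way to build the calls above, not the tutorial's own script. It assumes the files are named `<gesture>.<something>.cbor` in the current folder and that `EI_API_KEY` and `EI_ORG_ID` are set in the environment; those names, the function `upload_gesture`, and the `DRY_RUN` switch are all made up for this example.

```shell
# Hypothetical helper: upload all files for one gesture as a single data item.
# Assumptions: files are named "<gesture>.*.cbor"; EI_API_KEY and EI_ORG_ID are set.
upload_gesture() {
    gesture=$1      # file prefix and metadata value, e.g. "wave"
    item_name=$2    # data item name, e.g. "Wave"
    set --          # reuse the positional parameters as the list of -F arguments
    for f in "$gesture".*.cbor; do
        [ -e "$f" ] || continue          # the glob matched nothing
        set -- "$@" -F "files[]=@$f"
    done
    if [ "$#" -eq 0 ]; then
        echo "no files found for $gesture" >&2
        return 1
    fi
    # With DRY_RUN set to a non-empty value, the command is printed instead of sent.
    ${DRY_RUN:+echo} curl -X POST -H "x-api-key: $EI_API_KEY" \
        "https://studio.edgeimpulse.com/v1/api/organizations/$EI_ORG_ID/data/add" \
        -F bucketName="Internal datasets" \
        -F dataset="Gestures study" \
        -F name="$item_name" \
        -F metadata="{\"gesture\":\"$gesture\"}" \
        "$@"
}

# Usage (uncomment to run; try it with DRY_RUN=1 first):
# upload_gesture wave Wave
# upload_gesture idle Idle
# upload_gesture snake Snake
# upload_gesture updown Updown
```

Running with `DRY_RUN=1` first lets you check which files the glob picked up before anything is sent.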

First dataset in Edge Impulse

4. Querying and downloading data

Organizational datasets contain a powerful query system which lets you explore and slice data. You control the query system through the 'Filter' text box, and you use a language which is very similar to SQL (documentation). For example, here are some queries that you can make:

  • dataset = 'Gestures study' - returns all items and files from the study.
  • len(name) = 4 - returns all data items whose name has 4 characters (in this case: 'Idle' and 'Wave').
  • bucket_name = 'Internal datasets' AND name IN ('Updown', 'Snake') - returns data whose name is either 'Updown' or 'Snake', and that is stored in the 'Internal datasets' bucket.
  • metadata->gesture = 'updown' - returns data items that have a metadata field 'gesture' with the value 'updown'.
  • created > DATE('2020-03-01') - returns all data that was created after March 1, 2020.

After you've created a filter, you can select one or more data items, and select Download selected to create a ZIP file with the data files. The file count reflects the number of files returned by the filter.

Downloading files from organizational datasets.

The previous queries each returned every file for the matching data items. You can also filter on individual files through the same filter box. In that case the data item is still returned, but only with the files that match. For example:

  • file_name LIKE '%.0.cbor' - returns all files that end with .0.cbor.
  • file_size BETWEEN 8000 AND 10000 - returns all files that are between 8,000 and 10,000 bytes.
  • name = 'Updown' ORDER BY file_size ASC LIMIT 5 - returns the 5 smallest files for 'Updown'.

Selecting only a subset of files through advanced filters.
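After downloading and extracting a ZIP, you may want to double-check locally that it contains what the filter promised. The sketch below mirrors the three file-level filters above with standard tools, run from the folder holding the extracted files. The function names are made up for this example, and the inclusive size bounds are an assumption about how BETWEEN behaves in the query language.

```shell
# Mirrors: file_name LIKE '%.0.cbor'
files_ending_0() {
    find . -type f -name '*.0.cbor'
}

# Mirrors: file_size BETWEEN MIN AND MAX (bytes, assumed inclusive)
# usage: files_between 8000 10000
files_between() {
    find . -type f -size +"$(( $1 - 1 ))"c -size -"$(( $2 + 1 ))"c
}

# Mirrors: ORDER BY file_size ASC LIMIT N, for files named "PREFIX.*"
# usage: smallest_files PREFIX N   (prints "<bytes> <file>" per line)
smallest_files() {
    for f in "$1".*; do
        [ -f "$f" ] || continue
        printf '%s %s\n' "$(( $(wc -c < "$f") ))" "$f"
    done | sort -n | head -n "$2"
}
```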

If you have an interesting query that you'd like to share with your colleagues, you can share the URL. The query is automatically included in it.

4.1 All available fields

These are all the available fields in the query interface:

  • dataset - Dataset.
  • bucket_id - Bucket ID.
  • bucket_name - Bucket name.
  • bucket_path - Path of the data item within the bucket.
  • id - Data item ID.
  • name - Data item name.
  • total_file_count - Number of files for the data item.
  • total_file_size - Total size of all files for the data item.
  • created - When the data item was created.
  • metadata->key - Any item listed under 'metadata'.
  • file_name - Name of a file.
  • file_size - Size of a file.

5. Importing data in an Edge Impulse project

If you have an interesting subset of data and want to train a machine learning model on it, you can export the data into a new Edge Impulse project. This makes a copy of the data that you can then manipulate and explore like any other project, or share with outside researchers without any risk of leaking the rest of your dataset. The data is also stripped of any metadata, such as the name of the data item or any metadata that you attached to the files.

🚧

Edge Impulse data acquisition format

This section only applies if your data is already in either the Edge Impulse Data acquisition format (CBOR and JSON both work), or in WAV format. For other data you'll need to use a transformation block before being able to create a new project.

Let's put this in practice. You need to select some data for the new project. Go to the Data page and set the filter to:

dataset = 'Gestures study'

Then, select all items and click Transform selected (70 files).

Create a filter with the data files that interest you, and select 'Create project from selected'.

This redirects you to the 'Transformation job' page. Under 'Import data into', select 'Project'. Under 'Project' select '+ Create new project', and enter a name. Next, select the category. This determines whether the data is used as 'training' or 'testing' data, or whether it should be split between these two categories. For now, select 'Split'. Then, click Create project to import the data.

Importing data from an organization into an Edge Impulse project.

This pulls down the gesture data from the bucket, and then imports it into the project. You don't need to stay on the page; the job will continue running in the background.

Seeing the progress on an import job.

If you now go back to your project you have a copy of the organizational dataset at your disposal, ready to build your next machine learning model. You can also add colleagues or outside collaborators to this specific project by going to Dashboard, and selecting the "Collaborators" widget. And if you want to do another experiment with the same data, you can easily create a new project with the same flow without any fear of changing any of the source data. 🚀

Any questions, or interested in the enterprise version of Edge Impulse? Contact us for more information.

Updated 2 months ago
