Data
Since the creation of Edge Impulse, we have been helping customers to deal with complex data pipelines, complex data transformation methods and complex clinical validation studies.
In most cases, before even thinking about machine learning algorithms, researchers need to build quality datasets from real-world data. These data come from various devices (prototype devices being developed vs clinical/industrial-grade reference devices), have different formats (excel sheets, images, csv, json, etc...), and be stored in various places (researchers' computers, Dropbox folders, Google Drive, S3 buckets, etc...).
Dealing with such complex data infrastructure is time-consuming and expensive to develop and maintain. With the organizational data, we want to give you tools to centralize, validate and transform datasets so they can be easily imported into your projects to train your machine learning models.
Only available with Edge Impulse Enterprise Plan
Try our FREE Enterprise Trial today.
Health reference design
We have built a health reference design that describes an end-to-end ML workflow for building a wearable health product using Edge Impulse.
In this reference resign, we want to help you understand how to create a full clinical data pipeline by:
Buckets
Before we get started, you must link your organization with one or several storage buckets. First, select where your data lives:
AWS S3 buckets
Google Cloud Storage
Any S3-compatible bucket
And fill the form with your bucket name, region, endpoint access and secret keys:
A green dot indicates that your bucket is connected:
Datasets
Two types of dataset structures can be used - Generic datasets (default) and Clinical datasets.
There is no required format for data files. You can upload data in any format, whether it's CSV, Parquet, or a proprietary data format.
However, to import data items to an Edge Impulse project, you will need to use the right format as our studio ingestion API only supports these formats:
JPG, PNG images
MP4, AVI video files
WAV audio files
CBOR/JSON files in the Edge Impulse data acquisition format
CSV files
Tip: You can use transformation blocks to convert your data
The default dataset structure is a file-based one, no matter the directory structure:
For example:
or:
Note that you will be able to associate the labels of your data items from the file name or the directory name when importing your data in a project.
Create a new dataset
Once you successfully linked your storage bucket to your organization, head to the Datasets tab and click on + Add new dataset:
Fill out the following form:
Click on Create dataset
Data
With your datasets imported, you can now navigate into your dataset, create folders, query your dataset, add data items and import your data to an Edge Impulse project.
Adding data to your project
Go to the Actions...->Import data into a project, select the project you wish to import to and click Next, Configure how to label this data:
This will import the data into the project and optionally create a new label for each file in the dataset. This labeling step helps you keep track of different classes or categories within your data.
After importing the data into the project, in the Next, post-sync actions step, you can configure a data pipeline to automatically retrieve and trigger actions in your project:
Previewing Data
We also have added a data preview feature, allowing you to visualize certain types of data directly within the organization data tab.
Supported data types include tables (CSV/Parquet), images, PDFs, audio files (WAV/MP3), and text files (TXT/JSON). This feature gives you a quick overview of your data and helps ensure its integrity and correctness.
Recap
If you need to get data into your organization, you can now do this in a few simple steps. To go further and use advanced features, query your datasets or transform your dataset, please have a look at the health reference design tutorial 🚀
Any questions, or interested in the enterprise version of Edge Impulse? Contact us for more information.
Troubleshooting
CORS Headers
If you see the following message, make sure to add the CORS header to your bucket settings:
You can also add the CORS using the AWS S3 CLI:
with this file cors.json
:
Last updated