Research data

Since the creation of Edge Impulse, we have been helping customers to deal with complex data pipelines, complex data transformation methods and complex clinical validation studies.

In most cases, before even thinking about machine learning algorithms, researchers need to build quality datasets from real-world data. These data come from various devices (prototype devices being developed vs clinical/industrial-grade reference devices), have different formats (excel sheets, images, csv, json, etc...), and be stored in various places (researcher computer, Dropbox folder, Google Drive, S3 buckets, etc...).

Dealing with such complex data infrastructure is time-consuming and expensive to develop and maintain. With this Research data section, we want to help you understand how to create a full research data pipeline by:

We have built a health reference design that describes an end-to-end ML workflow for building a wearable health product using Edge Impulse. It covers an activity study in a research lab, where data is recorded from the wearable end device (PPG + accelerometer), a reference device (Polar H10 HR monitor), plus labels (e.g. sitting, running, biking). The data is collected and validated, then written to a research dataset in an Edge Impulse organization, and finally imported into an Edge Impulse project where we train a classifier.

It handles data coming from multiple sources, data alignment, and a multi-stage pipeline before the data is imported into an Edge Impulse project. We won't cover in detail all the code snippets, our solution engineers can help you set this end-to-end ML workflow.

Last updated

Revision created on 11/14/2022