Bring your own model

Want to use a novel ML architecture, or load your own transfer learning models into Edge Impulse? Bring your own model! It's easy to bring any training pipeline into the Studio, as long as it can output TFLite files. We have end-to-end examples of doing this in Keras, PyTorch and scikit-learn.

If you just want to modify the neural network architecture or loss function, you can also use expert mode directly in the Studio, without having to bring your own model. Go to any ML block, click the three dots, and select Switch to Keras (expert) mode.

This page describes the input and output formats if you want to bring your own model, but a good way to start building a custom learning block is by modifying one of the following example repositories:

  • YOLOv5 - wraps the mainline YOLOv5 repository (trained with PyTorch) to train a custom transfer learning model.

  • EfficientNet - a Keras implementation of transfer learning with EfficientNet B0.

  • ResNet50 - a Keras implementation of transfer learning ResNet50.

  • Keras - a basic multi-layer perceptron in Keras and TensorFlow.

  • PyTorch - a basic multi-layer perceptron in PyTorch.

  • Scikit-learn - trains a logistic regression model using scikit-learn, then outputs a TFLite file for inferencing using jax.

Editing built-in blocks

Any built-in block in the Edge Impulse Studio (e.g. classifiers, regression models or FOMO blocks) can be edited locally, and then pushed back as a custom block. This is great if you want to make heavy modifications to these training pipelines, for example to do custom data augmentation. To download a block, go to any ML block in your project, click the three dots, select Edit block locally, and follow the instructions in the README.

Dockerfiles

Training pipelines in Edge Impulse are built on top of Docker containers, a virtualization technique that lets developers package up an application together with all of its dependencies. To train your own model you'll need to wrap all the required packages, your scripts, and (if you use transfer learning) your pre-trained weights into this container. When running in Edge Impulse the container does not have network access, so make sure you don't download dependencies at runtime (downloading them while building the container is fine).

A typical Dockerfile might look like (see the example repositories for more information):

# syntax = docker/dockerfile:experimental
FROM ubuntu:20.04
WORKDIR /app

ARG DEBIAN_FRONTEND=noninteractive

# Install base packages (like Python and pip)
RUN apt update && apt install -y curl zip git lsb-release software-properties-common apt-transport-https vim wget python3 python3-pip
RUN python3 -m pip install --upgrade pip==20.3.4

# Copy Python requirements in and install them
COPY requirements.txt ./
RUN pip3 install -r requirements.txt

# Copy the rest of your training scripts in
COPY . ./

# And tell us where to run the pipeline
ENTRYPOINT ["python3", "-u", "train.py"]

Important: ENTRYPOINT

It's important to create an ENTRYPOINT at the end of the Dockerfile to specify which file to run.

GPU Support

If you want GPU support (enterprise customers only), you'll need the CUDA packages installed. If you export a learning block from the Studio, the exported Dockerfile already has the right base packages, so use it as a starting point.

Arguments to your script

The entrypoint (see above in the Dockerfile) will be called with these four parameters:

  • --data-directory - where you can find the data (see below for the input/output formats).

  • --epochs - number of epochs to train for (set by the user in the UI).

  • --learning-rate - learning rate to train with (set by the user in the UI).

  • --out-directory - where to write the TFLite files (see below for the input/output formats).

We realise that not every ML model requires setting epochs and learning rate, and we also realise that you might want to add extra options to the UI. Longer term we'll implement a parameter system similar to what custom processing blocks use.
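
A minimal sketch of how your train.py entrypoint might parse these arguments (argparse is one option; the argument names match the list above, everything else is illustrative):

import argparse

# Arguments passed in by Edge Impulse when the container runs
parser = argparse.ArgumentParser(description='Custom learning block training script')
parser.add_argument('--data-directory', type=str, required=True,
                    help='Directory containing the X_*.npy / Y_*.npy files')
parser.add_argument('--epochs', type=int, required=True,
                    help='Number of training epochs (set by the user in the UI)')
parser.add_argument('--learning-rate', type=float, required=True,
                    help='Learning rate (set by the user in the UI)')
parser.add_argument('--out-directory', type=str, required=True,
                    help='Directory to write the TFLite files to')
args, unknown = parser.parse_known_args()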

Input format

The data directory contains your dataset, after running any DSP blocks, already split into a train/validation set:

  • X_split_train.npy

  • Y_split_train.npy

  • X_split_test.npy

  • Y_split_test.npy

The X_*.npy files are float32 Numpy arrays, already in the right shape (e.g. if you're training on 96x96 RGB images this will be of shape (n, 96, 96, 3)). You can typically load these without any modification into your training pipeline.

The Y_*.npy files are either:

  1. int32 Numpy arrays, with four columns (label_index, sample_id, sample_slice_start_ms, sample_slice_end_ms).

  2. A JSON array in the form of: [{ "sampleId": 234731, "boundingBoxes": [{ "label": 1, "x": 260, "y": 313, "w": 234, "h": 261 }] } ]

Format 2) is sent if your dataset has bounding boxes; in all other cases format 1) is sent.
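
As a minimal sketch (assuming the non-bounding-box case, so the Y_*.npy files are int32 arrays whose first column is the label index, and assuming the arguments were parsed as above), loading this data could look like:

import os
import numpy as np

# args.data_directory holds the value passed via --data-directory
X_train = np.load(os.path.join(args.data_directory, 'X_split_train.npy'))
Y_train = np.load(os.path.join(args.data_directory, 'Y_split_train.npy'))
X_test = np.load(os.path.join(args.data_directory, 'X_split_test.npy'))
Y_test = np.load(os.path.join(args.data_directory, 'Y_split_test.npy'))

# First column holds the label index (see the column list above)
y_train_labels = Y_train[:, 0]
y_test_labels = Y_test[:, 0]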

To get new data for your project, just run (requires Edge Impulse CLI v1.16 or higher):

edge-impulse-blocks runner --download-data data/

This regenerates features (if necessary) and then downloads the updated dataset.

Note on required shape for image models (NCHW vs. NHWC)

Image models require the input shape to be (n, Height, Width, Channels) (NHWC). PyTorch uses (n, Channels, Height, Width) (NCHW) internally, so this needs to be fixed when you output the TFLite file. Unfortunately this is not as easy as just reshaping the input: the model was trained in NCHW format, so you need to transpose some of the weights as well. Ultralytics YOLOv5 fixes this in their tf.py file, but we have not seen a good generic script that can do this. If you have ideas on making this easier, please let us know on the forum.
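
For the input data itself the conversion is a simple transpose; it's the trained weights that make a generic fix hard. As a small illustration (not a general solution):

import numpy as np

# Data as delivered by Edge Impulse: (n, Height, Width, Channels) - NHWC
X_nhwc = np.load('X_split_train.npy')

# Transpose to (n, Channels, Height, Width) - NCHW - for training in PyTorch
X_nchw = np.transpose(X_nhwc, (0, 3, 1, 2))

# The exported TFLite file must still take NHWC input, so the export step
# also needs to transpose the relevant weights (see Ultralytics' tf.py for
# how YOLOv5 handles this) - transposing the data alone is not enough.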

Output format

The training pipeline can output any of these three files:

  • model.tflite - a TFLite file with float32 inputs and outputs.

  • model_quantized_int8_io.tflite - a quantized TFLite file with int8 inputs and outputs.

  • saved_model.zip - a TensorFlow saved model (optional).

At least one of the TFLite files is required.
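
If you train in Keras, a minimal sketch of writing these files could look like the following (assuming you have a trained Keras model in model, the training data in X_train for the quantization step, and the --out-directory value parsed into args.out_directory as above):

import os, shutil
import numpy as np
import tensorflow as tf

# Float32 TFLite file
converter = tf.lite.TFLiteConverter.from_keras_model(model)
with open(os.path.join(args.out_directory, 'model.tflite'), 'wb') as f:
    f.write(converter.convert())

# Quantized TFLite file with int8 inputs and outputs,
# calibrated on (part of) the training set
def representative_dataset():
    for i in range(min(len(X_train), 100)):
        yield [np.expand_dims(X_train[i], axis=0).astype(np.float32)]

converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_dataset
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.int8
converter.inference_output_type = tf.int8
with open(os.path.join(args.out_directory, 'model_quantized_int8_io.tflite'), 'wb') as f:
    f.write(converter.convert())

# Optional: a zipped TensorFlow saved model
model.save(os.path.join(args.out_directory, 'saved_model'))
shutil.make_archive(os.path.join(args.out_directory, 'saved_model'),
                    'zip', root_dir=args.out_directory, base_dir='saved_model')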

If you have a training pipeline that cannot output TFLite files by default (e.g. scikit-learn), you can use jax to implement the inference function and compile that to TFLite. See our example repository. If there are any TFLite ops in your final model that are not supported by the EON Compiler (and you therefore cannot run on device), please let us know on the forums.

Hosting your custom block

Host your block directly within Edge Impulse with the Edge Impulse CLI:

$ edge-impulse-blocks init
$ edge-impulse-blocks push

To edit the block, go to:

  • Enterprise: go to your organization, Custom blocks > Machine learning.

  • Developers: click on your photo in the top right corner, then select Custom blocks > Machine learning.

The block is now available from inside any of your Edge Impulse projects. Depending on the data your block operates on, you can add it via:

  • Object Detection: Create impulse > Add learning block > Object Detection (Images), then select the block via 'Choose a different model' on the 'Object detection' page.

  • Image classification: Create impulse > Add learning block > Transfer learning (Images), then select the block via 'Choose a different model' on the 'Transfer learning' page.

  • Audio classification: Create impulse > Add learning block > Transfer Learning (Keyword Spotting), then select the block via 'Choose a different model' on the 'Transfer learning' page.

  • Other (classification): Create impulse > Add learning block > Custom classification, then select the block via 'Choose a different model' on the 'Machine learning' page.

  • Other (regression): Create impulse > Add learning block > Custom regression, then select the block via 'Choose a different model' on the 'Regression' page.

Object detection output layers

Unfortunately, object detection models typically don't have a standard way to go from the neural network output layer to bounding boxes. We currently support the following types of output layers:

  • MobileNet SSD

  • Edge Impulse FOMO

  • YOLOv5

If you have an object detection model with a different output layer, please contact your user success engineer (enterprise customers) or let us know on the forums (free users) with an example of how to interpret the output, and we can add it.

Getting latency/memory information

When training locally you can use the profiling API to get latency, RAM and ROM estimates. This is very useful as you can immediately see whether your model will fit on device. Additionally, you can use this API as part of your experiment tracking (e.g. in Weights & Biases or MLflow) to weed out models that won't fit your latency or memory constraints.

The profiling API expects:

  • A TFLite file.

  • A reference device (for latency calculation) - you can get a list of all devices via getProjectInfo in the latencyDevices object.

  • A reference model (which model is closest to your architecture) - you can choose between gestures-large-f32, gestures-large-i8, image-32-32-mobilenet-f32, image-32-32-mobilenet-i8, image-96-96-mobilenet-f32, image-96-96-mobilenet-i8, image-320-320-mobilenet-ssd-f32, keywords-2d-f32, keywords-2d-i8. Make sure to use i8 models if you have quantized your model.

Here's how you invoke the API from Python:

import requests, json, time, base64

PROJECT_ID = 1 # YOUR PROJECT ID
API_KEY = "ei_..." # YOUR API KEY
DEVICE = 'infineon-cy8ckit-062s2' # reference device
REFERENCE_MODEL = 'keywords-2d-i8' # reference model

def profile_tflite_model(tflite_file_path):
    url = f"https://studio.edgeimpulse.com/v1/api/{PROJECT_ID}/jobs/profile-tflite"

    base64_encoded_file = ''
    with open(tflite_file_path, "rb") as f:
        base64_encoded_file = base64.b64encode(f.read()).decode('utf-8')

    payload = {
        'tfliteFileBase64': base64_encoded_file,
        'device': DEVICE,
        'referenceModel': REFERENCE_MODEL,
    }
    headers = {
        "x-api-key": API_KEY,
        "Accept": "application/json",
        "Content-Type": "application/json",
    }
    response = requests.request("POST", url, json=payload, headers=headers)
    body = json.loads(response.text)
    if (not body['success']):
        raise Exception(body['error'])
    return body['id']

def get_stdout(job_id, skip_line_no):
    url = f"https://studio.edgeimpulse.com/v1/api/{PROJECT_ID}/jobs/{job_id}/stdout"
    headers = {
        "x-api-key": API_KEY,
        "Accept": "application/json",
        "Content-Type": "application/json",
    }
    response = requests.request("GET", url, headers=headers)
    body = json.loads(response.text)
    if (not body['success']):
        raise Exception(body['error'])
    stdout = body['stdout'][::-1] # reverse array so it's old -> new
    return [ x['data'] for x in stdout[skip_line_no:] ]

def wait_for_job_completion(job_id):
    skip_line_no = 0

    url = f"https://studio.edgeimpulse.com/v1/api/{PROJECT_ID}/jobs/{job_id}/status"
    headers = {
        "x-api-key": API_KEY,
        "Accept": "application/json",
        "Content-Type": "application/json",
    }
    while True:
        response = requests.request("GET", url, headers=headers)
        body = json.loads(response.text)
        if (not body['success']):
            raise Exception(body['error'])

        stdout = get_stdout(job_id, skip_line_no)
        for l in stdout:
            print(l, end='')
        skip_line_no = skip_line_no + len(stdout)

        if (not 'finished' in body['job']):
            # print('Job', job_id, 'is not finished yet...', body['job'])
            time.sleep(1)
            continue
        if (not body['job']['finishedSuccessful']):
            raise Exception('Job failed')
        else:
            break

def get_perf_results(job_id):
    url = f"https://studio.edgeimpulse.com/v1/api/{PROJECT_ID}/jobs/profile-tflite/{job_id}/result"
    headers = {
        "x-api-key": API_KEY,
        "Accept": "application/json",
        "Content-Type": "application/json",
    }
    response = requests.request("POST", url, headers=headers)
    body = json.loads(response.text)
    if (not body['success']):
        raise Exception(body['error'])
    return body

if __name__ == "__main__":
    job_id = profile_tflite_model('model-tensorflow-lite-int8-quantized-model.lite')
    print('Job ID is', job_id)
    wait_for_job_completion(job_id)
    print('Job', job_id, 'is finished')
    perf_data = get_perf_results(job_id)
    print('Memory usage', perf_data['memory'])
    print('Time per inference (' + DEVICE + ')', perf_data['timePerInferenceMs'])
