Anomaly detection (K-means)

Neural networks are great, but they have one big flaw: they're terrible at dealing with data they have never seen before (like a new gesture). A neural network cannot judge whether an input is unfamiliar, as it is only aware of its training data. If you give it something unlike anything it has seen before, it will still classify the input as one of the four classes it was trained on.

Tutorial

Want to see anomaly detection in action? Check out our Continuous Motion Recognition tutorial.

K-means clustering

This method looks at the data points in a dataset and groups those that are similar into a predefined number K of clusters. A threshold value can be added to detect anomalies: if the distance between a data point and its nearest centroid is greater than the threshold value, then it is an anomaly.
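As a rough illustration of the thresholding idea described above, here is a minimal sketch in plain Python. It is not the Edge Impulse implementation: the helper names (`fit_kmeans`, `nearest_centroid`, `is_anomaly`), the seeding of centroids from the first K points, the sample data, and the threshold of 1.0 are all assumptions made for the example.

```python
import math

def nearest_centroid(point, centroids):
    """Return (index, distance) of the centroid closest to point."""
    distances = [math.dist(point, c) for c in centroids]
    idx = min(range(len(centroids)), key=distances.__getitem__)
    return idx, distances[idx]

def fit_kmeans(points, k, iterations=20):
    """Plain Lloyd's algorithm. Assumption: the first k points seed the centroids."""
    centroids = [tuple(p) for p in points[:k]]
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for p in points:
            idx, _ = nearest_centroid(p, centroids)
            groups[idx].append(p)
        # Move each centroid to the mean of its members; keep it if the group is empty.
        centroids = [
            tuple(sum(axis) / len(g) for axis in zip(*g)) if g else c
            for g, c in zip(groups, centroids)
        ]
    return centroids

def is_anomaly(point, centroids, threshold):
    """A point is anomalous if it is farther than threshold from every centroid."""
    _, distance = nearest_centroid(point, centroids)
    return distance > threshold

# Hypothetical two-dimensional training data forming two tight groups.
training = [(0.0, 0.1), (5.0, 5.1), (0.2, 0.0), (5.2, 4.9), (0.1, 0.2), (4.9, 5.0)]
centroids = fit_kmeans(training, k=2)

print(is_anomaly((0.1, 0.1), centroids, threshold=1.0))    # close to a centroid
print(is_anomaly((10.0, -3.0), centroids, threshold=1.0))  # far from both centroids
```

The first query sits next to a learned centroid and is not flagged; the second is far from both and is flagged. In practice the threshold has to be tuned against real data.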

The main difficulty resides in choosing K, since data in a time series is always changing and different values of K might be ideal at different times. Besides, in more complex scenarios where there are both local and global outliers, many outliers might pass under the radar and be assigned to a cluster.

Feature importance (optional)

In most of your DSP blocks, you have an option to calculate the feature importance. Edge Impulse Studio will then output a Feature Importance graphic that will help you determine which axes and values generated from your DSP block are most significant to analyze when you want to do anomaly detection.

Generating features and then keeping only the most important ones further reduces the amount of signal analysis the device has to perform on new, unseen data.
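To make the idea concrete, the sketch below shows how a feature importance ranking could be used to keep only the top few axes of each sample. The feature names, the importance scores, and the helpers `select_axes` and `project` are hypothetical. They do not reflect how Edge Impulse Studio computes or stores feature importance.

```python
def select_axes(importance, count=3):
    """Return the names of the `count` highest-scoring features."""
    ranked = sorted(importance, key=importance.get, reverse=True)
    return ranked[:count]

def project(sample, axes):
    """Keep only the selected features of one sample, in a fixed order."""
    return [sample[name] for name in axes]

# Hypothetical importance scores for four DSP output features.
importance = {"accX RMS": 0.41, "accY RMS": 0.33, "accZ RMS": 0.18, "accX peak": 0.08}
axes = select_axes(importance, count=3)

# Hypothetical feature values for one new sample.
sample = {"accX RMS": 1.2, "accY RMS": 0.4, "accZ RMS": 2.0, "accX peak": 0.9}
print(axes)                   # ['accX RMS', 'accY RMS', 'accZ RMS']
print(project(sample, axes))  # [1.2, 0.4, 2.0]
```

Only the projected values would then need to be computed and compared on the device.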

Setting up the anomaly detection block

In your anomaly detection block, you can click on the Select suggested axes button to harness the value of the feature importance output.

Here is the process in the background:

  • Create X number of clusters and group all the data.

  • For each of these clusters we store the center and the size of the cluster.

  • During inference, we find the closest cluster for a new data point and report the point's distance from the edge of that cluster. If the point is within a cluster (no anomaly), you get a value below 0.
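The steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code that runs in the anomaly detection block: the cluster "size" is assumed to be the largest distance from a member to the center, the "closest" cluster is assumed to be the one with the nearest center, and the function names and sample data are made up for the example.

```python
import math

def describe_clusters(groups):
    """For each group of training points, store its center and its size.
    Assumption: size is the largest member-to-center distance."""
    clusters = []
    for members in groups:
        center = tuple(sum(axis) / len(members) for axis in zip(*members))
        radius = max(math.dist(m, center) for m in members)
        clusters.append((center, radius))
    return clusters

def anomaly_score(point, clusters):
    """Distance from the edge of the closest cluster; below 0 means inside it.
    Assumption: the closest cluster is the one with the nearest center."""
    center, radius = min(clusters, key=lambda c: math.dist(point, c[0]))
    return math.dist(point, center) - radius

# Hypothetical training data that has already been grouped into two clusters.
groups = [
    [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0), (2.0, 2.0)],
    [(10.0, 10.0), (12.0, 10.0), (10.0, 12.0), (12.0, 12.0)],
]
clusters = describe_clusters(groups)

print(anomaly_score((1.0, 1.5), clusters))  # negative: inside the first cluster
print(anomaly_score((6.0, 1.0), clusters))  # positive: outside every cluster
```

Storing only a center and a size per cluster keeps the on-device check cheap: one distance computation per cluster and one subtraction.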

In the above picture, known clusters are in blue and newly classified data is in orange. The new data is clearly outside of any known cluster and can thus be tagged as an anomaly.

Last updated

Revision created on 11/14/2022