
# Anomaly detection (GMM)

Neural networks are powerful but have a major drawback: they handle unseen data, such as new gestures, poorly because they rely on the training data they have already seen. Even entirely novel inputs often get misclassified into existing categories. Gaussian Mixture Models (GMMs) are clustering techniques that we can use for anomaly detection. GMMs can perform well on datasets where other anomaly detection algorithms (like K-means) perform poorly.

**Only available for enterprise customers**

Organizational features are only available for enterprise customers. View our pricing for more information.

**Gaussian Mixture Model (GMM)**

A Gaussian Mixture Model represents a probability distribution as a mixture of multiple Gaussian (normal) distributions. Each Gaussian component in the mixture represents a cluster of data points with similar characteristics. Thus, GMMs work using the assumption that the samples within a dataset can be modeled using different Gaussian distributions.

Anomaly detection using GMM involves identifying data points with low probabilities. If a data point has a significantly lower probability of being generated by the mixture model than most other data points, it is considered an anomaly and will output a high anomaly score.
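As a rough illustration of this idea, the sketch below fits a scikit-learn `GaussianMixture` on synthetic "normal" data and scores two new points. The data, the `anomaly_score` helper, and the choice to negate the log-likelihood (so that unlikely points get high scores) are assumptions made for this example, not a description of the exact computation in Studio.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)

# Synthetic "normal" training data: two clusters in a 2-D feature space.
train = np.vstack([
    rng.normal(loc=[0.0, 0.0], scale=0.5, size=(200, 2)),
    rng.normal(loc=[5.0, 5.0], scale=0.5, size=(200, 2)),
])

gmm = GaussianMixture(n_components=2, random_state=0).fit(train)


def anomaly_score(points):
    # Assumption for this sketch: use the negative log-likelihood so that
    # a low probability under the mixture gives a high score.
    return -gmm.score_samples(points)


typical = np.array([[0.1, -0.2]])  # close to the first cluster
outlier = np.array([[2.5, 2.5]])   # between the clusters, far from both

train_scores = anomaly_score(train)
print("typical point score:", anomaly_score(typical)[0])
print("outlier point score:", anomaly_score(outlier)[0])
print("99th percentile of training scores:", np.percentile(train_scores, 99))
```

Comparing a new score against a high percentile of the training scores is one simple way to express "significantly lower probability than most other data points".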

GMM has some overlap with K-means. However, K-means clusters are always circular, spherical, or hyperspherical, whereas GMM can also model elliptical clusters.
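The difference can be seen on a single elongated cluster. In the sketch below (synthetic data, illustrative variable names), two points sit at the same Euclidean distance from the cluster center, which is all a centroid-distance method would look at, yet the GMM with a full covariance matrix rates them very differently.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(1)

# One elongated cluster: wide along x (std 5), narrow along y (std 0.5).
data = rng.normal(size=(1000, 2)) * np.array([5.0, 0.5])

gmm = GaussianMixture(n_components=1, covariance_type="full", random_state=0).fit(data)

center = gmm.means_[0]
along_x = center + np.array([3.0, 0.0])  # 3 units away along the wide axis
along_y = center + np.array([0.0, 3.0])  # 3 units away along the narrow axis

# Both points are equally far from the center in Euclidean terms.
dist_x = np.linalg.norm(along_x - center)
dist_y = np.linalg.norm(along_y - center)

# The GMM uses the learned covariance, so the point that leaves the
# cluster along its narrow axis is far less likely.
ll_x, ll_y = gmm.score_samples(np.vstack([along_x, along_y]))

print("Euclidean distances:", dist_x, dist_y)
print("log-likelihoods:    ", ll_x, ll_y)
```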

In most of our DSP blocks, you have the option to calculate the **feature importance**. Edge Impulse Studio will then output a Feature Importance list that will help you determine which axes generated from your DSP block are most significant to analyze when you want to do anomaly detection.

The GMM anomaly detection learning block has two adjustable parameters: the **Number of components** and **The axes**.

The number of (Gaussian) components can be interpreted as the number of clusters in Gaussian Mixture Models.

**How to choose the number of components?**

When increasing the number of (Gaussian) components, the model will fit the original distribution more closely. If the value is too high, there is a risk of overfitting.

If you have prior knowledge about the problem or the data, it can provide valuable insights into the appropriate number of components. For example, if you know that there are three distinct groups in your data, you may start by trying a GMM with three components. Visualizing the data can also provide hints about the number of clusters. If you can distinguish several visible clusters in your training dataset, try setting the number of components to the number of visible clusters.
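One way to explore this trade-off outside of Studio is to fit models with different numbers of components and compare the average log-likelihood on the training data with that on held-out data. This is only a sketch under assumed conditions: the three synthetic clusters and the `sample_clusters` helper are made up for the example, and the held-out comparison is a generic technique rather than something the learning block does for you.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(2)


def sample_clusters(n_per_cluster):
    # Three well-separated synthetic clusters (illustrative data only).
    centers = [(0.0, 0.0), (6.0, 0.0), (3.0, 6.0)]
    return np.vstack([
        rng.normal(loc=c, scale=0.7, size=(n_per_cluster, 2)) for c in centers
    ])


train = sample_clusters(150)
held_out = sample_clusters(50)

results = {}
for n in range(1, 7):
    gmm = GaussianMixture(n_components=n, random_state=0).fit(train)
    # score() returns the average log-likelihood per sample.
    results[n] = (gmm.score(train), gmm.score(held_out))
    print(f"components={n}  train={results[n][0]:.3f}  held-out={results[n][1]:.3f}")
```

A training score that keeps improving while the held-out score stalls or drops is a hint that the extra components are fitting noise.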

The different axes correspond to the features generated by the pre-processing block. The features on the chosen axes are used as the input data for training.
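Conceptually, choosing axes amounts to selecting columns of the feature matrix before fitting. The feature names and random data below are hypothetical and only illustrate the selection step.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(3)

# Hypothetical feature names, standing in for the output of a processing block.
feature_names = ["accX RMS", "accY RMS", "accZ RMS", "accX Peak 1 Freq"]
features = rng.normal(size=(300, len(feature_names)))

# Keep only the axes chosen for anomaly detection.
selected_axes = ["accX RMS", "accZ RMS"]
columns = [feature_names.index(name) for name in selected_axes]
training_input = features[:, columns]

gmm = GaussianMixture(n_components=2, random_state=0).fit(training_input)
print("trained on", training_input.shape[1], "of", features.shape[1], "features")
```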

Click on **Start training** to trigger the learning process. Once trained, you will obtain a view that looks like the following:

GMM learning block trained

*Note: By definition, there should not be any anomalies in the training dataset, and thus accuracy is not calculated during training. Run Model testing to learn more about the model performance. Additionally, you can also select a test data sample in the **Anomaly Explorer** directly on this page.*

Testing single sample

Model testing view

**Limitation**

Make sure to label your samples exactly as `anomaly` or `no anomaly` in your test dataset so they can be used in the F1 score calculation. We are working on making this more flexible.

In the example above, you will see that some samples are classified as `no anomaly` while the expected output is `anomaly`. If you take a closer look at the anomaly scores for the `no anomaly` samples, the values are below `1.00`:

Model testing view

To fix this, you can set the **Confidence thresholds**:

Setting confidence threshold

In this project, we have set the confidence threshold to `1.00`. This gives results closer to our expectations:

Model testing view

Keep in mind that every project is different. Please make sure to also validate your results in real conditions.
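The effect of the threshold on the F1 score can be sketched as follows. The scores are invented for illustration, and the `classify` helper and the rule "score strictly above the threshold means `anomaly`" are assumptions of this example. Only the two label strings come from the requirement above.

```python
from sklearn.metrics import f1_score

ANOMALY, NO_ANOMALY = "anomaly", "no anomaly"


def classify(scores, threshold):
    # Assumption for this sketch: a score strictly above the threshold
    # is reported as an anomaly.
    return [ANOMALY if s > threshold else NO_ANOMALY for s in scores]


# Hypothetical anomaly scores and the expected label of each test sample.
scores = [0.20, 0.60, 0.85, 1.40, 2.20, 3.10]
expected = [NO_ANOMALY, NO_ANOMALY, NO_ANOMALY, ANOMALY, ANOMALY, ANOMALY]

f1_by_threshold = {}
for threshold in (3.00, 1.00):
    predicted = classify(scores, threshold)
    f1_by_threshold[threshold] = f1_score(expected, predicted, pos_label=ANOMALY)
    print(f"threshold={threshold:.2f}  F1={f1_by_threshold[threshold]:.2f}")
```

With the higher threshold, two of the three anomalies are missed. Lowering it to `1.00` separates the two groups in this made-up data.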

1. During training, X Gaussian probability distributions are learned from the data, where X is the number of components (or clusters) defined on the learning block page. Each sample is assigned to one of the distributions based on the probability that it belongs to each. We use Sklearn under the hood, and the anomaly score corresponds to the `log-likelihood`.
2. For inference, we calculate the probability (which can be interpreted as a distance on a graph) that a new data point belongs to one of the populations in the training data. If the data point belongs to a cluster, the anomaly score will be low.
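To make the two steps concrete, the sketch below fits a scikit-learn `GaussianMixture` and recomputes the log-likelihood of new points by hand from the learned weights, means, and covariances. The data and the `mixture_log_likelihood` helper are illustrative assumptions. How Studio maps this quantity to the displayed anomaly score is not shown here.

```python
import numpy as np
from scipy.special import logsumexp
from scipy.stats import multivariate_normal
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(4)

# Step 1: learn the Gaussian components from (synthetic) training data.
train = np.vstack([
    rng.normal(loc=[-3.0, 0.0], scale=0.6, size=(200, 2)),
    rng.normal(loc=[3.0, 1.0], scale=0.6, size=(200, 2)),
])
gmm = GaussianMixture(n_components=2, covariance_type="full", random_state=0).fit(train)


def mixture_log_likelihood(points):
    # log p(x) = logsumexp over k of ( log w_k + log N(x | mu_k, Sigma_k) )
    per_component = np.column_stack([
        np.log(w) + multivariate_normal(mean=mu, cov=cov).logpdf(points)
        for w, mu, cov in zip(gmm.weights_, gmm.means_, gmm.covariances_)
    ])
    return logsumexp(per_component, axis=1)


# Step 2: score new points. The first lies inside a cluster, the second does not.
new_points = np.array([[-3.1, 0.2], [0.0, 5.0]])
manual = mixture_log_likelihood(new_points)

# Probability of each point belonging to each component (rows sum to 1).
responsibilities = gmm.predict_proba(new_points)

print("manual log-likelihood: ", manual)
print("sklearn score_samples: ", gmm.score_samples(new_points))
print("component probabilities:\n", responsibilities)
```

The hand-computed values should match `score_samples`, and the point inside a cluster gets the higher log-likelihood.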

