Audio MFCC

The Audio MFCC block extracts Mel Frequency Cepstral Coefficients from an audio signal. Like the Audio MFE block, it uses a non-linear frequency scale called the Mel scale. It is the reference block for speech recognition and can also perform well on some use cases that do not involve human voice.

GitHub repository containing all DSP block code: edgeimpulse/processing-blocks.

Audio MFCC parameters

Mel Frequency Cepstral Coefficients

  • Number of coefficients: Number of cepstral coefficients to keep after applying the Discrete Cosine Transform (DCT)

  • Frame length: The length of each frame in seconds

  • Frame stride: The step between successive frames in seconds

  • Filter number: The number of triangular filters applied to the spectrogram

  • FFT length: The number of points used to compute the FFT of each frame

  • Low frequency: Lowest band edge of Mel-scale filterbanks

  • High frequency: Highest band edge of Mel-scale filterbanks

  • Window size: The size of the sliding window used for local cepstral mean normalization. The window size must be odd.
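
To make the frame and window parameters concrete, here is a minimal NumPy sketch of how frame length and frame stride translate into a number of frames, and of why an odd window size is convenient for local cepstral mean normalization (the window can be centered on the current frame). The function names `frame_signal` and `local_cmn`, the choice to drop trailing samples, and the edge padding are assumptions made for illustration; they are not taken from the Edge Impulse implementation.

```python
import numpy as np

def frame_signal(signal, sample_rate, frame_length, frame_stride):
    """Split a 1-D signal into overlapping frames.

    frame_length and frame_stride are in seconds. Assumption: no padding,
    so trailing samples that do not fill a whole frame are dropped.
    """
    signal = np.asarray(signal, dtype=float)
    frame_len = int(round(frame_length * sample_rate))
    step = int(round(frame_stride * sample_rate))
    if len(signal) < frame_len:
        return np.empty((0, frame_len))
    num_frames = 1 + (len(signal) - frame_len) // step
    # Row t holds samples [t * step, t * step + frame_len)
    idx = np.arange(frame_len)[None, :] + step * np.arange(num_frames)[:, None]
    return signal[idx]

def local_cmn(coeffs, window_size):
    """Subtract, per coefficient, the mean over a window centered on each frame.

    coeffs has shape (num_frames, num_coefficients). Assumption: the first
    and last frames are repeated to pad the edges.
    """
    if window_size % 2 == 0:
        raise ValueError("window_size must be odd so it can be centered on a frame")
    coeffs = np.asarray(coeffs, dtype=float)
    half = window_size // 2
    padded = np.pad(coeffs, ((half, half), (0, 0)), mode="edge")
    out = np.empty_like(coeffs)
    for t in range(coeffs.shape[0]):
        out[t] = coeffs[t] - padded[t:t + window_size].mean(axis=0)
    return out
```

For example, one second of audio sampled at 16 kHz with a frame length of 0.02 s and a frame stride of 0.01 s gives 320-sample frames taken every 160 samples, which is 99 frames under the no-padding assumption above.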

Pre-emphasis

  • Coefficient: The pre-emphasis coefficient to apply to the input signal (0 means no filtering)

  • Note: The Shift parameter has been removed and is fixed at 1 for all new projects. Existing projects can still change this value or keep their current one.
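
As an illustration, a pre-emphasis filter with a shift of 1 is commonly written as y[n] = x[n] - coefficient * x[n-1]. The sketch below follows that textbook form. The function name `pre_emphasis` and the decision to leave the first sample unchanged are assumptions for this example, and the actual block may treat the signal boundary differently.

```python
import numpy as np

def pre_emphasis(signal, coefficient):
    """Apply y[n] = x[n] - coefficient * x[n-1] (shift of 1).

    Assumption: the first sample has no predecessor and is passed through
    unchanged. A coefficient of 0 returns the signal unfiltered.
    """
    signal = np.asarray(signal, dtype=float)
    out = signal.copy()
    out[1:] = signal[1:] - coefficient * signal[:-1]
    return out
```

With a coefficient close to 1, slowly varying (low-frequency) content is attenuated while rapid sample-to-sample changes are preserved, which boosts the high-frequency part of the spectrum relative to the low-frequency part.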

How does the MFCC block work?

Feature extraction adds one extra step to the MFE block, resulting in a compressed representation of the filterbanks. A Discrete Cosine Transform is applied to the filterbank energies of each frame to extract cepstral coefficients. Usually the first 13 coefficients are retained; the rest are discarded because they represent fast changes that are not useful for speech recognition.
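
The extra DCT step can be sketched as follows. This is a minimal NumPy illustration rather than the block's actual code: the helper names `dct_ii` and `mfcc_from_filterbank`, the unnormalized DCT-II, the logarithm applied to the energies beforehand (the conventional MFCC recipe), and the small floor used to avoid log(0) are all assumptions.

```python
import numpy as np

def dct_ii(x):
    """Unnormalized DCT-II along the last axis.

    X[k] = sum_m x[m] * cos(pi / N * (m + 0.5) * k)
    """
    x = np.asarray(x, dtype=float)
    n = x.shape[-1]
    k = np.arange(n)[:, None]          # output coefficient index
    m = np.arange(n)[None, :]          # input (filter) index
    basis = np.cos(np.pi / n * (m + 0.5) * k)
    return x @ basis.T

def mfcc_from_filterbank(energies, num_coefficients=13):
    """Turn per-frame filterbank energies into cepstral coefficients.

    energies has shape (num_frames, num_filters). Assumptions: energies are
    floored at 1e-10 before the log, and only the first num_coefficients
    DCT outputs are kept.
    """
    log_energies = np.log(np.maximum(np.asarray(energies, dtype=float), 1e-10))
    return dct_ii(log_energies)[..., :num_coefficients]
```

Keeping only the first coefficients is what produces the compression: with 32 filters and 13 coefficients, each frame shrinks from 32 values to 13, and the discarded higher-order coefficients are the ones that describe rapid variation across neighbouring filters.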

Last updated

Revision created on 11/14/2022