Audio MFCC

The Audio MFCC block extracts cepstral coefficients from an audio signal. Like the Audio MFE block, it uses a non-linear frequency scale called the Mel scale. It is the reference block for speech recognition and can also perform well on some non-human voice use cases.

Cepstral coefficients of an example sentence "Hello World" (1-sec window)

Audio MFCC parameters

Mel Frequency Cepstral Coefficients

  • Number of coefficients: Number of cepstral coefficients to keep after applying the Discrete Cosine Transform
  • Frame length: The length of each frame in seconds
  • Frame stride: The step between successive frames in seconds
  • Filter number: The number of triangular filters applied to the spectrogram
  • FFT length: The FFT size
  • Low frequency: Lowest band edge of Mel-scale filterbanks
  • High frequency: Highest band edge of Mel-scale filterbanks
  • Window size: The size of the sliding window for local cepstral mean normalization. The window size must be odd.
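The odd window size requirement follows from the window being centered on a frame: an odd size leaves the same number of neighboring frames on each side. The following is a minimal sketch of local cepstral mean normalization under that reading, not the block's actual implementation. The function name `local_cepstral_mean_norm` is hypothetical, and repeating the edge frames as padding is an assumption.

```python
import numpy as np


def local_cepstral_mean_norm(coeffs, win_size):
    """Subtract a sliding-window mean from every frame.

    coeffs: array of shape (num_frames, num_coefficients).
    win_size: number of frames in the sliding window (must be odd).
    """
    if win_size % 2 == 0:
        raise ValueError("win_size must be odd so the window is centered on a frame")

    coeffs = np.asarray(coeffs, dtype=float)
    half = (win_size - 1) // 2

    # Assumption: repeat the first and last frames so that edge frames
    # also get a full window. Other padding modes are equally plausible.
    padded = np.pad(coeffs, ((half, half), (0, 0)), mode="edge")

    out = np.empty_like(coeffs)
    for t in range(coeffs.shape[0]):
        # padded[t : t + win_size] is centered on original frame t.
        window = padded[t : t + win_size]
        out[t] = coeffs[t] - window.mean(axis=0)
    return out


# Illustrative input: 50 frames of 13 random coefficients.
example = np.random.default_rng(0).normal(size=(50, 13))
normalized = local_cepstral_mean_norm(example, win_size=11)
```

A larger window removes slowly varying offsets (for example, a constant channel bias), while a smaller window also removes faster variations.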

Pre-emphasis

  • Coefficient: The pre-emphasis coefficient to apply to the input signal (0 means no filtering)
  • Shift: The number of samples by which the input signal is rolled (shifted) when applying pre-emphasis
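As a rough illustration of how these two parameters interact, pre-emphasis can be written as `y[n] = x[n] - coefficient * x[n - shift]`. The sketch below assumes that "roll" means a circular shift as performed by `numpy.roll`, so the first `shift` samples wrap around from the end of the signal. The function name `pre_emphasis` and the sample values are illustrative only.

```python
import numpy as np


def pre_emphasis(signal, coefficient, shift):
    """Apply y[n] = x[n] - coefficient * x[n - shift] with a circular shift."""
    signal = np.asarray(signal, dtype=float)
    # Assumption: the shift wraps around, as numpy.roll does.
    rolled = np.roll(signal, shift)
    return signal - coefficient * rolled


samples = np.array([1.0, 2.0, 3.0, 4.0])

# A coefficient of 0 leaves the signal unchanged (no filtering).
unfiltered = pre_emphasis(samples, coefficient=0.0, shift=1)

# A coefficient close to 1 attenuates slow changes and emphasizes fast ones.
filtered = pre_emphasis(samples, coefficient=0.5, shift=1)
```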

How does the MFCC block work?

Feature extraction adds one extra step to the MFE block, resulting in a compressed representation of the filterbank energies. A Discrete Cosine Transform is applied to the filterbank energies of each frame to extract cepstral coefficients. Usually 13 coefficients are retained; the rest are discarded because they represent fast changes that are not useful for speech recognition.
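The extra step can be sketched as follows, assuming the input is a matrix of log filterbank energies with one row per frame (the usual MFCC formulation). This is a minimal illustration, not the block's own code: the helper names `dct_ii_ortho` and `cepstral_coefficients` are hypothetical, and the orthonormal DCT-II scaling is an assumption.

```python
import numpy as np


def dct_ii_ortho(x):
    """Orthonormal DCT-II along the last axis, written out with NumPy."""
    x = np.asarray(x, dtype=float)
    num_filters = x.shape[-1]
    n = np.arange(num_filters)
    k = n[:, None]
    # basis[k, n] = cos(pi / N * (n + 0.5) * k)
    basis = np.cos(np.pi / num_filters * (n + 0.5) * k)
    scale = np.full(num_filters, np.sqrt(2.0 / num_filters))
    scale[0] = np.sqrt(1.0 / num_filters)
    return (x @ basis.T) * scale


def cepstral_coefficients(log_filterbank_energies, num_coefficients=13):
    """Keep only the first num_coefficients DCT outputs of each frame."""
    return dct_ii_ortho(log_filterbank_energies)[..., :num_coefficients]


# Illustrative input: 49 frames, 32 filters of made-up log energies.
log_energies = np.random.default_rng(0).normal(size=(49, 32))
mfcc = cepstral_coefficients(log_energies, num_coefficients=13)
print(mfcc.shape)  # (49, 13)
```

Low-order coefficients describe the smooth overall shape of the spectrum across filters, which is why truncating to the first few compresses the representation while keeping the information most relevant to speech.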

