The Spectrogram processing block extracts time and frequency features from a signal. It performs well on audio data for non-voice recognition use cases, or on any sensor data with continuous frequencies.
GitHub repository containing all DSP block code: edgeimpulse/processing-blocks.
Spectrogram
Frame length: The length of each frame in seconds
Frame stride: The step between successive frames in seconds
Frequency bands: The FFT size
Normalization
Noise floor (dB): Signal lower than this level will be dropped
The block first divides the window into multiple overlapping frames. The size and number of frames can be adjusted with the Frame length and Frame stride parameters. For example, with a window of 1 second, a frame length of 0.02 s, and a stride of 0.01 s, it creates 99 time frames: (1 - 0.02) / 0.01 + 1 = 99.
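The frame count can be sketched as follows. This is a minimal illustration, not the block's actual code: the function name `num_frames` is hypothetical, the 8 kHz sampling rate is an assumed example value, and the sketch assumes only full frames are counted (which matches the 99-frame example). Working in whole samples avoids floating-point rounding issues.

```python
def num_frames(window_s, frame_length_s, frame_stride_s, sample_rate_hz):
    """Count the full frames that fit in a window (assumes no partial last frame)."""
    # Convert durations to whole sample counts
    window = round(window_s * sample_rate_hz)
    length = round(frame_length_s * sample_rate_hz)
    stride = round(frame_stride_s * sample_rate_hz)
    if length > window:
        return 0
    # One frame at offset 0, plus one for every full stride that still fits
    return (window - length) // stride + 1


print(num_frames(1.0, 0.02, 0.01, 8000))  # 99
```

Increasing the stride reduces the overlap and the number of frames; a stride equal to the frame length gives non-overlapping frames.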
Each time frame is then divided into frequency bins using an FFT (Fast Fourier Transform), and its power spectrum is computed. The number of frequency bins equals the Frequency bands value divided by 2, plus 1. We recommend keeping the Frequency bands (a.k.a. FFT size) value a power of 2 for performance reasons. Finally, the Noise floor value is applied to the power spectrum.
The number of features generated by the Spectrogram block equals the number of time frames multiplied by the number of frequency bins.
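As a quick check of the output size, the arithmetic can be written out as below. The helper names are hypothetical and only illustrate the formula; the 99 frames come from the example above.

```python
def num_frequency_bins(fft_size):
    """Bins kept from a real-input FFT: FFT size divided by 2, plus 1."""
    return fft_size // 2 + 1


def num_features(n_frames, fft_size):
    """Total features: time frames multiplied by frequency bins."""
    return n_frames * num_frequency_bins(fft_size)


print(num_features(99, 128))  # 99 * 65 = 6435
print(num_features(99, 256))  # 99 * 129 = 12771
```

Doubling the FFT size roughly doubles the feature count, so it directly affects the input size of the model that consumes these features.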
Frequency bands and frame length
The Frequency bands (FFT size) parameter and the frame length are related: each frame is cropped or zero-padded to the Frequency bands value before the FFT is applied. For example, with an 8 kHz sampling frequency and a frame length of 0.02 s, each time frame contains 160 samples (8,000 × 0.02). If your FFT size is set to 128, time frames will be cropped to 128 samples. If your FFT size is set to 256, time frames will be padded with zeros to 256 samples.
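The cropping and padding behavior can be reproduced with NumPy, whose `np.fft.rfft` crops the input when `n` is smaller than its length and zero-pads it when `n` is larger. This is a sketch for illustration only: the function name `frame_power_spectrum` and the 1 kHz test tone are assumptions, and the block's own implementation may scale the power spectrum differently.

```python
import numpy as np


def frame_power_spectrum(frame, fft_size):
    """Power spectrum of one frame, cropped or zero-padded to fft_size samples."""
    # rfft handles the crop (n < len(frame)) or zero-pad (n > len(frame)) itself
    spectrum = np.fft.rfft(frame, n=fft_size)
    # Unscaled power; scaling conventions vary between implementations
    return np.abs(spectrum) ** 2


sample_rate_hz = 8000
# One 0.02 s frame (160 samples) of a 1 kHz sine wave
frame = np.sin(2 * np.pi * 1000 * np.arange(160) / sample_rate_hz)

cropped = frame_power_spectrum(frame, 128)  # uses only the first 128 samples
padded = frame_power_spectrum(frame, 256)   # 160 samples plus 96 zeros

print(cropped.shape, padded.shape)
```

With an FFT size of 128, the last 32 samples of every frame are ignored, so a frame length that is at most the FFT size avoids discarding data.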