This is a prebuilt dataset for a keyword spotting system, based on a subset of the Google Speech Commands Dataset with added noise from the Microsoft Scalable Noisy Speech Dataset. It contains 25 minutes of data per class, split into 1-second windows sampled at 16,000 Hz. The dataset contains:
Yes - one-second samples containing only the word "yes".
No - one-second samples containing only the word "no".
Unknown - one-second samples of other words.
Noise - one-second samples of background or static noise.
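As a quick sanity check on these numbers, the following is a minimal Python sketch that works out how many windows each class should hold and how many audio frames each window should contain. The constants come from the description above. The helper names and the assumption that the samples are stored as WAV files are illustrative only and are not confirmed by this page.

```python
import wave

# Values taken from the dataset description above.
SAMPLE_RATE_HZ = 16_000
WINDOW_SECONDS = 1
MINUTES_PER_CLASS = 25
CLASSES = ["yes", "no", "unknown", "noise"]


def expected_windows_per_class(minutes=MINUTES_PER_CLASS,
                               window_seconds=WINDOW_SECONDS):
    """Number of windows that fit into the stated minutes of audio."""
    return (minutes * 60) // window_seconds


def expected_frames_per_window(sample_rate=SAMPLE_RATE_HZ,
                               window_seconds=WINDOW_SECONDS):
    """Number of audio frames in a single window."""
    return sample_rate * window_seconds


def matches_description(path):
    """Hypothetical helper: check one file against the stated rate and length.

    Assumes the sample is a WAV file, which this page does not confirm.
    """
    with wave.open(path, "rb") as wav:
        return (wav.getframerate() == SAMPLE_RATE_HZ
                and wav.getnframes() == expected_frames_per_window())


if __name__ == "__main__":
    print("windows per class:", expected_windows_per_class())
    print("frames per window:", expected_frames_per_window())
    print("windows in total:", expected_windows_per_class() * len(CLASSES))
```

If the extracted files do turn out to be WAV files, matches_description can be called on a few of them after unzipping to spot files with a different sample rate or length. Real recordings may differ from the nominal length by a few frames, so a mismatch is a prompt to look closer rather than proof of a broken file.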
You can import this dataset to your Edge Impulse project through the Uploader in the studio, or via the CLI (docs). First:
Download the keywords dataset.
Unzip the file in a location of your choice.
Then:
Importing using the studio
Go to Data acquisition and click the upload icon. Follow the instructions on the screen.
Or, import with the CLI
Open a terminal or command prompt, and navigate to the folder where you extracted the file. Then run: