This is a prebuilt dataset for a keyword spotting system. It is based on a subset of the Google Speech Commands Dataset, with added noise from the Microsoft Scalable Noisy Speech Dataset. It contains 25 minutes of data per class, split into one-second windows sampled at 16,000 Hz. The dataset contains four classes:
- Yes - one-second samples containing only the word "yes".
- No - one-second samples containing only the word "no".
- Unknown - one-second samples of other words.
- Noise - one-second samples of background or static noise.
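The numbers above imply roughly 25 × 60 = 1,500 one-second windows per class, each holding 16,000 audio samples. The sketch below checks a class folder against those figures using only Python's standard `wave` module. It assumes the files are uncompressed PCM `.wav` files stored one folder per class; the helper names `check_window` and `summarize_class` are hypothetical and not part of any Edge Impulse tool.

```python
import wave
from pathlib import Path

SAMPLE_RATE_HZ = 16_000   # sampling rate stated in the dataset description
WINDOW_SECONDS = 1        # each sample is a one-second window
MINUTES_PER_CLASS = 25    # amount of data per class

# 25 minutes of one-second windows gives about 1,500 windows per class.
EXPECTED_WINDOWS_PER_CLASS = MINUTES_PER_CLASS * 60 // WINDOW_SECONDS


def check_window(path):
    """Return True if the WAV file is one second long at 16 kHz (hypothetical helper)."""
    # wave.open expects a str path or a file object, so convert Path objects.
    with wave.open(str(path), "rb") as wav:
        rate = wav.getframerate()
        frames = wav.getnframes()
    return rate == SAMPLE_RATE_HZ and frames == SAMPLE_RATE_HZ * WINDOW_SECONDS


def summarize_class(folder):
    """Return (number of .wav files, number that pass check_window) for one class folder."""
    files = sorted(Path(folder).glob("*.wav"))
    passing = sum(check_window(f) for f in files)
    return len(files), passing
```

For example, `summarize_class("yes")` run from the extracted folder should report a file count close to `EXPECTED_WINDOWS_PER_CLASS` if the assumptions above hold.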
2. Unzip the file in a location of your choice.
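If you prefer to script this step, the sketch below extracts the archive with Python's standard `zipfile` module and reports which class folders are missing afterwards. The folder names come from the upload commands in this guide; the assumption that they sit at the top level of the archive, the function name `extract_dataset`, and the archive name in the usage line are all hypothetical.

```python
import zipfile
from pathlib import Path

# Class folder names, as used by the upload commands in this guide.
CLASS_FOLDERS = ("yes", "no", "unknown", "noise")


def extract_dataset(zip_path, dest):
    """Extract zip_path into dest and return the expected class folders that are missing.

    Assumes the class folders are at the top level of the archive; if they are
    nested inside another folder, point the check at that folder instead.
    """
    dest = Path(dest)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(dest)
    return [name for name in CLASS_FOLDERS if not (dest / name).is_dir()]
```

Usage (hypothetical file names): `missing = extract_dataset("keywords.zip", "keywords")`. An empty list means all four class folders were found.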
Importing using the Studio
Go to Data acquisition and click the 'Upload' icon. Follow the on-screen instructions.
Or, import with the CLI
Open a terminal or command prompt, navigate to the folder where you extracted the file, and run:
$ edge-impulse-uploader --clean
$ edge-impulse-uploader --label noise --category split noise/*.wav
$ edge-impulse-uploader --label unknown --category split unknown/*.wav
$ edge-impulse-uploader --label no --category split no/*.wav
$ edge-impulse-uploader --label yes --category split yes/*.wav
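The same upload can be driven from a Python script, which may be convenient on systems where the `*.wav` wildcard is not expanded by the shell. The sketch below issues the exact `edge-impulse-uploader` invocations listed above and nothing more; it expands the wildcard itself with `glob`, because `subprocess.run` with a list of arguments does not go through a shell. The function names and the `dry_run` switch are hypothetical additions for illustration.

```python
import glob
import os
import subprocess

# Labels in the same order as the commands above; each label matches its folder name.
LABELS = ("noise", "unknown", "no", "yes")


def build_commands(root="."):
    """Return the uploader invocations shown above as argument lists (hypothetical helper)."""
    commands = [["edge-impulse-uploader", "--clean"]]
    for label in LABELS:
        # Expand <label>/*.wav ourselves, since no shell is involved.
        files = sorted(glob.glob(os.path.join(root, label, "*.wav")))
        if not files:
            raise FileNotFoundError(f"no .wav files found in {os.path.join(root, label)}")
        commands.append(
            ["edge-impulse-uploader", "--label", label, "--category", "split", *files]
        )
    return commands


def run_commands(commands, dry_run=True):
    """Print the commands, or run them one by one and stop on the first failure."""
    for cmd in commands:
        if dry_run:
            print(" ".join(cmd))
        else:
            # check=True raises CalledProcessError if the uploader exits with an error.
            subprocess.run(cmd, check=True)
```

Calling `run_commands(build_commands())` first prints the commands so you can review them; pass `dry_run=False` to actually upload. The `--clean` step may still prompt you interactively in the terminal.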