This is a prebuilt dataset for a keyword spotting system, based on a subset of the Google Speech Commands Dataset. It contains 28 minutes of data sampled at 16,000 Hz across three classes:
- Yes - one-second samples containing only the word "yes".
- No - one-second samples containing only the word "no".
- Noise - longer samples (up to a minute) of background or static noise.
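To give a feel for the data volume, the figures above can be turned into raw sample counts. This is only a rough sketch: it assumes the stated 28 minutes is a rounded total and that each "yes"/"no" clip is exactly one second long; the variable names are made up for illustration.

```shell
# Rough size estimate from the figures quoted above (assumed to be rounded).
rate=16000                          # samples per second (16,000 Hz)
clip_samples=$((rate * 1))          # one-second "yes" or "no" clip
total_samples=$((rate * 28 * 60))   # about 28 minutes of audio in total

echo "samples per one-second clip: $clip_samples"
echo "approximate samples in the dataset: $total_samples"
```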
We currently do not have a good keyword spotting tutorial in Edge Impulse, mostly because we're missing a good anomaly detection block for it, so use this dataset for research purposes only. If you want to build an audio model, it's easier to follow the Audio classification tutorial.
- Download the keywords dataset.
- Unzip the file in a location of your choice.
- Open a terminal or command prompt, and navigate to the place where you extracted the file.
- Run the following commands to upload the dataset:
$ edge-impulse-uploader --clean
$ edge-impulse-uploader --label no --category training keywords/training/no/*.json
$ edge-impulse-uploader --label yes --category training keywords/training/yes/*.json
$ edge-impulse-uploader --label noise --category training keywords/training/noise/*.json
$ edge-impulse-uploader --label no --category testing keywords/testing/no/*.json
$ edge-impulse-uploader --label yes --category testing keywords/testing/yes/*.json
$ edge-impulse-uploader --label anomaly --category testing keywords/testing/anomaly/*.json
You will be prompted for your username, password, and the project where you want to add the dataset.
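If you prefer not to type each upload command by hand, the per-label uploads can be wrapped in a small loop. The following is a minimal sketch, not an official script: the `upload_all` function, the `PAIRS` list and the `DRY_RUN` switch are hypothetical names introduced here, and the category/label pairs are simply copied from the commands above. By default it only prints the commands so you can check them first; the initial `edge-impulse-uploader --clean` step is assumed to be run separately.

```shell
#!/bin/sh
# Sketch: print (or run) one upload command per category/label pair.
# PAIRS, upload_all and DRY_RUN are illustrative names, not part of the CLI.

# category:label pairs taken from the upload commands listed above
PAIRS="training:no training:yes training:noise testing:no testing:yes testing:anomaly"

upload_all() {
    root="$1"   # directory holding the extracted dataset, e.g. "keywords"
    for pair in $PAIRS; do
        category="${pair%%:*}"   # text before the colon
        label="${pair#*:}"       # text after the colon
        dir="$root/$category/$label"
        if [ "${DRY_RUN:-1}" = "1" ]; then
            # Dry run (the default): show the command without executing it.
            echo "edge-impulse-uploader --label $label --category $category $dir/*.json"
        else
            # Real run: the shell expands *.json to the files in that folder.
            edge-impulse-uploader --label "$label" --category "$category" "$dir"/*.json
        fi
    done
}

# Print the six upload commands for review.
upload_all keywords
```

Setting `DRY_RUN=0` before calling `upload_all keywords` would run the uploads for real, from the directory where the file was extracted.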