
Project Overview

Machine Learning and Data Collection
Most machine learning applications benefit from training data collected on the same system that will perform inferencing, and embedded keyword spotting is no exception. Subtle differences in microphones and noise signatures between devices can render a model trained on one system useless on another. Therefore, we want to collect training audio with the very circuit we’ll use for inferencing. The Raspberry Pi Pico can actually sample at an extremely high rate: in my testing, you can achieve sample rates of up to 500 kHz. Pretty neat! For audio, of course, a much lower sample rate is sufficient. CD-quality audio is sampled at 44.1 kHz, slightly more than twice the roughly 20 kHz upper limit of human hearing, which is what the Nyquist sampling theorem requires to capture every audible frequency. For keyword spotting, however, we can get away with something as low as 4 kHz, which still captures frequencies up to 2 kHz. Lower sample rates are generally better because they reduce the computational burden on our inferencing platform, but if we go too low, we sacrifice audio fidelity and spotting keywords becomes impossible.

Getting Data to Your Computer

[Figure: Audio data in base-64 format]
The Circuit

Pico Code
You can find all the code for this project here. It compiles using the standard Pico CMake and make procedure. If you’re not sure how this process works, look through Raspberry Pi’s tutorials for getting set up with the Pico C/C++ SDK and building and flashing your code.
The main code is in pico_daq.cpp. The program starts the Pico’s ADC sampling routine, normalizes the data and converts it to floats, converts the float data to base-64, and then prints it over the serial console to the host computer.
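The normalize-then-encode step can be sketched on the host in Python to show what ends up on the wire. This is not the author’s pico_daq.cpp, only an illustration under assumptions: the normalization (centering on mid-scale 2048 and dividing by 2048) and the function names are hypothetical, and the little-endian float32 layout is assumed because the RP2040’s Cortex-M0+ cores are little-endian. The RP2040 ADC itself is 12-bit, so raw readings fall in 0..4095.

```python
import base64
import struct

# RP2040 ADC readings are 12-bit (0..4095). Treating mid-scale as "silence"
# and dividing by 2048 is an assumed normalization, not the author's exact one.
ADC_MIDSCALE = 2048


def normalize(raw_samples: list[int]) -> list[float]:
    """Map 12-bit ADC counts to floats in roughly [-1.0, 1.0) (hypothetical scheme)."""
    return [(s - ADC_MIDSCALE) / ADC_MIDSCALE for s in raw_samples]


def encode_chunk(samples: list[float]) -> str:
    """Pack floats as little-endian float32 and base-64 encode them as ASCII text."""
    packed = struct.pack(f"<{len(samples)}f", *samples)
    return base64.b64encode(packed).decode("ascii")


raw = [2048, 4095, 0, 3072]  # made-up ADC readings
floats = normalize(raw)
print(floats)  # [0.0, 0.99951171875, -1.0, 0.5]
print(encode_chunk(floats))  # 16 bytes of floats become 24 base-64 characters
```

Every four samples cost 16 bytes before encoding, which is why the serial link becomes the bottleneck.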
This is where we encounter our first problem: at 4 kHz, with four bytes per floating-point value, the serial port cannot write data out as fast as we collect it. My code handles this by dropping chunks of samples. While the Pico is busy sending the data that could not be sent during the sampling window, the ADC is not collecting data. This leads to some jumps in the final audio file, but for the purposes of collecting keyword-spotting data it is good enough.
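A quick back-of-the-envelope check shows how much the link has to carry: 4,000 samples per second at four bytes each is 16,000 bytes per second, and base-64 turns every 3 bytes into 4 characters. The sketch below computes that and, for a hypothetical link capacity, the share of samples that would survive chunk dropping. The link figure is made up for illustration, and the calculation ignores newlines or any other framing the real code adds.

```python
import base64


def base64_chars_per_second(sample_rate_hz: int, bytes_per_sample: int) -> int:
    """Characters per second needed to stream every sample as base-64 text."""
    raw_bytes = sample_rate_hz * bytes_per_sample
    # Encoding a zero-filled buffer of that size gives the exact base-64 length
    # (4 characters per 3 input bytes, rounded up to a full group).
    return len(base64.b64encode(bytes(raw_bytes)))


def fraction_kept(sample_rate_hz: int, bytes_per_sample: int,
                  link_chars_per_second: float) -> float:
    """Share of samples that survive if whole chunks are dropped to fit the link."""
    needed = base64_chars_per_second(sample_rate_hz, bytes_per_sample)
    return min(1.0, link_chars_per_second / needed)


print(base64_chars_per_second(4000, 4))  # 21336
# Hypothetical link that only sustains 10,000 characters per second:
print(round(fraction_kept(4000, 4, 10_000), 3))  # 0.469
```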
Compile this code and flash it to your Pico.
Saving Off Serial Data
On Linux and macOS systems, saving serial data is relatively easy with the screen utility. You can install screen using your package manager on Linux, or using Homebrew on macOS. Most Windows serial console clients also give you some means of saving off data from the serial console, but I won’t provide instructions for these.
Before using screen, you’ll first need to identify the device name of your Pico. On macOS, it will be something like /dev/tty.usbmodem..., and on Linux, something like /dev/ttyACM... On Linux, you can use lsusb and dmesg to help you figure out which device handle belongs to your Pico.
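If you’d rather not dig through dmesg, a small Python helper can list the device files whose names match the patterns above. The function name and the pattern list are my own illustration, not part of the project code. A name match doesn’t prove the device is the Pico, so unplug and replug it and see which entry appears and disappears.

```python
import glob

# Device-name patterns from the text; the exact suffix differs per machine.
DEFAULT_PATTERNS = ("/dev/tty.usbmodem*", "/dev/ttyACM*")


def find_candidate_ports(patterns: tuple[str, ...] = DEFAULT_PATTERNS) -> list[str]:
    """Return sorted, de-duplicated paths matching any of the glob patterns."""
    matches: set[str] = set()
    for pattern in patterns:
        matches.update(glob.glob(pattern))
    return sorted(matches)
```

Calling find_candidate_ports() with the Pico plugged in returns a list such as the device passed to screen below; an empty list usually means the board isn’t enumerated yet.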
With the code running on your Pico, the following command will open the serial port and save off the data (make sure you replace the /dev/... path with your Pico’s device handle):
screen -L /dev/tty.usbmodem1301
Go ahead and start speaking one of your keywords for a period of time. Base-64 characters should constantly flash across your screen. Follow best practices listed in the Edge Impulse tutorial for creating datasets for keyword spotting as you collect your data.
To exit screen when you’re done collecting data, type Control-A and then press k (confirm with y if screen asks). You will now have the raw base-64 data saved in a file called screenlog.0.
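You can rename this file to something that indicates what it actually contains (“.raw” is a good choice) and keep collecting more recordings. Rename it before starting the next session, since screen appends to an existing screenlog.0 rather than overwriting it.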
Converting Base-64 Data to a .wav File
Edge Impulse can’t read data in base-64 format. What we need is a way to convert this base-64 data into a .wav file that you can import directly into Edge Impulse using its data uploader. I made a Python 3 program to do just that, which you can find here. This code requires the scipy, numpy, and pydub packages, which you can install via pip.
Replace infile with the path to the base-64 data output by screen, and outfile with the desired output path of your .wav file. You should then be ready to run the Python program!
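If you’d like to see what such a conversion involves, here is a minimal standard-library sketch. It is not the author’s program: it swaps in Python’s built-in wave and struct modules for scipy, numpy, and pydub, and it assumes each line of the log is a separately encoded chunk of little-endian float32 samples recorded at 4 kHz. That framing and all function and file names are hypothetical.

```python
import base64
import binascii
import struct
import wave
from typing import Iterable

SAMPLE_RATE_HZ = 4000  # assumption: must match the rate used on the Pico


def decode_lines(lines: Iterable[str]) -> list[float]:
    """Decode base-64 lines into float32 samples.

    Assumes each line is an independently encoded chunk of little-endian
    float32 values (hypothetical framing). Lines that fail to decode, such as
    a partial line captured when logging started, are skipped.
    """
    samples: list[float] = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            chunk = base64.b64decode(line, validate=True)
        except binascii.Error:
            continue
        usable = len(chunk) - len(chunk) % 4  # ignore a trailing partial float
        samples.extend(struct.unpack(f"<{usable // 4}f", chunk[:usable]))
    return samples


def read_samples(infile: str) -> list[float]:
    """Read a saved screen log (e.g. a renamed screenlog.0) and decode it."""
    with open(infile, "r", encoding="ascii", errors="ignore") as f:
        return decode_lines(f)


def to_pcm16(samples: list[float]) -> bytes:
    """Clip floats to [-1.0, 1.0] and scale them to little-endian 16-bit PCM."""
    clipped = (max(-1.0, min(1.0, s)) for s in samples)
    return b"".join(struct.pack("<h", round(s * 32767)) for s in clipped)


def write_wav(samples: list[float], outfile: str,
              sample_rate_hz: int = SAMPLE_RATE_HZ) -> None:
    """Write the samples as a mono 16-bit .wav file."""
    with wave.open(outfile, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)  # 2 bytes = 16-bit samples
        w.setframerate(sample_rate_hz)
        w.writeframes(to_pcm16(samples))
```

For example, write_wav(read_samples("keyword.raw"), "keyword.wav") converts one renamed log. Skipping lines that don’t decode is a deliberate choice: the first line of a log may start mid-chunk, and dropping it loses only that one chunk.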
Once the program is done, you’ll have a finished audio file that you can play with any media player. Give it a try: if all goes well, you should be able to hear your voice!
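Because the Pico drops chunks while it is transmitting, the file will typically be shorter than the time you spent recording. A quick way to confirm the sample rate and see how much audio you actually captured before uploading is to read the .wav header with Python’s built-in wave module. The helper name wav_summary is hypothetical.

```python
import wave


def wav_summary(path: str) -> tuple[int, int, float]:
    """Return (sample rate in Hz, frame count, duration in seconds) of a .wav file."""
    with wave.open(path, "rb") as w:
        rate = w.getframerate()
        frames = w.getnframes()
    return rate, frames, frames / rate
```

For instance, wav_summary("keyword.wav") should report a 4000 Hz rate for audio sampled at 4 kHz; a different rate means the conversion step used the wrong value.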