Audio data in base-64 format
CMake and make procedure. If you’re not sure how this process works, look through Raspberry Pi’s tutorials for getting set up with the Pico C++ SDK and building/flashing your code.
The main code is in pico_daq.cpp. The program starts the Pico’s ADC sampling routine, normalizes the data and converts it to floats, converts the float data to base-64, and then prints this out over the serial console to the host computer.
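To make the data format concrete, here is a minimal host-side Python sketch of the same normalize, float, base-64 steps. It is not the code in pico_daq.cpp: the function names are hypothetical, and the scaling to [-1.0, 1.0], the little-endian float32 layout, and one base-64 string per chunk are all assumptions made for illustration. The only hardware fact used is that the Pico's ADC returns 12-bit readings.

```python
import base64
import struct

ADC_MAX = 4095  # the Pico's ADC returns 12-bit readings (0..4095)


def normalize(raw_samples):
    """Map raw ADC counts onto floats in [-1.0, 1.0] (assumed scaling)."""
    return [(s / ADC_MAX) * 2.0 - 1.0 for s in raw_samples]


def encode_chunk(raw_samples):
    """Pack the floats as little-endian float32 and return one base-64 string."""
    floats = normalize(raw_samples)
    payload = struct.pack(f"<{len(floats)}f", *floats)
    return base64.b64encode(payload).decode("ascii")


if __name__ == "__main__":
    # Three raw readings become 12 bytes of float data, then 16 base-64 characters.
    print(encode_chunk([0, 2048, 4095]))
```

A sketch like this is also handy for generating test input for the conversion step later on without having a Pico attached.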
This is where we encounter our first problem: at 4 kHz, with four bytes per floating-point value, the serial port cannot write data as fast as we collect it. I handle this in the code by dropping chunks of samples. While the Pico is busy sending the data that could not be sent during the sampling window, the ADC is not collecting data. This leads to some jumps in the final audio file, but for the purposes of collecting data for keyword spotting it is sufficient.
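The arithmetic behind this bottleneck can be sketched in a few lines of Python. The 4 kHz rate and four bytes per sample come from the text above; the 115200 baud figure is purely an illustrative assumption, since the actual link speed isn't stated here, and the function names are hypothetical.

```python
import math

SAMPLE_RATE_HZ = 4000   # sampling rate from the text
BYTES_PER_SAMPLE = 4    # one 32-bit float per sample


def raw_bytes_per_second(rate_hz=SAMPLE_RATE_HZ, bytes_per_sample=BYTES_PER_SAMPLE):
    """Bytes of float data produced each second."""
    return rate_hz * bytes_per_sample


def base64_chars_per_second(raw_bytes):
    """Base-64 emits 4 characters for every group of 3 input bytes."""
    return 4 * math.ceil(raw_bytes / 3)


def uart_bytes_per_second(baud, bits_per_byte=10):
    """8N1 framing sends 1 start bit, 8 data bits, and 1 stop bit per byte."""
    return baud / bits_per_byte


if __name__ == "__main__":
    raw = raw_bytes_per_second()
    print("float data:", raw, "bytes/s")
    print("after base-64:", base64_chars_per_second(raw), "chars/s")
    # Assumed link speed, for comparison only.
    print("115200 baud link:", uart_bytes_per_second(115200), "bytes/s")
```

Under that assumed link speed, the encoded stream needs roughly twice the available throughput, which is consistent with having to drop chunks of samples.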
Compile this code and flash it to your Pico.
screen utility. You can install screen using your package manager on Linux, or using Homebrew on macOS. Most Windows serial console clients also give you some means of saving data from the serial console, but I won’t provide instructions for these.
Before using screen, you’ll first need to identify the device name of your Pico. On macOS, this will be something like /dev/tty.usbmodem.... On Linux, this will be something like /dev/ttyACM.... You can use lsusb and dmesg to help you figure out what device handle your Pico is.
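If you'd rather not dig through dmesg output, a small Python sketch can list the device files that match the two naming patterns mentioned above. The function name is hypothetical, and the script only matches file names, so it can't tell a Pico apart from any other USB serial device that uses the same prefix.

```python
import glob

# Naming patterns from the text: macOS first, then Linux.
PATTERNS = ("/dev/tty.usbmodem*", "/dev/ttyACM*")


def find_candidate_ports(patterns=PATTERNS):
    """Return a sorted list of device paths matching any of the patterns."""
    ports = []
    for pattern in patterns:
        ports.extend(glob.glob(pattern))
    return sorted(ports)


if __name__ == "__main__":
    candidates = find_candidate_ports()
    if not candidates:
        print("No matching serial devices found.")
    for port in candidates:
        print(port)
```

Running it once with the Pico unplugged and once with it plugged in makes the new entry easy to spot.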
With the code running on your Pico, the following command will open the serial port and save the data to a log file (make sure you replace /dev/... with your Pico’s device handle):
screen -L /dev/tty.usbmodem1301
Go ahead and start speaking one of your keywords for a period of time. Base-64 characters should constantly flash across your screen. Follow best practices listed in the Edge Impulse tutorial for creating datasets for keyword spotting as you collect your data.
To exit screen when you’re done collecting data, type Control-A and then press k. You will now have the raw base-64 data saved in a file called screenlog.0.
You can rename this file to something that indicates what it actually contains (a “.raw” extension is a good choice), and keep collecting more recordings.
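When you're collecting many recordings, the renaming step is easy to automate. The sketch below moves screenlog.0 to a numbered .raw file named after the keyword. The function name and the `<label>_<n>.raw` naming scheme are hypothetical choices for illustration, not something this workflow requires.

```python
import shutil
from pathlib import Path


def save_recording(label, log_path="screenlog.0", out_dir="."):
    """Move the screen log to <label>_<n>.raw, never overwriting an earlier file."""
    src = Path(log_path)
    out = Path(out_dir)
    n = 0
    # Find the first unused index for this label.
    while (out / f"{label}_{n}.raw").exists():
        n += 1
    dest = out / f"{label}_{n}.raw"
    shutil.move(str(src), str(dest))
    return dest
```

For example, calling `save_recording("yes")` after a session would produce yes_0.raw the first time and yes_1.raw the next.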
scipy, numpy, and pydub packages, which you can install via pip.
Replace infile with the path to the base-64 data output by screen, and outfile with the desired output path of your .wav file. You should now be ready to run the Python program!
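For readers who want to see what the conversion involves, here is a minimal standard-library sketch of the decoding step. It is not the script described above: it swaps scipy, numpy, and pydub for Python's built-in base64, struct, and wave modules, and the function names are hypothetical. It also assumes that each line of the log is an independent base-64 chunk of little-endian float32 samples in the range [-1.0, 1.0]; if the Pico code frames its output differently, the decoding would need to change to match.

```python
import base64
import binascii
import struct
import wave

SAMPLE_RATE_HZ = 4000  # sampling rate stated in the text


def decode_log(text):
    """Decode base-64 lines into a flat list of floats, skipping damaged lines."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        try:
            payload = base64.b64decode(line, validate=True)
        except (binascii.Error, ValueError):
            continue  # partial or corrupted line, e.g. from a cut-off capture
        usable = len(payload) - len(payload) % 4  # whole float32 values only
        samples.extend(struct.unpack(f"<{usable // 4}f", payload[:usable]))
    return samples


def write_wav(samples, path, rate=SAMPLE_RATE_HZ):
    """Write floats in [-1.0, 1.0] as 16-bit mono PCM."""
    pcm = bytearray()
    for s in samples:
        s = max(-1.0, min(1.0, s))  # clip anything out of range
        pcm += struct.pack("<h", int(s * 32767))
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(rate)
        wav.writeframes(bytes(pcm))
```

With these two functions, converting a recording comes down to reading the infile as text, passing it to `decode_log`, and handing the result to `write_wav` along with the outfile path.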
Once the program is done, you’ll have a finished audio file. You can play this with any media player. Give it a try — if all goes well, you should be able to hear your voice!