Wav2vec 2.0: Learning the structure of speech from raw audio
https://ai.facebook.com/blog/wav2vec-20-learning-the-structure-of-speech-from-raw-audio/

A pre-trained version of Wav2vec 2.0 is available through the 🤗 Transformers library. The pre-trained model supports both PyTorch and TensorFlow; we will use it with PyTorch.
We will use the `wav2vec2-base-960h` checkpoint. Once deployed, Azure ML exposes the model through a `/score` endpoint. This endpoint can be accessed using a simple HTTP call.
The scoring file for our voice-to-text application can be found in `scoring-func/score_audio.py`.
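As a rough sketch of what such a scoring script looks like (Azure ML calls `init()` once at startup and `run()` once per request; the audio decoding details and the `soundfile` dependency are assumptions, not the actual implementation):

```python
# score_audio.py - a minimal sketch of an Azure ML scoring script
import base64
import io
import json

import soundfile as sf
import torch
from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor

processor = None
model = None


def init():
    # Azure ML calls init() once, when the scoring container starts
    global processor, model
    processor = Wav2Vec2Processor.from_pretrained("facebook/wav2vec2-base-960h")
    model = Wav2Vec2ForCTC.from_pretrained("facebook/wav2vec2-base-960h")


def run(raw_data):
    # Azure ML calls run() for every request; here we assume the audio
    # arrives base64-encoded in the "data" field of the JSON body.
    # Note: this model expects 16 kHz mono audio.
    audio_bytes = base64.b64decode(json.loads(raw_data)["data"])
    speech, sample_rate = sf.read(io.BytesIO(audio_bytes))
    inputs = processor(speech, sampling_rate=sample_rate, return_tensors="pt")
    with torch.no_grad():
        logits = model(inputs.input_values).logits
    predicted_ids = torch.argmax(logits, dim=-1)
    return {"text": processor.batch_decode(predicted_ids)[0]}
```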
We can upload the model to Azure ML from the Models page, together with the `scoring-func` folder. Since the scoring script depends on the `transformers` library, we need a Custom Environment. This can be created from the Environments page. We can choose to start from a Curated Environment, or we can use our own Dockerfile. After multiple tries, I ended up creating a Custom Environment based on the `mcr.microsoft.com/azureml/pytorch-1.10-ubuntu18.04-py37-cpu-inference` image. This is a PyTorch-based image supporting only CPU inference; it comes with Python 3.7, and we add the `transformers` library on top of it.
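A minimal sketch of what that Dockerfile could look like (the exact extra dependencies are an assumption):

```dockerfile
FROM mcr.microsoft.com/azureml/pytorch-1.10-ubuntu18.04-py37-cpu-inference

# Add the libraries the scoring script needs on top of the base image
RUN pip install transformers soundfile
```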
The endpoint expects the audio in the request's `data` field. To call it with a real audio file, we can use a client-side Python script like this:
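The original script is not reproduced here; below is a minimal sketch, assuming the audio is sent base64-encoded and the endpoint uses key-based authentication (the URL, key, and file name are placeholders):

```python
# A sketch of calling the /score endpoint with a WAV file.
import base64
import json

import requests

ENDPOINT_URL = "https://<endpoint-name>.<region>.inference.ml.azure.com/score"
API_KEY = "<api-key>"

# Read the audio file and base64-encode it into the "data" field
with open("sample.wav", "rb") as f:
    payload = {"data": base64.b64encode(f.read()).decode("utf-8")}

response = requests.post(
    ENDPOINT_URL,
    data=json.dumps(payload),
    headers={
        "Content-Type": "application/json",
        "Authorization": f"Bearer {API_KEY}",
    },
)
print(response.json())
```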
Using the `kubectl` CLI tool, we can also see what resources Azure ML deployed in our Kubernetes cluster:
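For instance, assuming the Azure ML extension used its default `azureml` namespace, something like:

```bash
kubectl get pods,deployments,services -n azureml
```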
What is Azure Machine Learning CLI & Python SDK v2?
https://docs.microsoft.com/en-us/azure/machine-learning/concept-v2

The Azure ML CLI and Python SDK v2 enable engineers to use MLOps techniques. Similar to DevOps, MLOps is a set of practices that allows the reliable and efficient management of the AI/ML application lifecycle. It enables processes like automated model training, deployment, and monitoring.
Setting up your Raspberry Pi
https://projects.raspberrypi.org/en/projects/raspberry-pi-setting-up/0

Next, there are a couple of steps to be done in order to connect the device to Edge Impulse. The goal is to install the `edge-impulse-linux` utility, which can be done as follows:
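A sketch of the installation, following the commands from the Edge Impulse Linux docs at the time (the Node.js version may differ today):

```bash
# Install Node.js and the audio/build dependencies
curl -sL https://deb.nodesource.com/setup_12.x | sudo bash -
sudo apt install -y gcc g++ make build-essential nodejs sox gstreamer1.0-tools gstreamer1.0-plugins-good gstreamer1.0-plugins-base gstreamer1.0-plugins-base-apps

# Install the Edge Impulse Linux CLI
npm install edge-impulse-linux -g --unsafe-perm
```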
Once installed, we connect the device to our Edge Impulse project by running `edge-impulse-linux` on the Raspberry Pi:
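On first run, the wizard prompts us to log in and to pick a project and a microphone; the command itself is just:

```bash
edge-impulse-linux
```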
Responding to your voice
https://docs.edgeimpulse.com/docs/tutorials/responding-to-your-voice

The first step in training a keyword spotting model is to collect a set of samples of the word we want to detect. This can be done in the Data Acquisition tab:
The keyword we want our model to detect is "Listen!". Besides the keyword recordings, we also collect noise and unknown samples (roughly as many as the Listen! samples we have), making sure the three classes (listen, noise, unknown) are evenly distributed:

In the feature explorer we can check that the keyword samples (listen) are clearly separated from the unknown and noise samples.
At this point our Impulse is ready to be used. We can try it out in the Live classification tab.
To run the model on the device, we can use the `edge-impulse-linux-runner` app. The tool automatically downloads and optimizes the model for the Raspberry Pi, then runs a sample app that continuously analyses the input audio and gives the probabilities of the predicted classes:
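Launching it is a single command (on first run it will ask us to log in and select the project):

```bash
edge-impulse-linux-runner
```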
To integrate keyword spotting into our own code, we can start from the Python SDK's keyword spotting example, found in the `examples/audio/classify.py` file. We can launch it as follows:
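A sketch of launching it, assuming the model file was first downloaded with `edge-impulse-linux-runner --download modelfile.eim`:

```bash
python3 classify.py modelfile.eim
```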
Our voice-to-text app consists of the following components:

- EdgeML / `edgeml.py` - responsible for running the keyword spotting model until a given keyword is detected
- Audio / `audio.py` - contains the audio recording functionality, with silence detection
- CloudML / `cloudml.py` - responsible for talking to the Cloud ML endpoint
- `main.py` - the entry point of the app, with a control loop linking the above parts together

The full application can be found in the `edgeml/python-app/` folder; a sketch of how the pieces fit together follows below.
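A minimal sketch of the control loop in `main.py`, where the `EdgeML`, `Audio`, and `CloudML` constructors and method names are assumptions made for illustration, not the actual implementation:

```python
# main.py - a sketch of the control loop linking the components together
from audio import Audio
from cloudml import CloudML
from edgeml import EdgeML


def main():
    edge_ml = EdgeML(keyword="listen")           # on-device keyword spotting
    audio = Audio()                              # recording with silence detection
    cloud_ml = CloudML(scoring_uri="<endpoint>/score", api_key="<key>")

    while True:
        # Block until the Edge Impulse model detects the keyword
        edge_ml.wait_for_keyword()
        # Record the spoken command, stopping when silence is detected
        recording = audio.record_until_silence()
        # Send the recording to the Azure ML endpoint and print the text
        print(cloud_ml.transcribe(recording))


if __name__ == "__main__":
    main()
```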