Local Software Requirements
- Python 3
- Pip package manager
- Jupyter Notebook: https://jupyter.org/install
- pip packages (install with pip install packagename):
  - pydub https://pypi.org/project/pydub/
  - google-cloud-texttospeech https://cloud.google.com/python/docs/reference/texttospeech/latest
  - requests https://pypi.org/project/requests/
Set up Google TTS API
First, you will need to set up an Edge Impulse account and create your first project. You will also need a Google Cloud account with the Text-to-Speech API enabled: https://cloud.google.com/text-to-speech. The first million characters generated each month are free (WaveNet voices), which should be plenty for most cases, as you’ll only need to generate your dataset once. From Google you will need to download a service account credentials JSON file and point the GOOGLE_APPLICATION_CREDENTIALS environment variable on your system at it so that the Python API can authenticate: https://developers.google.com/workspace/guides/create-credentials#service-account
Generate the desired samples
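Before generating any samples, it can help to confirm that the credentials file is actually visible to Python. The following is a minimal sketch, not part of the original notebook: the helper name check_google_credentials is hypothetical, and it only inspects the JSON key file locally without contacting Google.

```python
import json
import os
from pathlib import Path


def check_google_credentials(env_var="GOOGLE_APPLICATION_CREDENTIALS"):
    """Return the service account email if the key file looks usable.

    Hypothetical helper: it only reads the local JSON file and does not
    verify that the key is still valid on the Google side.
    """
    path = os.environ.get(env_var)
    if not path:
        raise RuntimeError(f"{env_var} is not set")

    key_file = Path(path)
    if not key_file.is_file():
        raise FileNotFoundError(f"{env_var} points to a missing file: {path}")

    info = json.loads(key_file.read_text())
    if info.get("type") != "service_account":
        raise ValueError("credentials file is not a service account key")
    return info.get("client_email")


# Report the result instead of crashing, so this is safe to run in a notebook cell.
try:
    print("Using service account:", check_google_credentials())
except (RuntimeError, FileNotFoundError, ValueError) as err:
    print("Credentials problem:", err)
```

If the check reports a problem, set the environment variable in the shell that launches Jupyter, or assign it through os.environ at the top of the notebook before creating the Text-to-Speech client.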
First off we need to set our desired keywords and labels:
- languages - Choose the text-to-speech voice languages to use (https://cloud.google.com/text-to-speech/docs/voices)
- pitches - Which voice pitches to apply
- genders - Which SSML genders to apply
- speakingRates - Which speaking speeds to apply
- out_length - How long each output sample should be
- count - Maximum number of samples to output (if the number of combinations of languages, pitches, etc. is higher than this, the output is restricted to this many)
- voice-dir - Where to store the clean samples before noise is added
- noise-url - Which noise file to download and apply to your samples
- output-folder - The final output location of the noised samples
- num-copies - How many different noisy versions of each sample to create
- max-noise-level - in dB,
If you set num_copies to be smaller than the number of combinations then these options will be reduced:
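As a rough illustration of how the parameter combinations might be enumerated and then reduced to a fixed maximum, here is a minimal sketch. The values, the helper name build_combinations, and the random selection strategy are all assumptions for illustration, not the notebook's actual code; the cap is taken from count as described in the list.

```python
import itertools
import random

# Illustrative values only; substitute your own choices.
languages = ["en-GB", "en-US"]
pitches = [-2.0, 0.0, 2.0]
genders = ["FEMALE", "MALE"]
speaking_rates = [0.9, 1.0, 1.1]
count = 10  # maximum number of clean samples to generate


def build_combinations(languages, pitches, genders, speaking_rates, count, seed=0):
    """Return at most `count` (language, pitch, gender, rate) tuples.

    Hypothetical helper: when there are more combinations than `count`,
    a reproducible random subset is kept so the variety is spread out
    rather than always taking the first entries.
    """
    all_options = list(
        itertools.product(languages, pitches, genders, speaking_rates)
    )
    if len(all_options) <= count:
        return all_options
    rng = random.Random(seed)
    return rng.sample(all_options, count)


options = build_combinations(languages, pitches, genders, speaking_rates, count)
total = len(languages) * len(pitches) * len(genders) * len(speaking_rates)
print(f"Using {len(options)} of {total} possible combinations")
for language, pitch, gender, rate in options:
    print(language, pitch, gender, rate)
```

Each resulting tuple would then be passed to one text-to-speech request. Sampling at random, rather than truncating the list, avoids ending up with samples that all share the first language or pitch.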
The noisy samples in ./out-noisy can be uploaded easily using the Edge Impulse CLI tool:
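The upload step could be scripted from Python roughly as follows. This is a sketch under assumptions: the helper build_upload_command and the label "hello" are hypothetical, and the --category and --label flags should be checked against the Edge Impulse CLI documentation for your installed version. The command is only printed here; the actual call is left commented out.

```python
import shlex
from pathlib import Path


def build_upload_command(folder, label, category="split"):
    """Build an edge-impulse-uploader command line for all WAV files in `folder`.

    Hypothetical helper: it only assembles the argument list and does not
    run anything, so it is safe to call without the CLI installed.
    """
    folder = Path(folder)
    wav_files = sorted(str(p) for p in folder.glob("*.wav")) if folder.is_dir() else []
    return [
        "edge-impulse-uploader",
        "--category", category,  # "split" lets Edge Impulse divide training/testing
        "--label", label,        # label applied to every uploaded file
        *wav_files,
    ]


command = build_upload_command("./out-noisy", label="hello")
print(shlex.join(command))

# To actually upload, run the command (requires the Edge Impulse CLI and a login):
# import subprocess
# subprocess.run(command, check=True)
```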