https://github.com/rhasspy/piper
A fast, local neural text to speech system that sounds great and is optimized for the Raspberry Pi 4. Piper is used in a variety of projects.
echo 'Welcome to the world of speech synthesis!' | \
./piper --model en_US-lessac-medium.onnx --output_file welcome.wav
Voices
Our goal is to support Home Assistant and the Year of Voice.
Download voices for the supported languages:
- Arabic (ar_JO)
- Catalan (ca_ES)
- Czech (cs_CZ)
- Danish (da_DK)
- German (de_DE)
- Greek (el_GR)
- English (en_GB, en_US)
- Spanish (es_ES, es_MX)
- Finnish (fi_FI)
- French (fr_FR)
- Hungarian (hu_HU)
- Icelandic (is_IS)
- Italian (it_IT)
- Georgian (ka_GE)
- Kazakh (kk_KZ)
- Luxembourgish (lb_LU)
- Nepali (ne_NP)
- Dutch (nl_BE, nl_NL)
- Norwegian (no_NO)
- Polish (pl_PL)
- Portuguese (pt_BR, pt_PT)
- Romanian (ro_RO)
- Russian (ru_RU)
- Serbian (sr_RS)
- Swedish (sv_SE)
- Swahili (sw_CD)
- Turkish (tr_TR)
- Ukrainian (uk_UA)
- Vietnamese (vi_VN)
- Chinese (zh_CN)
You will need two files per voice:
- A .onnx model file, such as en_US-lessac-medium.onnx
- A .onnx.json config file, such as en_US-lessac-medium.onnx.json
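The model and config files are paired by name: the config file is the model's file name with `.json` appended. Below is a minimal Python sketch that derives the config path this way and checks that both files are present before running Piper. The helper name `voice_files` is hypothetical and not part of Piper. The naming rule is inferred from the example file names above.

```python
from pathlib import Path
from typing import Tuple


def voice_files(model_path: str) -> Tuple[Path, Path]:
    """Return (model, config) paths for a voice.

    Hypothetical helper: assumes the <voice>.onnx / <voice>.onnx.json
    naming shown in the examples. Raises FileNotFoundError if either
    file is missing.
    """
    model = Path(model_path)
    # The config shares the model's file name with ".json" appended.
    config = model.with_name(model.name + ".json")
    for path in (model, config):
        if not path.is_file():
            raise FileNotFoundError(f"missing voice file: {path}")
    return model, config
```

For example, `voice_files("en_US-lessac-medium.onnx")` returns both paths only if `en_US-lessac-medium.onnx.json` sits next to the model.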
The MODEL_CARD file for each voice contains important licensing information. Piper is intended for text to speech research, and does not impose any additional restrictions on voice models. Some voices may have restrictive licenses, however, so please review them carefully!
Installation
You can run Piper with Python or download a binary release:
If you want to build from source, see the Makefile and C++ source. You must download and extract piper-phonemize to lib/Linux-$(uname -m)/piper_phonemize before building. For example, lib/Linux-x86_64/piper_phonemize/lib/libpiper_phonemize.so should exist for AMD/Intel machines (as well as everything else from libpiper_phonemize-amd64.tar.gz).
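Before building, you can check the piper-phonemize layout. The Python sketch below builds the expected path `lib/Linux-$(uname -m)/piper_phonemize/lib/libpiper_phonemize.so` using `platform.machine()`, which returns the same machine name as `uname -m` on Linux. The helper name `phonemize_lib_path` is hypothetical. The sketch checks only for the shared library, and the build also needs the rest of the extracted archive.

```python
import platform
from pathlib import Path
from typing import Optional


def phonemize_lib_path(repo_root: str = ".", machine: Optional[str] = None) -> Path:
    """Return where libpiper_phonemize.so is expected to be.

    Hypothetical helper: `machine` defaults to platform.machine(),
    which matches `uname -m` on Linux (e.g. "x86_64", "aarch64").
    """
    arch = machine or platform.machine()
    return (
        Path(repo_root)
        / "lib"
        / f"Linux-{arch}"
        / "piper_phonemize"
        / "lib"
        / "libpiper_phonemize.so"
    )


if __name__ == "__main__":
    lib = phonemize_lib_path()
    if lib.is_file():
        print(f"found {lib}")
    else:
        print(f"missing {lib}: extract piper-phonemize before building")
```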
Usage
- Download a voice and extract the .onnx and .onnx.json files
- Run the piper binary with text on standard input, --model /path/to/your-voice.onnx, and --output_file output.wav
For example:
echo 'Welcome to the world of speech synthesis!' | \
./piper --model en_US-lessac-medium.onnx --output_file welcome.wav
For multi-speaker models, use --speaker <number> to change speakers (default: 0).
See piper --help for more options.
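To call Piper from a Python program, you can run the same command with `subprocess`. The sketch below uses only the flags documented above (`--model`, `--output_file`, `--speaker`) and passes the text on standard input, as `echo ... |` does. The function names `piper_command` and `synthesize` are hypothetical. The sketch also assumes the `piper` binary is at `./piper`, as in the example.

```python
import subprocess
from typing import List, Optional


def piper_command(
    model: str,
    output_file: str,
    speaker: Optional[int] = None,
    piper: str = "./piper",
) -> List[str]:
    """Build a piper command line (hypothetical helper)."""
    cmd = [piper, "--model", model, "--output_file", output_file]
    if speaker is not None:
        # Only meaningful for multi-speaker models; piper defaults to 0.
        cmd += ["--speaker", str(speaker)]
    return cmd


def synthesize(text: str, model: str, output_file: str, speaker: Optional[int] = None) -> None:
    """Send `text` to piper's standard input; raise if piper fails."""
    subprocess.run(
        piper_command(model, output_file, speaker),
        input=text.encode("utf-8"),
        check=True,
    )
```

For example, `synthesize("Welcome to the world of speech synthesis!", "en_US-lessac-medium.onnx", "welcome.wav")` mirrors the shell example. Building the argument list separately keeps it easy to inspect without running Piper.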
Streaming Audio
Piper can stream raw audio to stdout as it's produced:
echo 'This sentence is spoken first. This sentence is synthesized while the first sentence is spoken.' | \
./piper --model en_US-lessac-medium.onnx --output-raw | \
aplay -r 22050 -f S16_LE -t raw -
This is raw audio and not a WAV file, so make sure your audio player is set to play 16-bit mono PCM samples at the correct sample rate for the voice.
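If you capture the raw stream instead of playing it, you can add a WAV header yourself. The sketch below uses Python's standard `wave` module and assumes 16-bit signed little-endian mono samples, matching the `aplay -f S16_LE` invocation above. The 22050 Hz default comes from that example, and other voices may use a different rate. The sketch also assumes the voice's `.onnx.json` config stores the rate under `audio.sample_rate`. Check your config file, because that key layout is an assumption. Both function names are hypothetical.

```python
import json
import wave


def sample_rate_from_config(config_path: str) -> int:
    """Read the voice sample rate from its .onnx.json config.

    Assumption: the rate is stored as {"audio": {"sample_rate": ...}}.
    """
    with open(config_path, encoding="utf-8") as f:
        return int(json.load(f)["audio"]["sample_rate"])


def raw_to_wav(raw_pcm: bytes, wav_path: str, sample_rate: int = 22050) -> None:
    """Write raw 16-bit mono little-endian PCM bytes to a WAV file."""
    if len(raw_pcm) % 2:
        raise ValueError("16-bit PCM data must have an even number of bytes")
    with wave.open(wav_path, "wb") as wav:
        wav.setnchannels(1)  # mono
        wav.setsampwidth(2)  # 16 bits = 2 bytes per sample
        wav.setframerate(sample_rate)
        wav.writeframes(raw_pcm)
```

For example, `raw_to_wav(raw_bytes, "out.wav", sample_rate_from_config("en_US-lessac-medium.onnx.json"))`. WAV stores PCM little-endian, so the samples can be written without conversion.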
JSON Input
The piper executable can accept JSON input when using the --json-input flag. Each line of input must be a JSON object with a text field. For example:
{ "text": "First sentence to speak." }
{ "text": "Second sentence to speak." }
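When you generate this input from a program, using a JSON encoder instead of string concatenation ensures that quotes and backslashes in the text are escaped. The Python sketch below writes one object per line, as `--json-input` requires. The function name `to_json_lines` is hypothetical.

```python
import json
from typing import Iterable


def to_json_lines(sentences: Iterable[str]) -> str:
    """Encode each sentence as one {"text": ...} object per line."""
    # ensure_ascii=False keeps non-ASCII text readable. Encode the
    # result as UTF-8 before piping it to piper.
    return "".join(json.dumps({"text": s}, ensure_ascii=False) + "\n" for s in sentences)
```

For example, `to_json_lines(["First sentence to speak.", "Second sentence to speak."])` produces the two lines shown above, without the extra spaces inside the braces.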
Optional fields include:
- speaker (string): Name of the speaker to use from speaker_id_map in the config (multi-speaker voices only)
- speaker_id (number): ID of the speaker to use, from 0 to the number of speakers minus 1 (multi-speaker voices only; overrides "speaker")
- output_file (string): Path to the output WAV file
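The field rules above can be checked on the client before requests are sent to Piper. The Python sketch below is a hypothetical helper and is not part of Piper. It encodes only the types listed above and the 0 to (number of speakers minus 1) range for `speaker_id`, which it treats as an integer. The caller must supply `num_speakers`, for example from the voice's config.

```python
from typing import Any, Dict, Optional


def validate_request(req: Dict[str, Any], num_speakers: Optional[int] = None) -> None:
    """Raise ValueError if `req` breaks the documented field rules."""
    if not isinstance(req.get("text"), str):
        raise ValueError('"text" is required and must be a string')
    if "speaker" in req and not isinstance(req["speaker"], str):
        raise ValueError('"speaker" must be a string')
    if "speaker_id" in req:
        sid = req["speaker_id"]
        # bool is a subclass of int in Python, so reject it explicitly.
        if not isinstance(sid, int) or isinstance(sid, bool):
            raise ValueError('"speaker_id" must be an integer')
        if num_speakers is not None and not 0 <= sid <= num_speakers - 1:
            raise ValueError(f'"speaker_id" must be between 0 and {num_speakers - 1}')
    if "output_file" in req and not isinstance(req["output_file"], str):
        raise ValueError('"output_file" must be a string')
```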
The following example writes two sentences with different speakers to different files:
{ "text": "First speaker.", "speaker_id": 0, "output_file": "/tmp/speaker_0.wav" }
{ "text": "Second speaker.", "speaker_id": 1, "output_file": "/tmp/speaker_1.wav" }