Piper : A fast, local neural text to speech system

Prapattimynk

April 20, 2024

https://github.com/rhasspy/piper

Piper is a fast, local neural text to speech system that sounds great and is optimized for the Raspberry Pi 4. It is used in a variety of projects.

echo 'Welcome to the world of speech synthesis!' | \
  ./piper --model en_US-lessac-medium.onnx --output_file welcome.wav

Voices

Our goal is to support Home Assistant and the Year of Voice.

Download voices for the supported languages:

Arabic (ar_JO)
Catalan (ca_ES)
Czech (cs_CZ)
Danish (da_DK)
German (de_DE)
Greek (el_GR)
English (en_GB, en_US)
Spanish (es_ES, es_MX)
Finnish (fi_FI)
French (fr_FR)
Hungarian (hu_HU)
Icelandic (is_IS)
Italian (it_IT)
Georgian (ka_GE)
Kazakh (kk_KZ)
Luxembourgish (lb_LU)
Nepali (ne_NP)
Dutch (nl_BE, nl_NL)
Norwegian (no_NO)
Polish (pl_PL)
Portuguese (pt_BR, pt_PT)
Romanian (ro_RO)
Russian (ru_RU)
Serbian (sr_RS)
Swedish (sv_SE)
Swahili (sw_CD)
Turkish (tr_TR)
Ukrainian (uk_UA)
Vietnamese (vi_VN)
Chinese (zh_CN)

You will need two files per voice:

A .onnx model file, such as en_US-lessac-medium.onnx
A .onnx.json config file, such as en_US-lessac-medium.onnx.json
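
Since each voice needs both files, a quick check before running Piper can catch an incomplete download. The sketch below assumes the config file is named by appending .json to the model path, as in the example names above. check_voice_files is a hypothetical helper, not part of Piper.

```shell
# Hypothetical helper (not part of Piper): succeed only if both voice files exist.
# Assumes the config path is the model path with ".json" appended.
check_voice_files() {
  model=$1
  config="$model.json"
  for f in "$model" "$config"; do
    if [ ! -f "$f" ]; then
      echo "missing voice file: $f" >&2
      return 1
    fi
  done
  return 0
}

# Usage:
#   check_voice_files en_US-lessac-medium.onnx && echo "voice is complete"
```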

The MODEL_CARD file for each voice contains important licensing information. Piper is intended for text to speech research, and does not impose any additional restrictions on voice models. Some voices may have restrictive licenses, however, so please review them carefully!

Installation

You can run Piper with Python or download a binary release:

amd64 (64-bit desktop Linux)
arm64 (64-bit Raspberry Pi 4)
armv7 (32-bit Raspberry Pi 3/4)
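
To pick the matching release on a given machine, the output of uname -m can be mapped to the three names above. This is only a sketch: piper_arch is a hypothetical helper, and the uname values it recognizes (x86_64, aarch64, armv7l) are the usual Linux ones rather than anything Piper defines.

```shell
# Hypothetical helper: map a `uname -m` value to the release names listed above.
# With no argument, the current machine's architecture is used.
piper_arch() {
  machine=${1:-$(uname -m)}
  case "$machine" in
    x86_64|amd64)  echo amd64 ;;
    aarch64|arm64) echo arm64 ;;
    armv7l|armv7)  echo armv7 ;;
    *)
      echo "unsupported architecture: $machine" >&2
      return 1
      ;;
  esac
}

# Usage:
#   echo "Download the $(piper_arch) release"
```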

If you want to build from source, see the Makefile and C++ source. You must download and extract piper-phonemize to lib/Linux-$(uname -m)/piper_phonemize before building. For example, lib/Linux-x86_64/piper_phonemize/lib/libpiper_phonemize.so should exist for AMD/Intel machines (as well as everything else from libpiper_phonemize-amd64.tar.gz).
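
The expected location can be checked before running the build. The following is a sketch: phonemize_dir and check_phonemize are hypothetical helper names, and it assumes the lib/libpiper_phonemize.so sub-path shown for AMD/Intel machines is the same on other architectures.

```shell
# Directory the build expects, relative to the source root (from the paragraph above).
phonemize_dir() {
  echo "lib/Linux-$(uname -m)/piper_phonemize"
}

# Succeed if the shared library is present under the given source root (default: .).
# Assumption: the lib/libpiper_phonemize.so sub-path applies on every architecture.
check_phonemize() {
  root=${1:-.}
  lib="$root/$(phonemize_dir)/lib/libpiper_phonemize.so"
  if [ -f "$lib" ]; then
    echo "found $lib"
  else
    echo "missing $lib; extract piper-phonemize first" >&2
    return 1
  fi
}

# Usage (from the Piper source directory):
#   check_phonemize && make
```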

Usage

Download a voice and extract the .onnx and .onnx.json files
Run the piper binary with text on standard input, --model /path/to/your-voice.onnx, and --output_file output.wav

For example:

echo 'Welcome to the world of speech synthesis!' | \
  ./piper --model en_US-lessac-medium.onnx --output_file welcome.wav

For multi-speaker models, use --speaker <number> to change speakers (default: 0).

See piper --help for more options.
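
The same two options can be wrapped in a loop to produce one WAV file per line of text. This is a sketch only: synthesize_lines and the PIPER variable are hypothetical names, and the line_N.wav naming is an arbitrary choice.

```shell
# Path to the piper binary; override PIPER to point somewhere else.
PIPER=${PIPER:-./piper}

# Hypothetical helper: read text lines on stdin and write one WAV file per line.
# Only the --model and --output_file options described above are used.
synthesize_lines() {
  model=$1
  outdir=$2
  mkdir -p "$outdir"
  n=0
  while IFS= read -r line || [ -n "$line" ]; do
    [ -n "$line" ] || continue   # skip empty lines
    n=$((n + 1))
    # Pipe the line in so piper does not consume the rest of the loop's input.
    printf '%s\n' "$line" | "$PIPER" --model "$model" --output_file "$outdir/line_$n.wav"
  done
}

# Usage:
#   synthesize_lines en_US-lessac-medium.onnx out < sentences.txt
```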

Streaming Audio

Piper can stream raw audio to stdout as it's produced:

echo 'This sentence is spoken first. This sentence is synthesized while the first sentence is spoken.' | \
  ./piper --model en_US-lessac-medium.onnx --output-raw | \
  aplay -r 22050 -f S16_LE -t raw -

This is raw audio and not a WAV file, so make sure your audio player is set to play 16-bit mono PCM samples at the correct sample rate for the voice.
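
Rather than hard-coding the rate, it can be read from the voice's config file. The sketch below assumes the .onnx.json file contains a numeric "sample_rate" key; voice_sample_rate is a hypothetical helper that pulls it out with sed instead of a real JSON parser.

```shell
# Hypothetical helper: print the sample rate recorded in a voice config.
# Assumption: the .onnx.json file contains a numeric "sample_rate" key.
voice_sample_rate() {
  sed -n 's/.*"sample_rate"[[:space:]]*:[[:space:]]*\([0-9][0-9]*\).*/\1/p' "$1" | head -n 1
}

# Usage (mirrors the streaming example above):
#   rate=$(voice_sample_rate en_US-lessac-medium.onnx.json)
#   echo 'Hello.' | ./piper --model en_US-lessac-medium.onnx --output-raw | \
#     aplay -r "$rate" -f S16_LE -t raw -
```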

JSON Input

The piper executable can accept JSON input when using the --json-input flag. Each line of input must be a JSON object with a text field. For example:

{ "text": "First sentence to speak." }
{ "text": "Second sentence to speak." }
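
Input in this shape can be generated from a plain text file. The sketch below is one way to do it in shell: json_escape and text_to_json_lines are hypothetical helpers, and the escaping is deliberately minimal (backslashes and double quotes only, no control characters).

```shell
# Hypothetical helper: escape backslashes and double quotes for a JSON string.
# Simplification: control characters such as tabs are not escaped.
json_escape() {
  printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g'
}

# Turn each non-empty line on stdin into a JSON object with a text field.
text_to_json_lines() {
  while IFS= read -r line || [ -n "$line" ]; do
    [ -n "$line" ] || continue
    printf '{ "text": "%s" }\n' "$(json_escape "$line")"
  done
}

# Usage (sentences.txt and input.jsonl are placeholder names):
#   text_to_json_lines < sentences.txt > input.jsonl
```

The resulting lines can then be fed to piper on standard input together with the --json-input flag.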

Optional fields include:

speaker (string)
- Name of the speaker to use from speaker_id_map in the config (multi-speaker voices only)
speaker_id (number)
- ID of the speaker to use, from 0 to the number of speakers minus 1 (multi-speaker voices only; overrides "speaker")
output_file (string)
- Path to the output WAV file

The following example writes two sentences with different speakers to different files:

{ "text": "First speaker.", "speaker_id": 0, "output_file": "/tmp/speaker_0.wav" }
{ "text": "Second speaker.", "speaker_id": 1, "output_file": "/tmp/speaker_1.wav" }
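
Lines like these can also be generated in a loop, for instance to audition every speaker of a multi-speaker voice. This is a sketch: speaker_lines is a hypothetical helper, the spoken text is a placeholder, and the output directory is assumed to contain no quotes or backslashes.

```shell
# Hypothetical helper: print one JSON input line per speaker ID, 0 to count - 1.
# Assumes the output directory contains no double quotes or backslashes.
speaker_lines() {
  count=$1
  outdir=$2
  i=0
  while [ "$i" -lt "$count" ]; do
    printf '{ "text": "Speaker %d.", "speaker_id": %d, "output_file": "%s/speaker_%d.wav" }\n' \
      "$i" "$i" "$outdir" "$i"
    i=$((i + 1))
  done
}

# Usage (your-voice.onnx stands in for a multi-speaker model):
#   speaker_lines 2 /tmp | ./piper --model /path/to/your-voice.onnx --json-input
```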