eSpeak
Developer(s) | Jonathan Duddington |
---|---|
Initial release | February 2006 |
Stable release | 1.48.04 / April 6, 2014 |
Written in | C |
Operating system | Linux, Windows, Mac OS X, RISC OS, FreeBSD, Windows Mobile |
Size | ~1600 kbyte |
Type | Speech synthesizer |
License | GNU GPL v3+ |
Website | http://espeak.sourceforge.net/ |
eSpeak is a compact open source software speech synthesizer for Linux, Windows, and other platforms.[1] It uses formant synthesis, which allows it to provide many languages in a small size. Much of the programming for eSpeak's languages was based on information found on Wikipedia, with some subsequent feedback from native speakers.[2] Projects using eSpeak include NVDA, Ubuntu and OLPC, and it has also been used by Google Translate.
History
eSpeak is derived from "Speak", a British English speech synthesizer for Acorn RISC OS computers, originally written by Jonathan Duddington in 1995.
A rewritten version for Linux appeared in February 2006 and a Windows SAPI 5 version in January 2007. Subsequent development has added and improved support for additional languages.
Because of its small size and many languages, it is included as the default speech synthesizer in the NVDA open source screen reader for Windows, and on the installation discs of Ubuntu and other Linux distributions.
The quality of the language voices varies greatly. Some have had more work or feedback from native speakers than others. Most of the people who have helped to improve the various languages are blind users of text-to-speech.
Synthesis method
eSpeak provides two methods of synthesis: the original eSpeak synthesizer and a Klatt synthesizer.[3] In addition, eSpeak can act as a front end for MBROLA diphone voices, supplying the text-to-phoneme translation and prosody.
The eSpeak and Klatt synthesizers use different types of formant synthesis.
The eSpeak synthesizer creates voiced speech sounds such as vowels and sonorant consonants by adding together sine waves to make the formant peaks. Unvoiced consonants such as /s/ are made by playing recorded sounds. Voiced consonants such as /z/ are made by mixing a synthesized voiced sound with a recorded unvoiced sound.
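The additive approach described above can be illustrated with a minimal Python sketch. It is not eSpeak's own code: the function names, the bell-shaped peak model and the formant values are assumptions chosen for illustration. Each harmonic of the pitch is a sine wave whose amplitude is read off a spectral envelope built from the formant peaks.

```python
import math

def formant_gain(freq_hz, formants):
    """Envelope height at freq_hz: a sum of bell-shaped peaks,
    one per (centre_hz, width_hz, height) formant (illustrative model)."""
    return sum(h * math.exp(-((freq_hz - c) / w) ** 2) for c, w, h in formants)

def additive_vowel(f0_hz, formants, duration_s=0.05, rate_hz=16000):
    """Add together harmonics of f0_hz, each weighted by the formant envelope."""
    # Keep only harmonics below the Nyquist frequency.
    n_harmonics = int((rate_hz / 2) // f0_hz)
    amps = [formant_gain(k * f0_hz, formants) for k in range(1, n_harmonics + 1)]
    n_samples = round(duration_s * rate_hz)
    samples = []
    for i in range(n_samples):
        t = i / rate_hz
        samples.append(sum(a * math.sin(2 * math.pi * k * f0_hz * t)
                           for k, a in enumerate(amps, start=1)))
    # Normalise so the largest sample has magnitude 1.
    peak = max(abs(s) for s in samples) or 1.0
    return [s / peak for s in samples]

# Hypothetical formant values for an open vowel.
VOWEL_FORMANTS = [(700, 100, 1.0), (1200, 120, 0.5), (2600, 150, 0.25)]
wave = additive_vowel(120, VOWEL_FORMANTS)
```

Harmonics that fall near a formant centre dominate the sum, which is what gives the result its vowel-like colour.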
The Klatt synthesizer mostly uses the same formant data as the eSpeak synthesizer. It produces voiced sounds by starting with a waveform which is rich in harmonics (simulating the vibration of the vocal cords) and then applying digital filters in order to produce speech sounds.
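The source-and-filter idea can be sketched as follows. This is a simplified illustration, not eSpeak's implementation: the voice source is reduced to a plain impulse train, the function names and formant values are assumptions, and only the second-order resonator follows the form given in Klatt's paper (y[n] = A x[n] + B y[n-1] + C y[n-2]).

```python
import math

def resonator_coeffs(freq_hz, bw_hz, rate_hz):
    """Coefficients of a second-order digital resonator with unity gain at 0 Hz."""
    t = 1.0 / rate_hz
    c = -math.exp(-2 * math.pi * bw_hz * t)
    b = 2 * math.exp(-math.pi * bw_hz * t) * math.cos(2 * math.pi * freq_hz * t)
    a = 1.0 - b - c
    return a, b, c

def resonate(signal, freq_hz, bw_hz, rate_hz):
    """Filter a signal through one formant resonator."""
    a, b, c = resonator_coeffs(freq_hz, bw_hz, rate_hz)
    y1 = y2 = 0.0  # previous two output samples
    out = []
    for x in signal:
        y = a * x + b * y1 + c * y2
        out.append(y)
        y2, y1 = y1, y
    return out

def impulse_train(f0_hz, n_samples, rate_hz):
    """Harmonic-rich source: one unit pulse per pitch period (a simplification)."""
    period = round(rate_hz / f0_hz)
    return [1.0 if i % period == 0 else 0.0 for i in range(n_samples)]

def cascade_vowel(f0_hz, formants, n_samples=800, rate_hz=16000):
    """Pass the source through one resonator per (freq_hz, bw_hz) formant in series."""
    signal = impulse_train(f0_hz, n_samples, rate_hz)
    for freq_hz, bw_hz in formants:
        signal = resonate(signal, freq_hz, bw_hz, rate_hz)
    return signal

# Hypothetical formant frequencies and bandwidths.
wave = cascade_vowel(120, [(700, 60), (1200, 90), (2600, 120)])
```

In contrast to the additive method, the harmonics here all come from the source, and the filters only shape their relative strengths.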
Features
eSpeak can be used as a command-line program, or as a shared library.
It supports Speech Synthesis Markup Language (SSML).
Language voices are identified by the language's ISO 639-1 code. They can be modified by "voice variants". These are text files which can change characteristics such as pitch range, add effects such as echo, whisper and croaky voice, or make systematic adjustments to formant frequencies to change the sound of the voice. For example, "af" is the Afrikaans voice. "af+f2" is the Afrikaans voice modified with the "f2" voice variant which changes the formants and the pitch range to give a female sound.
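The "language+variant" naming can be handled with a few lines of Python. The helper names below are hypothetical and the sketch only builds the argument list for the command-line program, using the -v option shown elsewhere in this article; it does not run anything.

```python
def split_voice(spec):
    """Split a voice name such as 'af+f2' into (language, variant or None)."""
    language, sep, variant = spec.partition("+")
    return language, (variant if sep else None)

def espeak_args(text, language, variant=None):
    """Build a command line that selects a language voice and optional variant."""
    voice = language if variant is None else f"{language}+{variant}"
    return ["espeak", "-v", voice, text]

print(split_voice("af+f2"))
print(espeak_args("Goeie more", "af", "f2"))
```

On a system where eSpeak is installed, the returned list could be passed to subprocess.run to speak the text.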
eSpeak uses an ASCII representation of phoneme names which is loosely based on the Kirshenbaum system.
Phonetic representations can be included within text input by enclosing them in double square brackets. For example: espeak -v en "Hello [[w3:ld]]" will say "Hello world" in English.
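The double-bracket convention is easy to recognise mechanically. The following Python sketch separates ordinary text from embedded phoneme strings; the function name and the tuple format are assumptions for illustration and say nothing about how eSpeak itself parses its input.

```python
import re

# Non-greedy match of anything between [[ and ]].
PHONEME_SPAN = re.compile(r"\[\[(.*?)\]\]")

def split_phonemes(text):
    """Return a list of ('text', ...) and ('phonemes', ...) segments in order."""
    parts = []
    pos = 0
    for match in PHONEME_SPAN.finditer(text):
        if match.start() > pos:
            parts.append(("text", text[pos:match.start()]))
        parts.append(("phonemes", match.group(1)))
        pos = match.end()
    if pos < len(text):
        parts.append(("text", text[pos:]))
    return parts

print(split_phonemes("Hello [[w3:ld]]"))
```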
References
- [1] http://espeak.sourceforge.net/download.html has Mac OS X and RISC OS binaries, and the source contains notes about compiling on DOS, generic Unix and Windows Mobile.
- [2] http://espeak.sourceforge.net/add_language.html
- [3] Dennis H. Klatt. "Software for a cascade/parallel formant synthesizer". Journal of the Acoustical Society of America, 67(3), March 1980.