eSpeak

Developer(s)	Jonathan Duddington
Initial release	February 2006
Stable release	1.48.04 / April 6, 2014
Written in	C
Operating system	Linux, Windows, Mac OS X, RISC OS, FreeBSD, Windows Mobile
Size	~1600 kbyte
Type	Speech synthesizer
License	GNU GPL v3+
Website	http://espeak.sourceforge.net/

eSpeak is a compact open source software speech synthesizer for Linux, Windows, and other platforms.^[1] It uses a formant synthesis method, providing many languages in a small size. Much of the programming for eSpeak's languages was based on information found on Wikipedia, with some subsequent feedback from native speakers.^[2] Projects using eSpeak include NVDA, Ubuntu and OLPC, and it has also been used by Google Translate.

History

eSpeak is derived from "Speak", a British English speech synthesizer for Acorn RISC OS computers, originally written by Jonathan Duddington in 1995.

A rewritten version for Linux appeared in February 2006 and a Windows SAPI 5 version in January 2007. Subsequent development has added and improved support for additional languages.

Because updates had become infrequent over the preceding few years, several eSpeak forks emerged on GitHub.^[3] After discussions on the eSpeak discussion list,^[4]^[5] the espeak-ng fork, managed by Reece Dunn, was chosen as the new canonical home for further eSpeak development.

Because of its small size and many languages, it is included as the default speech synthesizer in the NVDA open source screen reader for Windows, and on the Ubuntu and other Linux installation discs.

The quality of the language voices varies greatly. Some have had more work or feedback from native speakers than others. Most of the people who have helped to improve the various languages are blind users of text-to-speech.

Synthesis method

eSpeak provides two methods of synthesis: the original eSpeak synthesizer and a Klatt synthesizer.^[6] In addition, eSpeak can be used as a front-end, providing text-to-phoneme translation and prosody, to MBROLA diphone voices.

The eSpeak and Klatt synthesizers use different types of formant synthesis.

The eSpeak synthesizer creates voiced speech sounds such as vowels and sonorant consonants by adding together sine waves to make the formant peaks. Unvoiced consonants such as /s/ are made by playing recorded sounds. Voiced consonants such as /z/ are made by mixing a synthesized voiced sound with a recorded unvoiced sound.
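The additive approach described above can be sketched roughly as follows. This is a minimal illustration, not eSpeak's actual code: the bell-shaped peak function, the sample rate, and the formant values for an "ah"-like vowel are all assumptions chosen for the example.

```python
import math

SAMPLE_RATE = 22050  # Hz; assumed for illustration


def harmonic_amplitude(freq, formants):
    """Spectral envelope: a sum of bell-shaped peaks.

    Each formant is a (centre_hz, width_hz, height) tuple (hypothetical format).
    """
    return sum(h * math.exp(-((freq - c) / w) ** 2) for c, w, h in formants)


def additive_vowel(f0, formants, duration=0.2, sample_rate=SAMPLE_RATE):
    """Build a voiced sound by summing sine waves at multiples of the pitch f0."""
    n_samples = int(duration * sample_rate)
    n_harmonics = int((sample_rate / 2) // f0)  # keep harmonics below Nyquist
    amps = [harmonic_amplitude(k * f0, formants)
            for k in range(1, n_harmonics + 1)]
    samples = []
    for n in range(n_samples):
        t = n / sample_rate
        samples.append(sum(a * math.sin(2 * math.pi * k * f0 * t)
                           for k, a in enumerate(amps, start=1)))
    peak = max(abs(s) for s in samples) or 1.0
    return [s / peak for s in samples]  # normalise to the range -1..1


# Rough formant values for an "ah"-like vowel (illustrative, not eSpeak's data)
AH = [(700, 100, 1.0), (1200, 120, 0.5), (2600, 150, 0.25)]
wave = additive_vowel(120, AH)
```

Harmonics that fall near a formant centre receive large amplitudes, so the summed waveform has spectral peaks at the formant frequencies without any filtering step.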

The Klatt synthesizer mostly uses the same formant data as the eSpeak synthesizer. It produces voiced sounds by starting with a waveform which is rich in harmonics (simulating the vibration of the vocal cords) and then applying digital filters in order to produce speech sounds.
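A source-filter arrangement of this kind can be sketched as below. The two-pole resonator follows the difference equation given in Klatt's paper; everything else (the impulse train standing in for the glottal source, the function names, and the formant frequencies and bandwidths) is an assumption made for illustration and is not taken from eSpeak.

```python
import math


def resonator_coeffs(freq, bandwidth, sample_rate):
    """Coefficients of a two-pole digital resonator for one formant."""
    t = 1.0 / sample_rate
    c = -math.exp(-2 * math.pi * bandwidth * t)
    b = 2 * math.exp(-math.pi * bandwidth * t) * math.cos(2 * math.pi * freq * t)
    a = 1.0 - b - c  # gives unity gain at 0 Hz
    return a, b, c


def resonate(signal, freq, bandwidth, sample_rate):
    """Apply y[n] = a*x[n] + b*y[n-1] + c*y[n-2] to the signal."""
    a, b, c = resonator_coeffs(freq, bandwidth, sample_rate)
    y1 = y2 = 0.0
    out = []
    for x in signal:
        y = a * x + b * y1 + c * y2
        out.append(y)
        y2, y1 = y1, y
    return out


def impulse_train(f0, duration, sample_rate):
    """Harmonic-rich source: one pulse per pitch period (simplified glottal source)."""
    period = int(round(sample_rate / f0))
    n = int(duration * sample_rate)
    return [1.0 if i % period == 0 else 0.0 for i in range(n)]


def klatt_like_vowel(f0, formants, duration=0.2, sample_rate=22050):
    """Pass the source through one resonator per (freq_hz, bandwidth_hz) pair in cascade."""
    signal = impulse_train(f0, duration, sample_rate)
    for freq, bw in formants:
        signal = resonate(signal, freq, bw, sample_rate)
    return signal


# Illustrative formant frequencies and bandwidths, not eSpeak's data
wave = klatt_like_vowel(120, [(700, 80), (1200, 90), (2600, 120)])
```

Here the formant peaks come from the filters rather than from the amplitudes of individual sine waves, which is the main difference from the additive method.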

Features

eSpeak can be used as a command-line program, or as a shared library.

It supports Speech Synthesis Markup Language (SSML).
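As a small illustration, an SSML fragment can be built with Python's standard XML library. The helper name build_ssml is hypothetical, and the example assumes that the "break" element with a "time" attribute is among the SSML features eSpeak handles and that the command-line program's -m option is used to enable markup interpretation.

```python
import xml.etree.ElementTree as ET


def build_ssml(before, pause_ms, after):
    """Return an SSML string: text, a timed pause, then more text."""
    speak = ET.Element("speak")
    speak.text = before
    pause = ET.SubElement(speak, "break", time=f"{pause_ms}ms")
    pause.tail = after  # text that follows the <break/> element
    return ET.tostring(speak, encoding="unicode")


ssml = build_ssml("Hello", 500, "world")
# The result could then be spoken with, for example: espeak -m "<the ssml string>"
```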

Language voices are identified by the language's ISO 639-1 code. They can be modified by "voice variants". These are text files which can change characteristics such as pitch range, add effects such as echo, whisper and croaky voice, or make systematic adjustments to formant frequencies to change the sound of the voice. For example, "af" is the Afrikaans voice. "af+f2" is the Afrikaans voice modified with the "f2" voice variant which changes the formants and the pitch range to give a female sound.
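The naming convention in the example above can be taken apart with a few lines of code. This is only a sketch of the "language+variant" notation as described here; the function name parse_voice is hypothetical and it assumes at most one variant per voice name.

```python
def parse_voice(spec):
    """Split a voice name such as 'af+f2' into (language, variant).

    The variant is None when the name contains no '+'.
    """
    language, _, variant = spec.partition("+")
    return language, (variant or None)


print(parse_voice("af+f2"))
print(parse_voice("af"))
```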

eSpeak uses an ASCII representation of phoneme names which is loosely based on the Kirshenbaum system.

Phonetic representations can be included within text input by including them within double square-brackets. For example: espeak -v en "Hello [[w3:ld]]" will say "Hello world" in English.
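The same command can be assembled from a script. The sketch below only builds the argument list; the helper name espeak_command is hypothetical, and actually running the command assumes that the espeak program is installed and on the search path.

```python
import shlex


def espeak_command(text, voice="en", phonemes=None):
    """Return the argument list for the espeak command-line program.

    If phonemes is given, it is appended to the text inside [[...]].
    """
    if phonemes:
        text = f"{text} [[{phonemes}]]"
    return ["espeak", "-v", voice, text]


cmd = espeak_command("Hello", phonemes="w3:ld")
print(shlex.join(cmd))  # shell-quoted form of the command (Python 3.8+)
```

With espeak installed, the list can be passed to subprocess.run(cmd, check=True); passing a list rather than a single string avoids shell quoting problems with the square brackets.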

Languages

eSpeak performs text-to-speech synthesis for the following languages, with quality varying from one language to another.

Afrikaans, Albanian, Aragonese, Armenian, Bulgarian, Cantonese, Catalan, Croatian, Czech, Danish, Dutch, English, Esperanto, Estonian, Finnish, French, Georgian, German, Greek, Hindi, Hungarian, Icelandic, Indonesian, Irish, Italian, Kannada, Kurdish, Latvian, Lithuanian, Lojban, Macedonian, Malaysian, Malayalam, Mandarin, Nepalese, Norwegian, Persian (Farsi), Polish, Portuguese, Punjabi, Romanian, Russian, Serbian, Slovak, Spanish, Swahili, Swedish, Tamil, Turkish, Vietnamese, Welsh.

References

[1] http://espeak.sourceforge.net/download.html has Mac OS X and RISC OS binaries, and the source contains notes about compiling on DOS, generic Unix and Windows Mobile.
[2] http://espeak.sourceforge.net/add_language.html
[3] Search for "espeak" on GitHub.
[4] "Taking ownership of the espeak project and its future".
[5] "Vote for new main espeak developer".
[6] Dennis H. Klatt (1980). "Software for a cascade/parallel formant synthesizer" (PDF). Journal of the Acoustical Society of America, 67(3), March 1980.

This article is issued from Wikipedia - version of the Thursday, December 17, 2015. The text is available under the Creative Commons Attribution/Share Alike but additional terms may apply for the media files.