CMU Pronouncing Dictionary

Developer(s): Carnegie Mellon University
Stable release: 0.7a / February 18, 2008
Development status: Maintained
Available in: English
License: Public domain

The CMU Pronouncing Dictionary (also known as cmudict) is a public domain pronouncing dictionary created by Carnegie Mellon University (CMU). It defines a mapping from English words to their North American pronunciations, and is commonly used in speech processing applications such as the Festival Speech Synthesis System and the CMU Sphinx speech recognition system. The latest release is 0.7a, which contains 133,746 entries (from 123,442 baseforms).

Database Format

The database is distributed as a plain-text file in which each entry occupies one line of the form word <two spaces> pronunciation. If a word has multiple pronunciations, each entry after the first has an index in parentheses appended to the word. Pronunciations are encoded using a modified form of the Arpabet system, with a stress digit added to each vowel: 0 (no stress), 1 (primary stress), or 2 (secondary stress).
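The format described above can be read with a few lines of code. The following is a minimal sketch in Python, not an official parser: the function names parse_line and load_entries are hypothetical, the sample lines are illustrative rather than copied from a release, and the treatment of lines starting with ";;;" as comments is an assumption about the distributed file.

```python
import re

# "WORD" or "WORD(n)", then whitespace, then the phoneme string.
ENTRY_RE = re.compile(
    r"^(?P<word>\S+?)(?:\((?P<index>\d+)\))?\s+(?P<phones>.+)$"
)


def parse_line(line):
    """Parse one dictionary line into (word, phones, stresses), or None."""
    line = line.strip()
    # Assumption: blank lines and lines starting with ";;;" carry no entry.
    if not line or line.startswith(";;;"):
        return None
    match = ENTRY_RE.match(line)
    if match is None:
        return None
    phones = []
    stresses = []
    for symbol in match.group("phones").split():
        if symbol[-1] in "012":
            # Vowel: split the trailing stress digit from the phoneme.
            phones.append(symbol[:-1])
            stresses.append(int(symbol[-1]))
        else:
            # Consonant: no stress digit attached.
            phones.append(symbol)
    return match.group("word"), phones, stresses


def load_entries(lines):
    """Group pronunciations by word, dropping the parenthesized index."""
    pronunciations = {}
    for line in lines:
        parsed = parse_line(line)
        if parsed is not None:
            word, phones, stresses = parsed
            pronunciations.setdefault(word, []).append((phones, stresses))
    return pronunciations


# Illustrative input in the described format (not taken from a release).
sample = [
    ";;; comment line",
    "HELLO  HH AH0 L OW1",
    "READ  R EH1 D",
    "READ(1)  R IY1 D",
]
entries = load_entries(sample)
print(entries["READ"])
```

Stripping the index and collecting alternatives in a list keeps every pronunciation of a word under a single key, in file order, which is usually what a lookup needs.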

History

Version   Release date[1]
0.1       16 September 1993
0.2       10 March 1994
0.3       28 September 1994
0.4       8 November 1995
0.5       No public release
0.6       11 August 1998
0.7a      19 February 2008[2]

Applications

References

  1. ftp://ftp.cs.cmu.edu/project/speech/dict/
  2. http://sourceforge.net/forum/forum.php?forum_id=787627
  3. https://cmusphinx.svn.sourceforge.net/svnroot/cmusphinx/trunk/logios/
