Phonetic algorithm
From Wikipedia, the free encyclopedia
A phonetic algorithm is an algorithm for indexing words by their pronunciation. Most phonetic algorithms were developed for use with the English language; consequently, applying their rules to words in other languages might not give meaningful results.
They are necessarily complex, with many rules and exceptions, because English spelling and pronunciation are complicated by historical changes in pronunciation and by words borrowed from many languages.
Among the best-known phonetic algorithms are:
- Soundex, which was developed to encode surnames for use in censuses. Soundex codes are four-character strings composed of one letter followed by three digits.
- Daitch-Mokotoff Soundex, which is a refinement of Soundex designed to better match surnames of Slavic and Yiddish origin. Daitch-Mokotoff Soundex codes are strings composed of six numeric digits.
- Metaphone and Double Metaphone, which are suitable for use with most English words, not just names. Metaphone algorithms are the basis for many popular spell checkers.
- Miracode
- New York State Identification and Intelligence System (NYSIIS), which maps similar phonemes to the same letter. The result is a string that can be pronounced by the reader without decoding.
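To make the idea of indexing by pronunciation concrete, the following is a minimal Python sketch of Soundex, the simplest algorithm in the list above. It follows the commonly described American Soundex rules (keep the first letter, map consonants to digits, collapse adjacent equal codes, pad to four characters); other variants differ in detail. The names `soundex` and `build_index` are illustrative and not taken from any particular library.

```python
from collections import defaultdict

# Consonant groups and their Soundex digits; vowels, H, W and Y get no digit.
_CODES = {}
for _digit, _letters in enumerate(("BFPV", "CGJKQSXZ", "DT", "L", "MN", "R"), start=1):
    for _ch in _letters:
        _CODES[_ch] = str(_digit)


def soundex(word: str) -> str:
    """Return a four-character code (letter + three digits), or "" if no ASCII letters."""
    letters = [c for c in word.upper() if "A" <= c <= "Z"]
    if not letters:
        return ""
    code = letters[0]
    prev = _CODES.get(letters[0], "")  # the first letter's digit also suppresses repeats
    for ch in letters[1:]:
        if ch in "HW":
            continue  # H and W are skipped and do not separate equal codes
        digit = _CODES.get(ch, "")
        if digit and digit != prev:
            code += digit
            if len(code) == 4:
                break
        prev = digit  # a vowel resets prev, so a repeated code after it counts again
    return code.ljust(4, "0")


def build_index(names):
    """Group names by Soundex code so similar-sounding names share a key."""
    index = defaultdict(list)
    for name in names:
        index[soundex(name)].append(name)
    return dict(index)


if __name__ == "__main__":
    print(build_index(["Robert", "Rupert", "Ashcraft", "Tymczak", "Lee"]))
```

In this sketch, "Robert" and "Rupert" end up under the same key, which is the behavior a phonetic index relies on when looking up names whose spelling is uncertain.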
External links
- Project Dedupe http://dedupe.sourceforge.net