Double Metaphone
The Double Metaphone search algorithm is a phonetic algorithm written by Lawrence Philips and is the second generation of his Metaphone algorithm. Its implementation was described in the June 2000 issue of C/C++ Users Journal.
It is called "Double" because it can return both a primary and a secondary code for a string; this accounts for some ambiguous cases as well as for multiple variants of surnames with common ancestry. For example, encoding the name "Smith" yields a primary code of SM0 and a secondary code of XMT, while the name "Schmidt" yields a primary code of XMT and a secondary code of SMT. The two names therefore have the code XMT in common.
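The way two names are matched through a shared code can be illustrated with a minimal Python sketch. It does not implement the encoder itself: the codes are copied from the "Smith" and "Schmidt" example above, and the names `EXAMPLE_CODES`, `shared_codes`, and `sounds_alike` are hypothetical, chosen only for this illustration.

```python
# Primary and secondary codes copied from the example above.
# A real Double Metaphone encoder would compute these; here they are
# hard-coded so that only the matching step is shown.
EXAMPLE_CODES = {
    "Smith": ("SM0", "XMT"),
    "Schmidt": ("XMT", "SMT"),
}


def shared_codes(name_a, name_b, codes=EXAMPLE_CODES):
    """Return the set of codes that the two names have in common."""
    return set(codes[name_a]) & set(codes[name_b])


def sounds_alike(name_a, name_b, codes=EXAMPLE_CODES):
    """Treat two names as a match if any of their codes coincide."""
    return bool(shared_codes(name_a, name_b, codes))


print(shared_codes("Smith", "Schmidt"))  # {'XMT'}
print(sounds_alike("Smith", "Schmidt"))  # True
```

Comparing all four code combinations, rather than only the primary codes, is one plausible matching policy; an application could also weight a primary-to-primary match more heavily than a match involving a secondary code.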
Double Metaphone tries to account for the many irregularities in English words of Slavic, Germanic, Celtic, Greek, French, Italian, Spanish, Chinese, and other origins. It therefore uses a much more complex ruleset for coding than its predecessor; for example, it tests for approximately 100 different contexts of the letter C alone. In the introduction to his original journal article, Philips defended this complexity:
Albert Einstein once said, "Everything should be made as simple as possible--but no simpler!" Simplicity, of course, is a fundamental touchstone of quality in engineering and science. If an algorithm accomplishes its task as simply as possible, and demonstrates a touch of intuitive inspiration as well, we compliment it as "elegant." Unfortunately for engineers, human activity can rarely be described elegantly. And although text processing is a critical technology at a time when millions of people are searching the web, the unsystematic and exception-laden English language often demands algorithms that look ugly to engineers.
External links
- "The Double Metaphone Search Algorithm", C/C++ Users Journal, June 2000 (full-text access requires registration)
- Source code from the above issue (73 kilobytes, ZIP), including the Double Metaphone source in C++
- Project Dedupe http://dedupe.sourceforge.net
- PHP implementation: http://swoodbridge.com/DoubleMetaPhone/
- PHP implementation (native, in C): http://www.olivierhill.ca/archives/32-DoubleMetaphone-0.1.1.html
- Ruby implementation included in http://rubyforge.org/projects/text/
- Perl implementation: http://www.cpan.org/modules/by-authors/id/MAURICE/
- Java implementation: http://jakarta.apache.org/commons/codec/userguide.html
- Transact SQL implementation: http://www.sqlmag.com/Articles/ArticleID/26094/pg/1/1.html (full-text access requires subscription)
- Python and MySQL implementations: http://atomboy.isa-geek.com/plone/Members/acoil/programing/double-metaphone/