Double Metaphone

From Wikipedia, the free encyclopedia

The Double Metaphone search algorithm is a phonetic algorithm written by Lawrence Philips and is the second generation of his Metaphone algorithm. Its implementation was described in the June 2000 issue of C/C++ Users Journal.

It is called "Double" because it can return both a primary and a secondary code for a string; this accounts for some ambiguous cases as well as for multiple variants of surnames with common ancestry. For example, encoding the name "Smith" yields a primary code of SM0 and a secondary code of XMT, while the name "Schmidt" yields a primary code of XMT and a secondary code of SMT. The two names therefore have the code XMT in common.
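The way two names can be matched through a shared code can be illustrated with a minimal Python sketch. It does not implement the encoding rules: the codes for "Smith" and "Schmidt" are copied from the example above, and the names Codes, KNOWN_CODES, shared_codes and is_match are hypothetical, introduced only for illustration. Treating an empty string as "no secondary code" is likewise an assumption, not something the algorithm description specifies.

```python
from __future__ import annotations

from typing import NamedTuple


class Codes(NamedTuple):
    """Primary and secondary code of one name (hypothetical container)."""
    primary: str
    secondary: str


# Codes copied from the Smith/Schmidt example; a real encoder would compute them.
KNOWN_CODES = {
    "Smith": Codes("SM0", "XMT"),
    "Schmidt": Codes("XMT", "SMT"),
}


def shared_codes(a: Codes, b: Codes) -> set[str]:
    """Return the codes that both names have in common."""
    common = {a.primary, a.secondary} & {b.primary, b.secondary}
    # Assumption: an empty string means "no secondary code" and never matches.
    common.discard("")
    return common


def is_match(a: Codes, b: Codes) -> bool:
    """Treat two names as phonetically related if they share any code."""
    return bool(shared_codes(a, b))


if __name__ == "__main__":
    smith, schmidt = KNOWN_CODES["Smith"], KNOWN_CODES["Schmidt"]
    print(shared_codes(smith, schmidt))  # {'XMT'}
    print(is_match(smith, schmidt))      # True
```

Comparing the full set of codes, rather than only the primary ones, is what lets "Smith" and "Schmidt" match even though their primary codes differ.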

Double Metaphone tries to account for myriad irregularities in English words of Slavic, Germanic, Celtic, Greek, French, Italian, Spanish, Chinese, and other origin. It therefore uses a much more complex ruleset for coding than its predecessor; for example, it tests for approximately 100 different contexts of the use of the letter C alone. In the introduction to his original journal article, Philips defended this complexity:

Albert Einstein once said, "Everything should be made as simple as possible--but no simpler!" Simplicity, of course, is a fundamental touchstone of quality in engineering and science. If an algorithm accomplishes its task as simply as possible, and demonstrates a touch of intuitive inspiration as well, we compliment it as "elegant." Unfortunately for engineers, human activity can rarely be described elegantly. And although text processing is a critical technology at a time when millions of people are searching the web, the unsystematic and exception-laden English language often demands algorithms that look ugly to engineers.
