Daitch-Mokotoff Soundex
Daitch-Mokotoff Soundex (D-M Soundex) is a phonetic algorithm invented in 1985 by genealogist Gary Mokotoff and later improved by Randy Daitch, both of the Jewish Genealogical Society. It is a refinement of the Russell and American Soundex algorithms, designed to match Slavic and Yiddish surnames that are pronounced similarly but spelled differently.
Daitch-Mokotoff Soundex is sometimes referred to as "Jewish Soundex" or "Eastern European Soundex", although the authors discourage the use of these nicknames.
Improvements
Improvements over the older Soundex algorithms include:
- Coded names are six digits long, resulting in greater search precision (traditional Soundex uses four characters)
- Coded names can be stored as numeric values, which can save space in some applications (regular Soundex encodes values as alphanumeric text)
- Several rules in the algorithm encode multi-character n-grams as single digits (American and Russell Soundex do not handle multi-character n-grams)
- Multiple possible encodings can be returned for a single name (traditional Soundex returns only one encoding, even if the spelling of a name could potentially have multiple pronunciations)
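The point about numeric storage has one practical wrinkle: a code such as 097500 begins with a zero, which an integer field does not preserve. The following is a minimal sketch of a round trip between the text and numeric forms, assuming codes are handled as six-character digit strings. The helper names `pack_code` and `unpack_code` are hypothetical and are not taken from any published D-M Soundex implementation.

```python
CODE_LENGTH = 6  # D-M Soundex codes are six digits long


def pack_code(code: str) -> int:
    """Convert a six-digit code string to an integer for compact storage."""
    if len(code) != CODE_LENGTH or not (code.isascii() and code.isdigit()):
        raise ValueError(f"expected {CODE_LENGTH} ASCII digits, got {code!r}")
    return int(code)


def unpack_code(value: int) -> str:
    """Restore the six-digit string, re-adding any leading zeros."""
    if not 0 <= value < 10 ** CODE_LENGTH:
        raise ValueError(f"value out of range for a {CODE_LENGTH}-digit code: {value}")
    return str(value).zfill(CODE_LENGTH)


stored = pack_code("097500")   # the leading zero is dropped in the integer form
restored = unpack_code(stored)  # zero-padding brings it back
```

Whether the numeric form actually saves space depends on the storage engine, so this is an illustration of the idea rather than a recommendation.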
Examples
The following table compares the American Soundex and D-M Soundex codes for several surnames:
Surname | American Soundex | D-M Soundex |
Peters | P362 | 739400, 734000 |
Peterson | P362 | 739460, 734600 |
Moskowitz | M232 | 645740 |
Moskovitz | M213 | 645740 |
Auerbach | A612 | 097500, 097400 |
Uhrbach | U612 | 097500, 097400 |
Jackson | J250 | 154600, 454600, 145460, 445460 |
Jackson-Jackson | J252 | 154664, 454664, 145466, 445466, 154646, 454646, 145464, 445464 |
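The American Soundex column can be reproduced with a short function. The sketch below follows the commonly described American Soundex rules (keep the first letter, code the remaining consonants, collapse adjacent letters with the same code, let H and W pass through without separating them, pad to four characters). It does not implement D-M Soundex; the D-M codes in `DM_CODES` are simply copied from the table, and `dm_match` shows one plausible way to compare names that have several encodings, namely treating two names as a match when their code sets overlap. All function and variable names here are illustrative.

```python
# Letter-to-digit mapping used by American Soundex.
SOUNDEX_CODES = {}
for letters, digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                       ("L", "4"), ("MN", "5"), ("R", "6")):
    for ch in letters:
        SOUNDEX_CODES[ch] = digit


def american_soundex(name: str) -> str:
    """Return the four-character American Soundex code for a name."""
    letters = [ch for ch in name.upper() if "A" <= ch <= "Z"]  # drop hyphens etc.
    if not letters:
        return ""
    result = letters[0]
    prev = SOUNDEX_CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "HW":
            continue  # H and W do not separate letters that share a code
        code = SOUNDEX_CODES.get(ch, "")
        if code and code != prev:
            result += code
            if len(result) == 4:
                break
        prev = code  # a vowel resets prev, so a repeated code is kept
    return result.ljust(4, "0")


# D-M codes copied from the table above (not computed here).
DM_CODES = {
    "Peters": {"739400", "734000"},
    "Peterson": {"739460", "734600"},
    "Moskowitz": {"645740"},
    "Moskovitz": {"645740"},
    "Auerbach": {"097500", "097400"},
    "Uhrbach": {"097500", "097400"},
}


def dm_match(a: str, b: str) -> bool:
    """Assumed matching rule: two names match if they share any D-M code."""
    return bool(DM_CODES[a] & DM_CODES[b])
```

With the table's data, `american_soundex` gives Moskowitz and Moskovitz different codes (M232 and M213), while `dm_match("Moskowitz", "Moskovitz")` is true because both carry the D-M code 645740.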
External links
- Mokotoff, Gary. "Soundexing and Genealogy." Describes the history and the motivations behind D-M Soundex.
- JewishGen. "Soundex Coding." Describes both Russell and D-M Soundex.
- Coles, Michael. "SQL 2000 DBA Toolkit, Part 3: Phonetic Matching." SQL Server-based implementation of the D-M Soundex algorithm, with source code.