New York State Identification and Intelligence System
The New York State Identification and Intelligence System Phonetic Code, commonly known as NYSIIS, is a phonetic algorithm devised in 1970 as part of the New York State Identification and Intelligence System (now a part of the New York State Division of Criminal Justice Services). It features an accuracy increase of 2.7% over the traditional Soundex algorithm.[1]
Procedure
The algorithm, as described in Name Search Techniques,[2] is:
- Translate first characters of name: MAC → MCC, KN → N, K → C, PH, PF → FF, SCH → SSS
- Translate last characters of name: EE → Y, IE → Y, DT, RT, RD, NT, ND → D
- First character of key = first character of name.
- Translate remaining characters by following rules, incrementing by one character each time:
- EV → AF else A, E, I, O, U → A
- Q → G, Z → S, M → N
- KN → N else K → C
- SCH → SSS, PH → FF
- H → If previous or next is non-vowel, previous.
- W → If previous is vowel, A.
- Add current to key if current is not same as the last key character.
- If last character is S, remove it.
- If last characters are AY, replace with Y.
- If last character is A, remove it.
- Append translated key to value from step 3 (removed first character)
- If longer than 6 characters, truncate to first 6 characters. (only needed for true NYSIIS, some versions use the full key)
References
- ↑ Rajkovic, P.; Jankovic, D. (2007), "Adaptation and Application of Daitch-Mokotoff Soundex Algorithm on Serbian Names" (PDF), XVII Conference on Applied Mathematics, Novi Sad, Serbia, archived from the original (PDF) on August 27, 2011
- ↑ Taft, R. L. (1970), "Name Search Techniques", New York State Identification and Intelligence System, Albany, New York
External links
- NIST Dictionary of Algorithms and Data Structures entry, including pointers to several implementations
- Sample coder, using a variant of the algorithm
- Ruby Implementation
- C# Implementation
This article is issued from
Wikipedia.
The text is licensed under Creative Commons - Attribution - Sharealike.
Additional terms may apply for the media files.