Verhoeff algorithm

From Wikipedia, the free encyclopedia

The Verhoeff algorithm is a checksum formula for error detection, first published in 1969 by the Dutch mathematician Jacobus Verhoeff (born 1927). Like the more widely known Luhn algorithm, it works with strings of decimal digits of any length. Unlike the Luhn algorithm, it detects all transposition errors (switching of two adjacent digits), and it catches many other types of errors that pass the Luhn formula undetected.

Verhoeff devised his algorithm using the properties of D5, the dihedral group of order 10. This is a non-commutative system of operations on ten elements, corresponding to the results of rotating and/or reflecting (flipping) a regular pentagon. In practice, the scheme is normally implemented using precomputed lookup tables.

Tables

The Verhoeff algorithm can be implemented using three tables: a multiplication table d, a permutation table p, and an inverse table inv.

The first table, d, is based on multiplication in the dihedral group D5.

 d(j,k) | k = 0  1  2  3  4  5  6  7  8  9
--------+---------------------------------
 j = 0  |     0  1  2  3  4  5  6  7  8  9
 j = 1  |     1  2  3  4  0  6  7  8  9  5
 j = 2  |     2  3  4  0  1  7  8  9  5  6
 j = 3  |     3  4  0  1  2  8  9  5  6  7
 j = 4  |     4  0  1  2  3  9  5  6  7  8
 j = 5  |     5  9  8  7  6  0  4  3  2  1
 j = 6  |     6  5  9  8  7  1  0  4  3  2
 j = 7  |     7  6  5  9  8  2  1  0  4  3
 j = 8  |     8  7  6  5  9  3  2  1  0  4
 j = 9  |     9  8  7  6  5  4  3  2  1  0
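The structure of d can be made visible by rebuilding it from the group operation. The following is a minimal sketch, assuming the encoding that the table itself suggests: the digits 0 to 4 act as rotations of the pentagon and the digits 5 to 9 as reflections. The name build_d is illustrative only and does not come from Verhoeff's description.

```python
def build_d():
    """Rebuild the 10x10 table d, treating 0-4 as rotations and 5-9 as reflections."""
    table = []
    for j in range(10):
        row = []
        for k in range(10):
            a, b = j % 5, k % 5
            if j < 5 and k < 5:      # rotation * rotation -> rotation
                row.append((a + b) % 5)
            elif j < 5:              # rotation * reflection -> reflection
                row.append(5 + (a + b) % 5)
            elif k < 5:              # reflection * rotation -> reflection
                row.append(5 + (a - b) % 5)
            else:                    # reflection * reflection -> rotation
                row.append((a - b) % 5)
        table.append(row)
    return table


d = build_d()
print(d[1][5], d[5][1])  # 6 9: swapping the operands changes the result
```

The two printed values differ, which illustrates that the operation is not commutative.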


The second table, p, applies a permutation to each digit based on its position in the number. The positions of the digits are counted from right to left, starting with zero. The permutation repeats after eight rows (the row for pos=8 is identical to the row for pos=0, etc.).

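 p(pos,num) | num = 0  1  2  3  4  5  6  7  8  9
------------+-----------------------------------
 pos = 0    |       0  1  2  3  4  5  6  7  8  9
 pos = 1    |       1  5  7  6  2  8  3  0  9  4
 pos = 2    |       5  8  0  3  7  9  6  1  4  2
 pos = 3    |       8  9  1  6  0  4  3  5  2  7
 pos = 4    |       9  4  5  3  1  2  6  8  7  0
 pos = 5    |       4  2  8  6  5  7  3  9  0  1
 pos = 6    |       2  7  9  3  8  0  6  4  1  5
 pos = 7    |       7  0  4  6  9  1  3  2  5  8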
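Each row of p can be read as the previous row with the pos=1 permutation applied once more, an observation that can be checked against the table. The sketch below rebuilds the table that way and confirms the period of eight; the names BASE and build_p are hypothetical helpers introduced for illustration.

```python
BASE = [1, 5, 7, 6, 2, 8, 3, 0, 9, 4]  # the pos=1 row of the table above


def build_p(rows=8):
    """Row 0 is the identity; every later row applies BASE once more."""
    table = [list(range(10))]
    for _ in range(rows - 1):
        prev = table[-1]
        table.append([BASE[x] for x in prev])
    return table


p = build_p()
ninth = [BASE[x] for x in p[7]]  # the row that pos=8 would use
print(ninth == p[0])             # True: the rows repeat with period eight
```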


The third table, inv, represents the multiplicative inverse of a digit in the dihedral group D5: in other words, for any j, the inv table shows the value k such that d(j,k) = 0.

 j      | 0  1  2  3  4  5  6  7  8  9
--------+-----------------------------
 inv(j) | 0  4  3  2  1  5  6  7  8  9
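Because inv(j) is defined as the k for which d(j,k) = 0, the inv table does not have to be stored separately. A minimal sketch, in which the names D and build_inv are illustrative:

```python
# Table d from above, one string per row.
D = [[int(ch) for ch in row] for row in (
    "0123456789", "1234067895", "2340178956", "3401289567", "4012395678",
    "5987604321", "6598710432", "7659821043", "8765932104", "9876543210")]


def build_inv(d):
    """For each j, find the column k in row j where d(j,k) is 0."""
    return [row.index(0) for row in d]


inv = build_inv(D)
print(inv)  # [0, 4, 3, 2, 1, 5, 6, 7, 8, 9]
```

The reflections 5 to 9 are their own inverses, while the rotations 1 to 4 pair up as 1 with 4 and 2 with 3.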

Algorithm

Using the above tables, the following procedure will perform the Verhoeff checksum calculation on a number.

  1. Create an array n out of the individual digits of the number, taken from right to left (the rightmost digit is n_0, etc.).
  2. Initialize the checksum c to zero.
  3. For each index i of the array n, starting at zero, replace c with d(c, p(i mod 8, n_i)). The reduction modulo 8 reflects the fact that the rows of p repeat after eight positions.

The original number has a valid check digit if and only if c = 0. To generate a check digit, append a zero to the number (so that n_0 = 0) and run the procedure; inv(c) is then the proper value to use as the check digit in place of the final zero.
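The procedure above can be sketched in a few lines. This is one possible arrangement rather than a reference implementation; the function names checksum, is_valid and check_digit are assumptions made for the example, and numbers are passed as strings so that leading zeros are preserved.

```python
# Tables d, p and inv from the Tables section, one string per row.
D = [[int(ch) for ch in row] for row in (
    "0123456789", "1234067895", "2340178956", "3401289567", "4012395678",
    "5987604321", "6598710432", "7659821043", "8765932104", "9876543210")]
P = [[int(ch) for ch in row] for row in (
    "0123456789", "1576283094", "5803796142", "8916043527",
    "9453126870", "4286573901", "2793806415", "7046913258")]
INV = [0, 4, 3, 2, 1, 5, 6, 7, 8, 9]


def checksum(digits: str) -> int:
    """Steps 1-3: walk the digits right to left, folding each into c."""
    c = 0
    for i, ch in enumerate(reversed(digits)):
        c = D[c][P[i % 8][int(ch)]]
    return c


def is_valid(number: str) -> bool:
    """A number passes the check if and only if its checksum is 0."""
    return checksum(number) == 0


def check_digit(payload: str) -> str:
    """Append a placeholder 0, compute c, and return inv(c)."""
    return str(INV[checksum(payload + "0")])


print(is_valid("1428570"))    # True
print(check_digit("142857"))  # 0
```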

Example

Validate the checksum for the number 1428570.

The first step is to break up the number into an array n = [0,7,5,8,2,4,1], in which the digits are listed in reverse order (right to left). Then, the other values in the formula are computed in sequence. Since the final value of c is zero, the check digit is valid.

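 i | n_i | p(i,n_i) | previous c | new c = d(c, p(i,n_i))
---+-----+----------+------------+-----------------------
 0 |  0  |    0     |     0      |     0
 1 |  7  |    0     |     0      |     0
 2 |  5  |    9     |     0      |     9
 3 |  8  |    2     |     9      |     7
 4 |  2  |    5     |     7      |     2
 5 |  4  |    5     |     2      |     7
 6 |  1  |    7     |     7      |     0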
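The intermediate values in this example can be reproduced programmatically. The sketch below records one tuple per step in the same column order as the table; trace is a hypothetical helper name.

```python
D = [[int(ch) for ch in row] for row in (
    "0123456789", "1234067895", "2340178956", "3401289567", "4012395678",
    "5987604321", "6598710432", "7659821043", "8765932104", "9876543210")]
P = [[int(ch) for ch in row] for row in (
    "0123456789", "1576283094", "5803796142", "8916043527",
    "9453126870", "4286573901", "2793806415", "7046913258")]


def trace(number: str):
    """Return (i, n_i, p(i,n_i), previous c, new c) for every digit."""
    rows, c = [], 0
    for i, ch in enumerate(reversed(number)):
        n_i = int(ch)
        permuted = P[i % 8][n_i]
        new_c = D[c][permuted]
        rows.append((i, n_i, permuted, c, new_c))
        c = new_c
    return rows


for row in trace("1428570"):
    print(*row)
```

The last value in the final row is the checksum, which is 0 for this number.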

Strengths and weaknesses

The Verhoeff algorithm will detect all occurrences of the following common transcription errors in a number:

  • Replacement of a single digit by a different digit (a → b).
  • Transposition (switching) of two adjacent digits (ab → ba).
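These two guarantees can be checked by brute force on short numbers. The sketch below is an illustration rather than a proof: it assumes three-digit payloads plus a check digit, tries every single-digit replacement and every adjacent transposition, and collects any corrupted number that still passes. The helper names are hypothetical.

```python
from itertools import product

D = [[int(ch) for ch in row] for row in (
    "0123456789", "1234067895", "2340178956", "3401289567", "4012395678",
    "5987604321", "6598710432", "7659821043", "8765932104", "9876543210")]
P = [[int(ch) for ch in row] for row in (
    "0123456789", "1576283094", "5803796142", "8916043527",
    "9453126870", "4286573901", "2793806415", "7046913258")]
INV = [0, 4, 3, 2, 1, 5, 6, 7, 8, 9]


def checksum(digits):
    c = 0
    for i, ch in enumerate(reversed(digits)):
        c = D[c][P[i % 8][int(ch)]]
    return c


def undetected_errors(payload_len=3):
    """Collect (good, bad) pairs where a corrupted number still passes."""
    missed = []
    for digits in product("0123456789", repeat=payload_len):
        payload = "".join(digits)
        good = payload + str(INV[checksum(payload + "0")])
        for pos in range(len(good)):
            # Single-digit replacement a -> b at this position.
            for sub in "0123456789":
                if sub != good[pos]:
                    bad = good[:pos] + sub + good[pos + 1:]
                    if checksum(bad) == 0:
                        missed.append((good, bad))
            # Adjacent transposition ab -> ba (skipped when a == b).
            if pos + 1 < len(good) and good[pos] != good[pos + 1]:
                bad = good[:pos] + good[pos + 1] + good[pos] + good[pos + 2:]
                if checksum(bad) == 0:
                    missed.append((good, bad))
    return missed


print(len(undetected_errors()))  # 0
```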

Additionally, the Verhoeff algorithm detects most (but not all) occurrences of the following less common errors:

  • Twin errors (aa → bb).
  • Jump twin errors (aca → bcb).
  • Jump transpositions (abc → cba).
  • Phonetic errors (a0 ↔ 1a; e.g., "sixty" ↔ "sixteen").
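The "most but not all" qualification can be illustrated with twin errors. With the tables given earlier, 522 and 533 both pass the check, so the twin error 22 → 33 in the last two positions goes unnoticed. The sketch below counts such misses over all valid numbers of a chosen length; twin_error_stats is a hypothetical helper, and no particular detection rate is claimed here.

```python
D = [[int(ch) for ch in row] for row in (
    "0123456789", "1234067895", "2340178956", "3401289567", "4012395678",
    "5987604321", "6598710432", "7659821043", "8765932104", "9876543210")]
P = [[int(ch) for ch in row] for row in (
    "0123456789", "1576283094", "5803796142", "8916043527",
    "9453126870", "4286573901", "2793806415", "7046913258")]


def checksum(digits):
    c = 0
    for i, ch in enumerate(reversed(digits)):
        c = D[c][P[i % 8][int(ch)]]
    return c


def twin_error_stats(length=4):
    """Return (total, missed) over all twin errors aa -> bb in valid numbers."""
    total = missed = 0
    for value in range(10 ** length):
        good = f"{value:0{length}d}"
        if checksum(good) != 0:
            continue  # only start from numbers that pass the check
        for pos in range(length - 1):
            if good[pos] != good[pos + 1]:
                continue  # not a twin pair
            for b in "0123456789":
                if b == good[pos]:
                    continue
                bad = good[:pos] + b * 2 + good[pos + 2:]
                total += 1
                if checksum(bad) == 0:
                    missed += 1
    return total, missed


print(checksum("522"), checksum("533"))  # 0 0: both pass the check
total, missed = twin_error_stats()
print(total, missed)
```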

The main weakness of the Verhoeff algorithm is its complexity. Unlike the Luhn algorithm, the calculations required for a Verhoeff check digit cannot readily be performed by hand from memory. This is mainly a drawback when the client applications within a system need to explicitly report that an invalid ID has failed the check digit test, as opposed to an ID simply not being found in the system's database. If it is sufficient for a client to look up each ID in a master database and report malformed values as "not found", then only the piece of the system that issues new IDs needs to know how to do the Verhoeff calculations, and the complexity issue is mitigated.

References

  • Verhoeff, J. “Error Detecting Decimal Codes”, Mathematical Centre Tract 29, The Mathematical Centre, Amsterdam, 1969.

External links