Adler-32

From Wikipedia, the free encyclopedia

Adler-32 is a checksum algorithm which was invented by Mark Adler. It is almost as reliable as a 32-bit cyclic redundancy check for protecting against accidental modification of data, such as distortions occurring during a transmission, but is significantly faster to calculate in software.

Contents

[edit] History

It is a modification of the Fletcher checksum, which in its original form is slightly faster but less reliable.

A "rolling checksum" version of Adler-32 is used in the rsync utility.

[edit] The algorithm

An Adler-32 checksum is obtained by calculating two 16-bit checksums A and B and concatenating their bits into a 32-bit integer. A is the sum of all bytes in the string, B is the sum of the individual values of A from each step.

At the beginning of an Adler-32 run, A is initialized to 1, B to 0. The sums are done modulo 65521 (the largest prime number smaller than 216). The bytes are stored in network order (big endian), B occupying the two most significant bytes.

The function may be expressed as

 A = 1 + D1 + D2 + ... + Dn (mod 65521)
 B = (1 + D1) + (1 + D1 + D2) + ... + (1 + D1 + D2 + ... + Dn) (mod 65521)
   = n×D1 + (n-1)×D2 + (n-2)×D3 + ... + Dn + n (mod 65521)

 Adler-32(D) = B × 65536 + A

where D is the string of bytes for which the checksum is to be calculated, and n is the length of D.

[edit] Example

The Adler-32 sum of the ASCII string "Wikipedia" would be calculated as follows:

   ASCII code          A                   B

   W: 87           1 +  87 =  88        0 +  88 =   88
   i: 105         88 + 105 = 193       88 + 193 =  281
   k: 107        193 + 107 = 300      281 + 300 =  581
   i: 105        300 + 105 = 405      581 + 405 =  986
   p: 112        405 + 112 = 517      986 + 517 = 1503
   e: 101        517 + 101 = 618     1503 + 618 = 2121
   d: 100        618 + 100 = 718     2121 + 718 = 2839
   i: 105        718 + 105 = 823     2839 + 823 = 3662
   a: 97         823 +  97 = 920     3662 + 920 = 4582

   A = 920  =  398 hex
   B = 4582 = 11E6 hex

   Output: 11E60398 hex

(The modulo operation had no effect in this demonstration, since none of the values reached 65521).

[edit] Comparison with the Fletcher checksum

The first difference between the two algorithms is that Adler-32 sums are calculated modulo a prime number, whereas Fletcher sums are calculated modulo 24-1, 28-1, or 216-1 (depending on the number of bits used), which are all composite numbers. Using a prime number makes it possible for Adler-32 to catch differences in certain combinations of bytes that Fletcher is unable to detect.

The second difference, which has the largest effect on the speed of the algorithm, is that the Adler sums are computed over 8-bit bytes rather than 16-bit words, resulting in twice the number of loop iterations. This results in the Adler-32 checksum taking between one-and-a-half to two times as long as Fletcher's checksum.

[edit] Example implementation

An optimized implementation in the C programming language operates as follows:

#define MOD_ADLER 65521
       uint8_t *data;   /* Pointer to the data to be summed */
       size_t len;      /* Length in bytes */
       uint32_t a = 1, b = 0;

       while (len) {
               unsigned tlen = len > 5550 ? 5550 : len;
               len -= tlen;
               do {
                       a += *data++;
                       b += a;
               } while (--tlen);
               a = (a & 0xffff) + (a >> 16) * (65536-MOD_ADLER);
               b = (b & 0xffff) + (b >> 16) * (65536-MOD_ADLER);
       }
       /* It can be shown that a <= 0x1013a here, so a single subtract will do. */
       if (a >= MOD_ADLER)
               a -= MOD_ADLER;
       /* It can be shown that b can reach 0xffef1 here. */
       b = (b & 0xffff) + (b >> 16) * (65536-MOD_ADLER);
       if (b >= MOD_ADLER)
               b -= MOD_ADLER;
       return (b << 16) | a;

A few tricks are used here for efficiency:

  • Most importantly, by using larger (32-bit) temporary sums, it is possible to sum a great deal of data before needing to reduce modulo 65521. The requirement is that the reduction modulo 65521 must be performed before the sums overflow, which would cause an implicit reduction modulo 232 = 4294967296 and corrupt the computation.
  • 65536 ≡ 15 mod 65521, so 65536x ≡ 15x (mod 65521), and the expression (x & 0xffff) + (x >> 16)*15 reduces x modulo 65521. Only doing it once is not guaranteed to be complete, but it will be at most 0xffff0. A second repetition guarantees a result less than 65745, after which a single conditional subtraction reduces the sum to the range 0..65520.
  • The magic value 5550 is the largest number of sums that can be performed without overflowing b. Any smaller value is also permissible; 4096 may be convenient in some cases. Because this implementation does not completely reduce a, its limit is slightly lower than the 5552 mentioned in the RFC. The proof that 5550 is safe (and 5551 is not) is a bit intricate, and starts by proving that a can be at most 0x1013a at the start of the inner 4 loop.

[edit] Advantages and disadvantages

  • Warning: Like the standard CRC-32, the Adler-32 checksum can be forged easily and is therefore unsafe for protecting against intentional modification.
  • It has the benefit over a CRC that it can be computed faster in software.
  • Adler-32 has a weakness for short messages with few hundred bytes, because the checksums for these messages have a poor coverage of the 32 available bits.

[edit] Weakness

Jonathan Stone discovered in 2001 that Adler-32 has a weakness for very short messages. He wrote "Briefly, the problem is that, for very short packets, Adler32 is guaranteed to give poor coverage of the available bits. Don't take my word for it, ask Mark Adler. :-)" The problem is that sum A does not wrap for short messages. The maximum value of A for a 128-byte message is 32640, which is below the value 65521 used by the modulo operation. An extended explanation can be found in RFC 3309, which mandates the use of CRC32 instead of Adler-32 for SCTP, the Stream Control Transmission Protocol.

[edit] See also

[edit] External links

  • RFC 1950 - specification, contains example C code
  • ZLib - implements the Adler-32 checksum
  • RFC 3309 - information about the short message weakness and related change to SCTP
In other languages