Checksum

Effect of a typical checksum function (the Unix cksum utility)

A checksum or hash sum is a small-size datum computed from an arbitrary block of digital data for the purpose of detecting errors that may have been introduced during its transmission or storage.

The procedure that yields the checksum from a given data input is called a checksum function or checksum algorithm. Depending on its design goals, a good checksum algorithm will usually output a significantly different value even for small changes made to the input. This is especially true of cryptographic hash functions. Because of this property, checksums may be used to detect many data corruption errors and to verify overall data integrity: if the checksum computed for the current data input matches the stored value of a previously computed checksum, there is a very high probability that the data has not been accidentally altered or corrupted.
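As an illustration of how a small change to the input alters the output, the following minimal sketch compares a CRC-32 checksum and a SHA-256 hash of two inputs that differ in one character. It uses Python's standard zlib and hashlib modules; the sample strings and the helper name bit_difference are chosen here for illustration only.

```python
import hashlib
import zlib

original = b"The quick brown fox jumps over the lazy dog"
altered = b"The quick brown fox jumps over the lazy cog"  # one character changed


def bit_difference(a: bytes, b: bytes) -> int:
    """Count the bit positions in which two equal-length byte strings differ."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))


# A non-cryptographic checksum (CRC-32) of both inputs.
crc_original = zlib.crc32(original)
crc_altered = zlib.crc32(altered)

# A cryptographic hash (SHA-256) of both inputs.
sha_original = hashlib.sha256(original).digest()
sha_altered = hashlib.sha256(altered).digest()

print(f"CRC-32 : {crc_original:08x} vs {crc_altered:08x}")
print("SHA-256 bits changed:", bit_difference(sha_original, sha_altered), "of 256")
```

For a cryptographic hash function, roughly half of the output bits would be expected to change, although the exact count depends on the inputs.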

Checksum functions are related to hash functions, fingerprints, randomization functions, and cryptographic hash functions. However, each of those concepts has different applications and therefore different design goals. By themselves, checksums are often used to verify data integrity, but they should not be relied upon to verify data authenticity as well. However, they are used as cryptographic primitives in larger authentication algorithms. For cryptographic systems with these two specific design goals, see HMAC.

Check digits and parity bits are special cases of checksums, appropriate for small blocks of data (such as Social Security numbers, bank account numbers, computer words, single bytes, etc.). Some error-correcting codes are based on special checksums that not only detect common errors but also allow the original data to be recovered in certain cases.

Checksum algorithms

Parity byte or parity word

The simplest checksum algorithm is the so-called longitudinal parity check, which breaks the data into "words" with a fixed number n of bits, and then computes the exclusive or of all those words. The result is appended to the message as an extra word. To check the integrity of a message, the receiver computes the exclusive or (XOR) of all its words, including the checksum; if the result is not a word with n zeros, the receiver knows that a transmission error occurred.

With this checksum, any transmission error that flips a single bit of the message, or an odd number of bits, will be detected as an incorrect checksum. However, an error that affects two bits will not be detected if those bits lie at the same position in two distinct words. If the affected bits are independently chosen at random, the probability of a two-bit error being undetected is 1/n.
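The longitudinal parity check can be sketched as follows, assuming 8-bit words (n = 8). The function names and the sample message are hypothetical and serve only to show a detected single-bit error and an undetected two-bit error.

```python
from functools import reduce


def parity_word(words):
    """XOR all words together; the result is the longitudinal parity word."""
    return reduce(lambda acc, word: acc ^ word, words, 0)


def append_parity(words):
    """Return the message with its parity word appended as an extra word."""
    return list(words) + [parity_word(words)]


def parity_is_valid(received):
    """A received message is accepted if the XOR of all its words is zero."""
    return parity_word(received) == 0


message = [0x12, 0x34, 0x56, 0x78]  # four 8-bit words
sent = append_parity(message)

# One flipped bit: detected.
one_flip = sent.copy()
one_flip[1] ^= 0x04

# Two flipped bits at the same position in two distinct words: not detected.
two_flips = sent.copy()
two_flips[0] ^= 0x04
two_flips[2] ^= 0x04

print(parity_is_valid(sent), parity_is_valid(one_flip), parity_is_valid(two_flips))
```

The two flips in the last case cancel each other in the XOR, which is why the corrupted message still passes the check.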

Modular sum

A variant of the previous algorithm is to add all the "words" as unsigned binary numbers, discarding any overflow bits, and append the two's complement of the total as the checksum. To validate a message, the receiver adds all the words in the same manner, including the checksum; if the result is not a word full of zeros, an error must have occurred. This variant too detects any single-bit error. The modular sum is used in SAE J1708.[1]
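A minimal sketch of the modular sum, again assuming 8-bit words. The function names and the n_bits parameter are illustrative assumptions and are not taken from SAE J1708 or any other standard.

```python
def modular_sum_checksum(words, n_bits=8):
    """Two's complement of the sum of all words, truncated to n_bits."""
    mask = (1 << n_bits) - 1
    total = sum(words) & mask      # discard any overflow bits
    return (-total) & mask         # two's complement within n_bits


def modular_sum_is_valid(received, n_bits=8):
    """The sum of all words, including the checksum, must be zero."""
    mask = (1 << n_bits) - 1
    return (sum(received) & mask) == 0


message = [0x12, 0x34, 0x56, 0x78]
checksum = modular_sum_checksum(message)
sent = message + [checksum]

# A single flipped bit changes the total and is therefore detected.
corrupted = sent.copy()
corrupted[2] ^= 0x10

print(f"checksum = {checksum:#04x}")
print(modular_sum_is_valid(sent), modular_sum_is_valid(corrupted))
```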

Position-dependent checksums

The simple checksums described above fail to detect some common errors that affect many bits at once, such as changing the order of data words, or inserting or deleting words with all bits set to zero. The checksum algorithms that are most used in practice, such as Fletcher's checksum, Adler-32, and cyclic redundancy checks (CRCs), address these weaknesses by considering not only the value of each word but also its position in the sequence. This feature generally increases the cost of computing the checksum.
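The difference can be illustrated with a sketch that compares a plain 8-bit modular sum with a textbook form of Fletcher's checksum (Fletcher-16, with both running sums taken modulo 255). The function names and test inputs are chosen here for illustration; real implementations differ in details such as initial values and block handling.

```python
def simple_sum(data: bytes) -> int:
    """Position-independent checksum: sum of all bytes modulo 256."""
    return sum(data) % 256


def fletcher16(data: bytes) -> int:
    """Fletcher-16: the second sum accumulates the first, so position matters."""
    sum1 = 0
    sum2 = 0
    for byte in data:
        sum1 = (sum1 + byte) % 255
        sum2 = (sum2 + sum1) % 255
    return (sum2 << 8) | sum1


original = b"abcde"
reordered = b"abced"        # last two bytes swapped
zero_inserted = b"abc\x00de"  # an all-zero byte inserted mid-message

for variant in (original, reordered, zero_inserted):
    print(variant, f"{simple_sum(variant):#04x}", f"{fletcher16(variant):#06x}")
```

The plain sum is identical for all three inputs, while the Fletcher-16 values differ because each byte's contribution to the second sum depends on where it occurs.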

General considerations

If every possible received bit string (message plus checksum) is viewed as a corner of a hypercube with one dimension per bit, a single-bit transmission error corresponds to a displacement from a valid corner (the correct message and checksum) to one of the adjacent corners. An error that affects k bits moves the message to a corner that is k steps removed from its correct corner. The goal of a good checksum algorithm is to spread the valid corners as far from each other as possible, so as to increase the likelihood that "typical" transmission errors will end up in an invalid corner.
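The number of steps between two corners is the number of bit positions in which they differ (their Hamming distance). As a small sketch of the idea, the code below enumerates the valid corners of a toy code, assumed here to be 3-bit messages with a single even-parity bit appended, and computes the smallest distance between any two of them. The helper names are hypothetical.

```python
from itertools import combinations, product


def hamming_distance(a, b):
    """Number of bit positions in which two equal-length bit tuples differ."""
    return sum(x != y for x, y in zip(a, b))


def with_parity(bits):
    """Append one even-parity bit to a tuple of message bits."""
    return bits + (sum(bits) % 2,)


# All valid corners: every 3-bit message followed by its parity bit.
valid_corners = [with_parity(m) for m in product((0, 1), repeat=3)]
valid_set = set(valid_corners)

min_distance = min(
    hamming_distance(a, b) for a, b in combinations(valid_corners, 2)
)


def flip(bits, i):
    """Return a copy of bits with position i inverted (a single-bit error)."""
    return bits[:i] + (1 - bits[i],) + bits[i + 1:]


# Every single-bit error moves a valid corner to an invalid one.
all_single_errors_detected = all(
    flip(corner, i) not in valid_set
    for corner in valid_corners
    for i in range(len(corner))
)

print(min_distance, all_single_errors_detected)
```

Because no two valid corners of this toy code are adjacent, every single-bit error lands on an invalid corner, whereas some two-bit errors lead to another valid corner and go unnoticed.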

Checksum tools

  • CHK Checksum Utility, an advanced checksum tool with CRC32, ED2K (eMule/eDonkey2000), MD5, SHA1, SHA1-Base32, SHA256, SHA384, SHA512 and WHIRLPOOL support.
  • MD5 & SHA Checksum Utility, a standalone freeware tool that can generate and verify MD5, SHA-1 and SHA-256 hashes from a file.
  • Advanced Hash Calculator, hash calculator software for multiple files for Windows that calculates CRC-32, MD2, MD4, MD5, SHA-1, SHA-256, SHA-384 and SHA-512 checksums.
  • Bitser, a free Microsoft Windows application that calculates MD5, SHA-1 and SHA-256 sums for any given input file.
  • checksum, a fast file, folder and drive hashing application for Windows.
  • MD5 File Hasher for Windows (Digital Tronic), hash sum verification, monitoring, scheduled checks, detailed reporting for Windows.
  • cksum, a Unix command that generates both a 32-bit CRC and a byte count for any given input file.
  • File Checksum Integrity Verifier (FCIV), a command-prompt utility from Microsoft that computes and verifies MD5 or SHA-1 cryptographic hash values of files.
  • Hash Validation Tool (hash), a command-prompt utility that will generate/validate several types of hash values for multiple files.
  • Jacksum, a Java API, usable both through a GUI and a CLI, which incorporates many checksum implementations and can be extended with additional ones.
  • RHash, an open-source CLI tool and C library which incorporates a large number of checksum implementations.
  • jdigest, a Java GUI tool that generates and checks MD5 and SHA sums.
  • jcksum, a Java library that can be used by developers in Java applications to calculate checksums using different algorithms.
  • md5sum, a Unix command that generates an MD5 sum.
  • sha1sum, another Unix command that generates a SHA-1 sum.
  • Parchive, cross-platform software that is capable of both verifying checksums and repairing errors when found.
  • sum, a Unix command (also ported to Win32) that generates order-independent sums; uses two different algorithms for calculating, the SYSV checksum algorithm and the BSD checksum (default) algorithm.

References

  1. "SAE J1708". Kvaser.com. Retrieved 2012-08-13. 

This article is issued from Wikipedia. The text is available under the Creative Commons Attribution/Share Alike; additional terms may apply for the media files.