Ban (information)
From Wikipedia, the free encyclopedia
Concepts from information theory are widely used in designing and breaking cryptographic systems. Measuring information entropy, redundancy, and channel capacity can provide meaningful insights into the messages involved. The units of measurement discussed here include the ban, the hartley, the nat, the shannon, and the centiban.
Units of Measurement in Information Theory
Ban
A ban, sometimes called a hartley (symbol Hart), is a logarithmic unit of information or entropy based on base-10 logarithms and powers of 10, rather than the base-2 logarithms and powers of 2 that define the bit. Just as a bit corresponds to one binary digit, a ban corresponds to one decimal digit. A deciban is one tenth of a ban, and a centiban is one hundredth of a ban.
One ban corresponds to about 3.32 bits (log2(10)), or 2.30 nats (ln(10)). A deciban is about 0.33 bits. A centiban is about 0.033 bits.
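These conversion factors follow directly from changing the base of the logarithm. The short Python sketch below reproduces them for the ban, deciban, and centiban; the helper names `bans_to_bits` and `bans_to_nats` are illustrative, not taken from any library.

```python
import math

# Conversion factors for one ban (a base-10 unit).
BITS_PER_BAN = math.log2(10)  # about 3.32 bits
NATS_PER_BAN = math.log(10)   # about 2.30 nats (math.log is the natural log)

# Sub-units of the ban.
DECIBAN = 0.1   # one tenth of a ban
CENTIBAN = 0.01 # one hundredth of a ban

def bans_to_bits(bans: float) -> float:
    """Convert an amount of information in bans to bits (shannons)."""
    return bans * BITS_PER_BAN

def bans_to_nats(bans: float) -> float:
    """Convert an amount of information in bans to nats."""
    return bans * NATS_PER_BAN

if __name__ == "__main__":
    for name, amount in [("ban", 1.0), ("deciban", DECIBAN), ("centiban", CENTIBAN)]:
        print(f"1 {name} = {bans_to_bits(amount):.4f} bits = {bans_to_nats(amount):.4f} nats")
```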
Hartley
A hartley is a unit of information content used in information and communications theory. The hartley is analogous to the shannon but uses base-10 logarithms: one hartley is the information content of one of ten possible, equally likely values or states of anything used to store or convey information.
One hartley equals log2 10 ≈ 3.322 shannons.
Natural unit or Nat
A nat is a unit of information content used in information and communications theory. The nat is similar to the shannon but uses the natural logarithm (base e) instead of the base-2 logarithm. If the probability of receiving a particular message is p, then the information content of the message is -ln p nats.
For example, if a message is a string of 5 letters or numerals, with all combinations equally likely, then a particular message has probability 1/36^5, and the information content of the message is 5 ln 36 = 17.9176 nats.
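As a minimal sketch of the nat definition, assuming (as in the example) that every 5-character string over the 36 letters and numerals is equally likely, the Python below computes -ln p from the message probability and checks it against 5 ln 36. The function name `information_nats` is hypothetical.

```python
import math

def information_nats(p: float) -> float:
    """Information content, in nats, of a message received with probability p."""
    if not 0.0 < p <= 1.0:
        raise ValueError("p must lie in (0, 1]")
    return -math.log(p)  # one-argument math.log is the natural logarithm

# 5-character strings over 26 letters + 10 numerals, all assumed equally likely.
ALPHABET_SIZE = 36
LENGTH = 5
p_message = 1 / ALPHABET_SIZE**LENGTH

print(information_nats(p_message))       # computed from the probability
print(LENGTH * math.log(ALPHABET_SIZE))  # computed as 5 * ln 36
```

Both lines print the same value, about 17.9176; the second form avoids handling a very small probability directly.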
One nat equals log2 e = 1.442 695 shannons or log10 e = 0.434 294 hartleys.
Shannon
A shannon is a unit of information content used in information and communications theory, based on the base-2 logarithm. In cryptography it appears in the unicity distance concept: because plaintext has a certain redundancy, the unicity distance estimates the minimum amount of ciphertext needed to ensure a unique decipherment. The shannon also reflects the idea that messages that are less likely to occur are more informative (when they do occur) than more likely ones. For example, since tsunamis rarely occur, a message reporting that one is occurring is more informative than a message reporting that none is. If a message has probability p of being received, then its information content is -log2 p shannons.
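The unicity distance can be made concrete with Shannon's usual estimate U = H(K)/D, where H(K) is the key entropy in shannons and D is the plaintext redundancy in shannons per character. The sketch below applies it to a simple substitution cipher with 26! equally likely keys. The redundancy of 3.2 shannons per letter is an assumed, illustrative figure for English, so the result should be read as a rough order of magnitude, not a definitive value.

```python
import math

def unicity_distance(key_entropy_sh: float, redundancy_sh_per_char: float) -> float:
    """Shannon's estimate U = H(K) / D, measured in ciphertext characters.

    key_entropy_sh: key entropy H(K), in shannons (bits).
    redundancy_sh_per_char: plaintext redundancy D, in shannons per character.
    """
    if redundancy_sh_per_char <= 0:
        raise ValueError("redundancy must be positive")
    return key_entropy_sh / redundancy_sh_per_char

# Simple substitution over 26 letters: 26! equally likely keys.
key_bits = math.log2(math.factorial(26))  # about 88.4 shannons

# ASSUMPTION: illustrative redundancy of English text, in shannons per letter.
ASSUMED_REDUNDANCY = 3.2
print(round(unicity_distance(key_bits, ASSUMED_REDUNDANCY), 1))  # roughly 28 characters
```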
If a message consists of 10 letters, and all strings of 10 letters are equally likely, then the probability of a particular message is 1/26^10, and the information content of the message is 10 log2 26 = 47.004 shannons.
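The 10-letter example can be checked numerically and converted into the other units using 1 shannon = log10 2 hartleys = ln 2 nats. This is a minimal sketch that assumes, as the text does, that all 26^10 strings are equally likely.

```python
import math

LENGTH = 10         # letters per message
ALPHABET_SIZE = 26  # all 26**10 strings assumed equally likely

p = 1 / ALPHABET_SIZE**LENGTH
shannons = -math.log2(p)  # information content in shannons

# Unit conversions: 1 shannon = log10 2 hartleys = ln 2 nats.
hartleys = shannons * math.log10(2)
nats = shannons * math.log(2)

print(f"{shannons:.3f} shannons = {hartleys:.3f} hartleys = {nats:.3f} nats")
```

The hartley and nat figures equal 10 log10 26 and 10 ln 26 respectively, which is the same result obtained by taking the logarithm in the other base from the start.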
This unit was originally called the bit [2], because when the message is a bit string and all strings are equally likely, then the information content turns out to equal the number of bits.
One shannon equals log10 2 = 0.301 030 hartleys or ln 2 = 0.693 147 nats.
Centiban
In cryptography, a centiban is a scoring unit for probabilities equal to one hundredth of the true scoring unit, which lets decimal fractions be expressed conveniently as whole numbers. The true score (the base-10 logarithm of the probability of a particular occurrence) multiplied by one hundred is the score in centibans; cf. deciban.
A sample of 5,000 digrams was studied by the US government, which produced a table of centiban weights from it. These weights are used in deciphering columnar transposition ciphers. For each digram, the table gives the logarithm of twice its frequency in the sample, which allows working with integer frequency values instead of decimal fractions.
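The article does not spell out the exact recipe behind the government table, so the following is only a sketch under stated assumptions: it takes a digram's weight to be 100 × log10(2 × count), rounded to a whole number of centibans, where count is the digram's number of occurrences in the 5,000-digram sample. The example counts and the `floor` value for unseen digrams are hypothetical.

```python
import math

def centiban_weight(count: int, floor: int = 0) -> int:
    """Centiban weight of a digram seen `count` times in a 5,000-digram sample.

    ASSUMPTION: weight = round(100 * log10(2 * count)). Doubling a count out of
    5,000 gives a whole-number frequency per 10,000 digrams, and multiplying the
    base-10 logarithm by 100 expresses it in centibans.
    Unseen digrams (count 0) receive the hypothetical `floor` value instead.
    """
    if count <= 0:
        return floor
    return round(100 * math.log10(2 * count))

# Hypothetical counts for illustration only; not taken from the real table.
example_counts = {"TH": 150, "EN": 55, "QX": 0}
weights = {digram: centiban_weight(n) for digram, n in example_counts.items()}
print(weights)
```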
The Centiban Table
When solving a columnar transposition cipher, centiban weights help determine which two columns to place side by side as possible digraphs. The larger the total centiban weight of the digraphs a pair of columns produces, the more likely that pairing is to be correct.
Centiban Table Example
Suppose columns 1, 2, and 3 are compared: which two columns are more likely to yield the correct plaintext?
The centiban weight of columns 1 and 2 is 385, while that of columns 1 and 3 is 403. Columns 1 and 3 are therefore the better choice for possible matches.
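The comparison can be sketched as follows: place one column beside another, read off the digrams row by row, and add up their centiban weights. The weight values and column contents below are hypothetical stand-ins (the real table and the example's columns are not reproduced here), and `pair_score` and `best_partner` are illustrative names.

```python
from typing import Dict

def pair_score(left: str, right: str, weights: Dict[str, int], default: int = 0) -> int:
    """Sum the centiban weights of the digrams formed row by row when column
    `right` is placed immediately after column `left`.
    Digrams missing from `weights` score `default` (a hypothetical choice)."""
    return sum(weights.get(a + b, default) for a, b in zip(left, right))

def best_partner(anchor: str, candidates: Dict[str, str], weights: Dict[str, int]) -> str:
    """Return the name of the candidate column that scores highest beside `anchor`."""
    return max(candidates, key=lambda name: pair_score(anchor, candidates[name], weights))

# Hypothetical centiban weights and column contents, for illustration only.
weights = {"TH": 248, "IN": 230, "AS": 190, "TX": 40, "IQ": 10}
column_1 = "TIA"
others = {"column 2": "XQZ", "column 3": "HNS"}

for name, column in others.items():
    print(name, pair_score(column_1, column, weights))
print("best partner for column 1:", best_partner(column_1, others, weights))
```

Adding centiban weights corresponds to multiplying the underlying digram frequencies, which is why a logarithmic unit is convenient for this kind of scoring.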
History
The ban and the deciban were invented by Alan Turing with I. J. Good in 1940, to measure the amount of information which could be deduced by the codebreakers at Bletchley Park using the Banburismus procedure, towards determining each day's unknown setting of the German naval Enigma cipher machine. The name was inspired by the enormous sheets of card, printed in the town of Banbury about 30 miles away, that were used in the process.
Unit name | Log base | Coined by | Year | Etymology |
---|---|---|---|---|
hartley | 10 | R. Hartley | 1928 | R. Hartley |
ban, deciban | 10 | Alan Turing | 1941 | Town of Banbury |
bit | 2 | C. E. Shannon & J. W. Tukey | 1948 | Binary digit |
shannon | 2 | C. E. Shannon | | |
nat | e | | | Natural unit |
Usage as a unit of probability
The deciban is a particularly useful measure of odds ratios, or weights of evidence. A weight of 10 decibans corresponds to an odds ratio of 10:1, 20 decibans to 100:1, and so on.
According to I. J. Good, a change in a weight of evidence of 1 deciban (i.e., a change in odds from evens to about 55:45), or perhaps half a deciban, is about as finely as humans can reasonably be expected to quantify their degree of belief in a hypothesis.
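Because a weight of evidence in decibans is 10 × log10 of the odds, converting between odds, decibans, and probabilities takes one line each. The sketch below (function names are illustrative) checks the 10:1 and 100:1 figures and Good's example: one deciban moves evens to odds of about 1.26:1, a probability of roughly 0.557, or about 55:45.

```python
import math

def odds_to_decibans(odds: float) -> float:
    """Weight of evidence in decibans for an odds ratio (e.g. 10 for 10:1)."""
    return 10 * math.log10(odds)

def decibans_to_odds(db: float) -> float:
    """Odds ratio corresponding to a weight of evidence of `db` decibans."""
    return 10 ** (db / 10)

def odds_to_probability(odds: float) -> float:
    """Probability of a hypothesis given odds in its favour."""
    return odds / (1 + odds)

print(odds_to_decibans(10), odds_to_decibans(100))  # 10 and 20 decibans
# Starting from evens (1:1), adding 1 deciban gives odds of about 1.26:1.
print(odds_to_probability(decibans_to_odds(1)))     # about 0.557, i.e. roughly 55:45
```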
References
- Hartley, R. V. L. "Transmission of Information." Bell System Technical Journal, July 1928.
- Reza, Fazlollah M. An Introduction to Information Theory. New York: Dover, 1994. ISBN 0-486-68210-2.
- MacKay, David J. C. Information Theory, Inference, and Learning Algorithms. Cambridge: Cambridge University Press, 2003. ISBN 0-521-64298-1. This online textbook includes a chapter on the units of information content and on the game of Banburismus that the codebreakers played when cracking each day's Enigma codes.
- Sale, Tony. "The Bletchley Park 1944 Cryptographic Dictionary," formatted by Tony Sale. © 2001.