Polygraphic substitution
From Wikipedia, the free encyclopedia
A polygraphic substitution is a cipher in which a uniform substitution is performed on blocks of letters. When the length of the block is specifically known, more precise terms are used: for instance, a cipher in which pairs of letters are substituted is bigraphic.
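As an illustration of the definition, a general bigraphic substitution can be modelled as a lookup table that maps each of the 26 × 26 = 676 possible letter pairs to a distinct pair. The Python sketch below is a minimal example under stated assumptions: a plain A to Z alphabet, non-letters discarded, and an odd-length message padded with a hypothetical filler letter 'X'. The function names are illustrative and do not come from any particular historical system.

```python
import random
import string

ALPHABET = string.ascii_uppercase  # assumption: plaintext uses only the letters A-Z


def make_bigraphic_key(seed=None):
    """Build a random one-to-one table mapping each of the 676 digrams to another digram."""
    digrams = [a + b for a in ALPHABET for b in ALPHABET]
    shuffled = digrams[:]
    random.Random(seed).shuffle(shuffled)
    return dict(zip(digrams, shuffled))


def to_pairs(text, pad="X"):
    """Keep letters only, upper-case them, and split into pairs (padding an odd tail)."""
    letters = [c for c in text.upper() if c in ALPHABET]
    if len(letters) % 2:
        letters.append(pad)  # hypothetical padding convention
    return ["".join(letters[i:i + 2]) for i in range(0, len(letters), 2)]


def encrypt(text, key):
    """Substitute every pair of letters as a single unit."""
    return "".join(key[p] for p in to_pairs(text))


def decrypt(ciphertext, key):
    """Invert the table and substitute each ciphertext pair back."""
    inverse = {v: k for k, v in key.items()}
    return "".join(inverse[p] for p in to_pairs(ciphertext))


key = make_bigraphic_key(seed=1)
ct = encrypt("MEET ME AT ONCE", key)
print(ct, decrypt(ct, key))
```

Because the whole pair is the unit of substitution, the same plaintext letter generally encrypts differently depending on the letter it is paired with.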
As a concept, polygraphic substitution contrasts with monoalphabetic (or simple) substitutions in which individual letters are uniformly substituted, or polyalphabetic substitutions in which individual letters are substituted in different ways depending on their position in the text. In theory, there is some overlap in these definitions; one could conceivably consider a Vigenère cipher with an eight-letter key to be an octographic substitution. In practice, this is not a useful observation since it is far more fruitful to consider it to be a family of eight monoalphabetic substitutions.
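The point about the Vigenère cipher can be checked directly: splitting the ciphertext into eight columns (every eighth letter) shows that each column is just a Caesar shift, that is, a monoalphabetic substitution, of the corresponding plaintext column. The key "CRYPTOGR" and the sample message below are hypothetical, and the sketch assumes upper-case A to Z input only.

```python
import string

ALPHABET = string.ascii_uppercase


def vigenere_encrypt(plaintext, key):
    """Standard Vigenere: shift the i-th letter by the value of key[i mod len(key)]."""
    shifts = [ALPHABET.index(k) for k in key]
    return "".join(
        ALPHABET[(ALPHABET.index(c) + shifts[i % len(shifts)]) % 26]
        for i, c in enumerate(plaintext)
    )


def columns(text, period):
    """Column j holds letters j, j + period, j + 2 * period, ..."""
    return [text[j::period] for j in range(period)]


def caesar(text, shift):
    """A single fixed shift applied to every letter: a monoalphabetic substitution."""
    return "".join(ALPHABET[(ALPHABET.index(c) + shift) % 26] for c in text)


key = "CRYPTOGR"  # hypothetical eight-letter key
plaintext = "ATTACKTHEEASTERNFLANKATDAWNWITHTHERESERVE"
ciphertext = vigenere_encrypt(plaintext, key)

# Each ciphertext column is a Caesar shift of the matching plaintext column,
# so the cipher behaves as eight independent monoalphabetic substitutions.
for j, (p_col, c_col) in enumerate(zip(columns(plaintext, 8), columns(ciphertext, 8))):
    assert caesar(p_col, ALPHABET.index(key[j])) == c_col
```

This is why attacking the eight columns separately is more productive than treating the cipher as one substitution on 8-letter blocks.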
Specific ciphers
In 1563, Giambattista della Porta devised the first bigraphic substitution. However, it was nothing more than a matrix of symbols: in practice, it would have been all but impossible to memorize, and carrying the table around would risk its falling into enemy hands.
In 1854, Charles Wheatstone invented the Playfair cipher, a keyword-based system that could be performed on paper in the field. Over the next fifty years it was followed by the closely related four-square and two-square ciphers, which are slightly more cumbersome but offer slightly better security.
In 1929, Lester S. Hill developed the Hill cipher, which uses matrix algebra to encrypt blocks of any desired length. However, encryption by hand is very difficult for any sufficiently large block size, although the cipher has been implemented by machine and by computer. The Hill cipher therefore lies on the frontier between classical and modern cryptography.
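The matrix algebra behind the Hill cipher can be sketched for the smallest interesting case, an order-2 key. Letters are numbered A = 0 to Z = 25, each block of n letters is treated as a column vector p, and the ciphertext block is c = Kp mod 26; decryption multiplies by the inverse of K modulo 26, which exists only when det K is coprime to 26. The key matrix below is an illustrative textbook-style choice, not one taken from Hill's work, and the inverse helper handles only 2 × 2 keys (larger keys would need a general modular matrix inverse). `pow(x, -1, m)` requires Python 3.8 or later.

```python
import string

ALPHABET = string.ascii_uppercase


def hill_apply(matrix, text):
    """Multiply each n-letter block (as a column vector of 0..25) by `matrix`, mod 26."""
    n = len(matrix)
    nums = [ALPHABET.index(c) for c in text]
    assert len(nums) % n == 0, "text length must be a multiple of the block size"
    out = []
    for i in range(0, len(nums), n):
        block = nums[i:i + n]
        for row in matrix:
            out.append(sum(r * b for r, b in zip(row, block)) % 26)
    return "".join(ALPHABET[x] for x in out)


def inverse_2x2(matrix):
    """Inverse of a 2x2 key matrix mod 26; requires gcd(det, 26) == 1."""
    (a, b), (c, d) = matrix
    det_inv = pow((a * d - b * c) % 26, -1, 26)  # raises ValueError if not invertible
    return [[(d * det_inv) % 26, (-b * det_inv) % 26],
            [(-c * det_inv) % 26, (a * det_inv) % 26]]


key = [[3, 3], [2, 5]]  # illustrative key; det = 9, which is coprime to 26
ct = hill_apply(key, "HELP")
pt = hill_apply(inverse_2x2(key), ct)
print(ct, pt)  # HIAT HELP
```

Even at order 2, each block costs four multiplications and a modular reduction; the work grows with the square of the block size, which illustrates why large blocks are impractical by hand.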
Cryptanalysis of general polygraphic substitutions
Polygraphic systems do provide a significant improvement in security over monoalphabetic substitutions. In a bigraphic system, for example, an individual letter 'E' in a message could be encrypted according to any of 52 instructions depending on its position and neighbor: it may be either the first or the second letter of its pair, and its partner may be any of the 26 letters. This can be used to great advantage to mask the frequency of individual letters. However, the security boost is limited: although cracking such a cipher generally requires a larger sample of text, it can still be done by hand.
One can identify a polygraphically encrypted text by compiling a frequency chart of polygrams, not merely of individual letters, and comparing it with the frequencies found in English plaintext. The distribution of digrams is even starker than that of individual letters. For example, the six most common letters in English (23% of the alphabet) account for approximately half of English plaintext, but only the most frequent 8% of the 676 possible digrams (about 54 of them) are needed to account for the same share. Moreover, even in a plaintext many thousands of characters long, you would expect nearly half of the possible digrams to occur rarely or not at all. Finally, looking over the text, you would expect to see a fairly regular scattering of repeated sequences whose lengths and spacings are multiples of the block length, and relatively few that are not.
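The statistical checks described above can be sketched in Python. The sketch assumes the first block starts at the first letter of the text; the function names, the default 50% coverage threshold and the minimum repeat length of four letters are illustrative choices, not a standard procedure. The spacing check rests on the fact that identical plaintext encrypts identically under a polygraphic cipher only when both occurrences fall on the same block alignment, so the distance between genuine repeats should be a multiple of the block length.

```python
from collections import Counter


def to_blocks(text, n=2):
    """Keep letters only and cut them into complete, non-overlapping n-letter blocks."""
    letters = "".join(c for c in text.upper() if c.isalpha())
    return [letters[i:i + n] for i in range(0, len(letters) - n + 1, n)]


def block_counts(text, n=2):
    """Frequency chart of polygrams, counted on the block boundaries a cipher would use."""
    return Counter(to_blocks(text, n))


def blocks_to_cover(counts, fraction=0.5):
    """Number of most frequent polygrams needed to account for `fraction` of all blocks."""
    total = sum(counts.values())
    running = 0
    for k, (_, c) in enumerate(counts.most_common(), start=1):
        running += c
        if running >= fraction * total:
            return k
    return len(counts)


def repeat_spacings(text, min_len=4):
    """Distances between successive occurrences of each repeated `min_len`-letter sequence.

    A repeat longer than `min_len` contributes one spacing per overlapping window.
    """
    letters = "".join(c for c in text.upper() if c.isalpha())
    last_seen = {}
    spacings = []
    for i in range(len(letters) - min_len + 1):
        chunk = letters[i:i + min_len]
        if chunk in last_seen:
            spacings.append(i - last_seen[chunk])
        last_seen[chunk] = i
    return spacings


def share_of_multiples(spacings, n=2):
    """Fraction of repeat spacings that are exact multiples of the block length n."""
    if not spacings:
        return 0.0
    return sum(1 for s in spacings if s % n == 0) / len(spacings)
```

On a long bigraphic ciphertext, one would expect `blocks_to_cover` to return a number far smaller than the count of distinct digrams, and `share_of_multiples(repeat_spacings(text), 2)` to be close to 1.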
Cracking a cipher identified as polygraphic is similar to cracking a general monoalphabetic substitution, except with a larger 'alphabet'. You identify the most frequent polygrams, experiment with replacing them with common plaintext polygrams, and attempt to build up common words, phrases, and finally meaning. Naturally, if your investigation led you to suspect that the cipher was of a specific type, such as a Playfair or an order-2 Hill cipher, then you could use a more specific attack.
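The first step of this general attack, ranking ciphertext polygrams by frequency and tentatively pairing them with common plaintext polygrams, might be sketched as follows. The reference ranking `COMMON_DIGRAMS` is an illustrative assumption; published English digram frequency tables differ in detail from corpus to corpus. The result is only a starting hypothesis, which the analyst revises by hand as words begin to appear in the partial decryption.

```python
from collections import Counter

# Illustrative reference ranking only (an assumption for this sketch).
COMMON_DIGRAMS = ("TH", "HE", "IN", "ER", "AN")


def to_blocks(text, n=2):
    """Keep letters only and cut them into complete, non-overlapping n-letter blocks."""
    letters = "".join(c for c in text.upper() if c.isalpha())
    return [letters[i:i + n] for i in range(0, len(letters) - n + 1, n)]


def initial_guess(ciphertext, reference=COMMON_DIGRAMS, n=2):
    """Pair ciphertext polygrams, most frequent first, with the reference ranking."""
    ranked = [b for b, _ in Counter(to_blocks(ciphertext, n)).most_common(len(reference))]
    return dict(zip(ranked, reference))


def apply_guess(ciphertext, guess, n=2, unknown="."):
    """Show the partial decryption; blocks without a guess are masked with `unknown`."""
    return "".join(guess.get(b, unknown * n) for b in to_blocks(ciphertext, n))
```

Refining the attack then amounts to editing the `guess` dictionary, for instance swapping two assignments when a partial decryption suggests a word, and re-running `apply_guess`.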