Banburismus

From Wikipedia, the free encyclopedia

Banburismus was a process invented by Alan Turing at Bletchley Park in England during the Second World War. It was used by Hut 8 at Bletchley Park to break German Kriegsmarine (i.e., Naval) Enigma. It was a codebreaking procedure which used an early form of Bayesian networks to infer information about the settings of the Enigma machine. It gave rise to Turing's conception of information, measured in bans — roughly the same concept as Shannon entropy.

Hut 8 performed Banburismus continually for two years, stopping in 1943 only because the latest generation of ultra-fast Bombe could run a wheel-order in just 2 minutes, and it became easier just to brute-force the keys. Hugh Alexander was regarded as the best of the banburists — he and Jack Good considered the process more an intellectual game than a job. "Not too hard as to be impossible, but difficult enough to be an enjoyable challenge," they commented.

Contents

[edit] History

At least as early as 1939, Alan Turing correctly deduced that the message-settings of Kriegsmarine Enigma signals were enciphered on a common Grundstellung (starting position of the rotors), then were super-enciphered with a bigram lookup table. However, without a copy of the bigram tables, Hut 8 were unable to start attacking the traffic until the summer of 1940. The break-in happened after an armed trawler called Polares was surprised and seized by British forces in the North Sea off Norway in late April that year. The Germans didn't have time to destroy all their cryptographic documents, and amongst the captured material were settings-lists for 22–29 April and some message pads with paired plaintext and enciphered messages.

The bigram tables themselves were not part of the capture, but Bletchley Park were able to use the settings-lists to read (retrospectively) all the Kriegsmarine traffic that had been intercepted between 22–29 April. This in turn let them do a partial reconstruction of the bigram tables.

The stage was set for the first attempt to use Banburismus to attack Kriegsmarine traffic from April 30 onwards. Eligible days were those where at least 200 messages could be found for which the partial bigram-tables would decipher the indicators. The first day to be broken was May 8, 1940, henceforth celebrated as "Foss's Day" in honour of Hugh Foss, the cryptanalyst who achieved the feat.

This feat took Foss until November that year — the traffic was by then uselessly out of date, but of course the break-in proved Banburismus could work. It also allowed much more of the bigram tables to be reconstructed and that in turn allowed more days from May and June to be broken retrospectively. However, the Kriegsmarine changed the bigram tables in mid June, and Hut 8 were reduced to occasional decrypts (mostly due to kisses and gardening) until the next pinch in early 1941.

Banburismus was to become the standard procedure against Kriegmarine Enigma from early 1941 until mid 1943.

[edit] Basic principles

Banburismus is an attack on the indicators (the encrypted message settings) of Kriegsmarine Enigma traffic. It can only be used when an Enigma machine has been used with a fixed setting (the Grundstellung) to create those indicators. The indicators effectively form a set of three-letter enigma messages "in depth". The principle of Banburismus is to guess the plaintext of those three-letter messages (the message-settings) by statistical examination of the messages themselves.

The principle is simple. If you take two sentences in any natural language, lay them one above the other and count where letters in one message are the same as the matching letter in the other message, then you will find many more matches than you would have had the sentences been random junk. If both messages were enciphered with an Enigma machine at the same message-setting, then the matches will occur just as they did in the plaintexts. However, if the message-settings were not the same then the two ciphertexts will compare as if they were random junk, and you'd expect about one match every 26 characters.

This principle allows an attacker to take two messages whose indicators differ only in the third character, and slide them against each other looking for the giveaway repeat pattern that shows where they align in depth. This gives a vital clue as to the possible third characters of the plaintexts of those indicators.

This comparison of two messages, looking for the repeats, was made easier by punching the messages onto thin cards about 250mm high (10") by several metres wide (they had different cards for different lengths of message). A hole at the top of a column on the card represented an 'A' at that position, a hole at the bottom represented a 'Z'. Two message-cards were laid on top of each other on a light-box and where the light shone through, there was a repeat in the messages. This made it much easier to count the repeats.

The cards were printed in Banbury, England. They became known as 'banburies' in BP, and hence the procedure using them was Banburismus.

[edit] A quick example

Message with indicator "VFG": GXCYBGDSLVWBDJLKWIPEHVYGQZWDTHRQXIKEESQSSPZXARIXEABQIRUCKHGWUEBPF

Message with indicator "VFX": YNSCFCCPVIPEMSGIZWFLHESCIYSPVRXMCFQAXVXDVUQILBJUABNLKMKDJMENUNQ

Hut 8 would punch these onto banburies and count the repeats for all valid offsets -25 letters to +25 letters. There are two promising positions:

GXCYBGDSLVWBDJLKWIPEHVYGQZWDTHRQXIKEESQSSPZXARIXEABQIRUCKHGWUEBPF
         YNSCFCCPVIPEMSGIZWFLHESCIYSPVRXMCFQAXVXDVUQILBJUABNLKMKDJMENUNQ
                       - --  -   -          -  -   --

This is 9 repeats (including two bigrams) in an overlap of 56 characters.

The other promising setup looks like this:

GXCYBGDSLVWBDJLKWIPEHVYGQZWDTHRQXIKEESQSSPZXARIXEABQIRUCKHGWUEBPF
        YNSCFCCPVIPEMSGIZWFLHESCIYSPVRXMCFQAXVXDVUQILBJUABNLKMKDJMENUNQ
                 ---

This is just a single trigram in an overlap of 57 characters.

Bayesian statistics allows us to know which of these situations is most likely to represent messages in depth. As you might expect, the former is the winner with odds of 5:1 on, the latter is only 2:1 on.

BP would work on the principle that the plaintext of "VFX" is 9 characters ahead of "VFG", or (in terms of just the third letter of the texts) that "X = G+9".

[edit] Scritchmus

Hut 8 might have evidence from other message-pairs (with only the third indicator letter differing) showing that "X = Q-2", "H = X-4" and "B = G+3". They could then construct a "chain" as follows:

G--B-H---X-Q

Now, if you lay this knowledge over the letter-sequence of an Enigma rotor, you find that quite a few possibilities are discounted due to breaking either the "reciprocal" property or the "no-self-ciphering" property of the Enigma machine:

G--B-H---X-Q
ABCDEFGHIJKLMNOPQRSTUVWXYZ  ......... is possible

 G--B-H---X-Q
ABCDEFGHIJKLMNOPQRSTUVWXYZ  ......... is impossible (G enciphers to B, yet B enciphers to E)

  G--B-H---X-Q
ABCDEFGHIJKLMNOPQRSTUVWXYZ  ......... is impossible (H apparently enciphers to H)

   G--B-H---X-Q
ABCDEFGHIJKLMNOPQRSTUVWXYZ  ......... is impossible (G enciphers to D, yet B enciphers to G)

    G--B-H---X-Q
ABCDEFGHIJKLMNOPQRSTUVWXYZ  ......... is impossible (B enciphers to H, yet H enciphers to J)

     G--B-H---X-Q
ABCDEFGHIJKLMNOPQRSTUVWXYZ  ......... is impossible (Q apparently enciphers to Q)

      G--B-H---X-Q
ABCDEFGHIJKLMNOPQRSTUVWXYZ  ......... is impossible (G apparently enciphers to G)

       G--B-H---X-Q
ABCDEFGHIJKLMNOPQRSTUVWXYZ  ......... is impossible (G enciphers to H, yet H enciphers to M)

        G--B-H---X-Q
ABCDEFGHIJKLMNOPQRSTUVWXYZ  ......... is possible

         G--B-H---X-Q
ABCDEFGHIJKLMNOPQRSTUVWXYZ  ......... is possible

          G--B-H---X-Q
ABCDEFGHIJKLMNOPQRSTUVWXYZ  ......... is possible

           G--B-H---X-Q
ABCDEFGHIJKLMNOPQRSTUVWXYZ  ......... is impossible (H enciphers to Q, yet Q enciphers to W)

            G--B-H---X-Q
ABCDEFGHIJKLMNOPQRSTUVWXYZ  ......... is impossible (X enciphers to V, yet Q enciphers to X)

             G--B-H---X-Q
ABCDEFGHIJKLMNOPQRSTUVWXYZ  ......... is impossible (B enciphers to Q, yet Q enciphers to Y)

              G--B-H---X-Q
ABCDEFGHIJKLMNOPQRSTUVWXYZ  ......... is impossible (X enciphers to X)

Q              G--B-H---X->
ABCDEFGHIJKLMNOPQRSTUVWXYZ  ......... is possible

-Q              G--B-H---X->
ABCDEFGHIJKLMNOPQRSTUVWXYZ  ......... is impossible (Q enciphers to B, yet B enciphers to T)

X-Q              G--B-H--->
ABCDEFGHIJKLMNOPQRSTUVWXYZ  ......... is possible

-X-Q              G--B-H-->
ABCDEFGHIJKLMNOPQRSTUVWXYZ  ......... is impossible (X enciphers to B, yet B enciphers to V)

--X-Q              G--B-H->
ABCDEFGHIJKLMNOPQRSTUVWXYZ  ......... is possible

---X-Q              G--B-H->
ABCDEFGHIJKLMNOPQRSTUVWXYZ  ......... is impossible (X enciphers to D, yet B enciphers to X)

H---X-Q              G--B->
ABCDEFGHIJKLMNOPQRSTUVWXYZ  ......... is impossible (Q enciphers to G, yet G enciphers to V)

-H---X-Q              G--B->
ABCDEFGHIJKLMNOPQRSTUVWXYZ  ......... is impossible (H enciphers to B, yet Q enciphers to H)

B-H---X-Q              G-->
ABCDEFGHIJKLMNOPQRSTUVWXYZ  ......... is possible (note the G enciphers to X, X enciphers to G property)

-B-H---X-Q              G->
ABCDEFGHIJKLMNOPQRSTUVWXYZ  ......... is impossible (B enciphers to B)

--B-H---X-Q              G->
ABCDEFGHIJKLMNOPQRSTUVWXYZ  ......... is possible

The so called "end-wheel alphabet" is already limited to just nine possibilities, merely by establishing a letter-chain of five letters derived from a mere four message-pairs. Hut 8 would now try fitting other letter-chains — ones with no letters in common with the first chain — into these nine candidate end-wheel alphabets.

Eventually they will hope to be left with just one candidate, maybe looking like this:

         NUP
F----A--D---O
--X-Q              G--B-H->
ABCDEFGHIJKLMNOPQRSTUVWXYZ

Not only this, but such an end-wheel alphabet forces the conclusion that the end wheel is in fact "Rotor I". This is because "Rotor II" would have caused a mid-wheel turnover as it stepped from "E" to "F", yet that's in the middle of the span of the letter-chain "F----A--D---O". Likewise, all the other possible mid-wheel turnovers are precluded. Rotor I does its turnover between "Q" and "R", and that's the only part of the alphabet not spanned by a chain.

That the different Enigma wheels were given different turnover points was, presumably, a measure by the designers of the machine to improve its security. However, this very complication allowed Bletchley Park to deduce the identity of the end wheel.

[edit] The middle wheel

Once the end wheel is identified, these same principles can be extended to handle the middle rotor, though with the added complexity that you are now looking for overlaps in message-pairs sharing just the first indicator letter, and that the overlaps could therefore occur at up to 650 characters apart.

The workload of doing this is beyond manual labour, so BP punched the messages onto 80-column cards and used Hollerith machines to scan for tetragram repeats or better. That told them which banburies to set up on the light boxes (and with what overlap) to evaluate the whole repeat pattern.

Armed with a set of probable mid-wheel overlaps, Hut 8 could compose letter-chains for the middle wheel much in the same way as was illustrated above for the end wheel. That in turn (after Scritchmus) would give at least a partial middle wheel alphabet, and hopefully at least some of the possible choices of rotor for the middle wheel could be eliminated from turnover knowledge (as was done in identifying the end wheel).

Taken together, the middle wheel alphabet and the end wheel alphabet would yield a menu for the Bombe, and the number of wheel-orders to try on the Bombe would be significantly reduced from the 336 possible for the day.

[edit] References

[edit] See Also

ban (information)

[edit] External links