Keyword cipher

From Wikipedia, the free encyclopedia

A keyword cipher is a form of monoalphabetic substitution. A keyword is chosen, and its letters are used, in order, as the encoded versions of the first letters of the alphabet. Once every letter of the keyword has been assigned, the remaining letters of the alphabet are added in their usual order, skipping any that already appear in the keyword, so that every letter of the alphabet has an encoded version. For example, with the keyword "kryptos" the rest of the alphabet is laid out after the keyword, reading "abcdefghijlmnquvwxz". Note that there is no K between the J and the L, because the letter K has already been used in the keyword. With the keyword "kryptos", the letters are substituted using the following table:

Plaintext A B C D E F G H I J K L M N O P Q R S T U V W X Y Z

Encoded K R Y P T O S A B C D E F G H I J L M N Q U V W X Z
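
The construction of the encoded alphabet described above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function name cipher_alphabet is hypothetical, and dropping repeated letters inside the keyword is an assumption (the example keyword has none).

```python
import string

def cipher_alphabet(keyword: str) -> str:
    """Return the 26-letter encoded alphabet for a keyword (hypothetical helper)."""
    seen = []
    # Keyword letters come first, then the rest of the alphabet in order.
    # Assumption: a letter that has already been placed is skipped.
    for ch in keyword.upper() + string.ascii_uppercase:
        if ch in string.ascii_uppercase and ch not in seen:
            seen.append(ch)
    return "".join(seen)

print(cipher_alphabet("kryptos"))  # KRYPTOSABCDEFGHIJLMNQUVWXZ
```

The printed string matches the "Encoded" row of the table above, read left to right.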

So, all A's become K's, all B's become R's and so on. Let's try encoding the message "cryptography is cool" using the keyword "kryptos". Try it yourself before checking against the answer below.

Plaintext C R Y P T O G R A P H Y I S C O O L

Encoded Y L X I N H S L K I A X B M Y H H E
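
The same encoding can be checked with a short Python sketch. The names cipher_alphabet and encode are hypothetical, and leaving spaces and other non-letters unchanged is an assumption made for this example.

```python
import string

def cipher_alphabet(keyword: str) -> str:
    # Keyword letters first, then the unused letters of the alphabet in order.
    seen = []
    for ch in keyword.upper() + string.ascii_uppercase:
        if ch in string.ascii_uppercase and ch not in seen:
            seen.append(ch)
    return "".join(seen)

def encode(plaintext: str, keyword: str) -> str:
    # Map A..Z onto the encoded alphabet; characters outside A..Z pass through.
    table = str.maketrans(string.ascii_uppercase, cipher_alphabet(keyword))
    return plaintext.upper().translate(table)

print(encode("cryptography is cool", "kryptos"))  # YLXINHSLKIAX BM YHHE
```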

We can see that this cipher is monoalphabetic because each plaintext letter is always replaced by the same encoded letter: for example, all of the O's change to H's.

The best way to decode a keyword cipher is to know the keyword. If you know the keyword, you can make a table like the one above, with the alphabet and the encoded alphabet written one below the other. Then you can substitute all of the K's in this example back into A's, and so on.

If you do not know the keyword, the cipher can still be attacked, because a keyword substitution is vulnerable to frequency analysis. If you compare the number of times each letter appears in an encoded message with the number of times you would expect each letter to be used in a normal message, you can nearly always work out the substitution, and from it the keyword. This is helped by the fact that people rarely choose keywords at random, so the keyword will normally be a word you recognise. The first graph below is the number of times that each letter is used in the encoded message, and the second one is the expected number for each letter.

Image:Analysis.jpg Image:Expected.jpg
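
Decoding with a known keyword is the encoding table read in the opposite direction. The following Python sketch illustrates this. The names cipher_alphabet and decode are hypothetical, and non-letters are assumed to pass through unchanged.

```python
import string

def cipher_alphabet(keyword: str) -> str:
    # Keyword letters first, then the unused letters of the alphabet in order.
    seen = []
    for ch in keyword.upper() + string.ascii_uppercase:
        if ch in string.ascii_uppercase and ch not in seen:
            seen.append(ch)
    return "".join(seen)

def decode(ciphertext: str, keyword: str) -> str:
    # Reverse mapping: each encoded letter goes back to its plaintext letter.
    table = str.maketrans(cipher_alphabet(keyword), string.ascii_uppercase)
    return ciphertext.upper().translate(table)

print(decode("YLXINHSLKIAX BM YHHE", "kryptos"))  # CRYPTOGRAPHY IS COOL
```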

Another weakness of a keyword cipher is highlighted by the expected letter frequency graph. In the expected values, the number of times that X, Y and Z are used is very small. Since the rest of the alphabet is listed in order after the keyword, and the keyword hardly ever has all three of these letters, the last letters of the alphabet are usually encoded as letters at or near the end of the alphabet. In the table above, X becomes W, Y becomes X and Z stays Z, so W, X and Z will be rare in a typical encoded message. This makes it possible to see whether the encoding has been done by a keyword cipher. Also, the number of times E is used in the encoded message is much lower than expected, again a giveaway that a keyword has been used.
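
The letter counts behind a frequency graph can be gathered with a short Python sketch like the one below. The function name letter_counts is hypothetical. The sample message is the short example from earlier in the article, which is far too short for a reliable comparison against expected frequencies and is used here only to show the counting step.

```python
from collections import Counter

def letter_counts(text: str) -> Counter:
    # Count only the letters A..Z; spaces and punctuation are ignored.
    return Counter(ch for ch in text.upper() if "A" <= ch <= "Z")

counts = letter_counts("YLXINHSLKIAX BM YHHE")

# List the encoded letters from most to least frequent.
for letter, n in counts.most_common():
    print(letter, n)
```

In this sample the most frequent encoded letter is H, which stands for O. In a longer English message the most frequent encoded letter would more likely stand for E.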