Talk:Confusion and diffusion

From Wikipedia, the free encyclopedia

Hello,

this is one of my first visits on Wikipedia, and if my editing experience is very scarce yet, I duly apologize.

OK, now to the discussion: I have recently been editing the article "confusion and diffusion" because I became aware of a statement in the article which simply is not true in my opinion.

Unfortunately my changes have been un-done so I wanted to take a chance to argue in favour of what I think is correct.

The point is that confusion and diffusion are -- at least in some lecture notes and books -- explained just in a self-respective way: confusing.

But if you happen to have read Shannon's original paper, which should be taken as the origin of any citations, the terms are made rather clear:

There are two "inputs" which lead to a ciphertext. One is the plaintext, the other one is the key. Diffusion leads to an elimination of a statistical relation between changes in single plaintext bits and changes in single ciphertext bits. If you change a plaintext bits, a cipher with good diffusion should lead to a 50 pc chance for _any_ ciphertext bit to change.

Confusion, on the other hand, does the same for the key<->ciphertext relation.

As I stated, I did the changes the day before yesterday, but they have been un-done.

Please read the original paper to get the terms correct, and then please re-do my changes. The current explanation is simply incorrect.

A link:

http://www.cs.ucla.edu/~jkong/research/security/shannon1949.pdf

Thanks for your contributions, and Wikipedia:Welcome to Wikipedia! Your above definition isn't how I understand confusion and diffusion, but it's been a little while since I've read Shannon's paper, so it might well be different. I'll have a leaf through it now, though it would be most helpful if you could give some relevant quotes from Shannon's paper. — Matt 07:37, 11 Jun 2004 (UTC)
To be more specific, the thing that I disagree with about your above definition is the assertion that confusion is the just the same thing as diffusion, but between key and ciphertext, not plaintext and ciphertext. I think confusion is about complexity of the relationship between the inputs and outputs, and diffusion is about the extent to which a small change in an input propagates to the output. — Matt 08:43, 11 Jun 2004 (UTC)

[edit] Shannon's paper

Now Shannon's paper is very lenghty indeed. If you are just interested in the issues of statistical methods reading especially Section 23 ("statistical methods") is sufficient.

Regards

OT

[edit] Quotations

Shannon writes:

Two methods (other than recourse to ideal systems) suggest themselves for frustrating a statistical analysis. These we may call the methods of diffusion and confusion. In the method of diffusion the statistical structure of M which leads to its redundancy is "dissipated" into long range statistics—i.e., into statistical structure involving long combinations of letters in the cryptogram...
The method of confusion is to make the relation between the simple statistics of E and the simple description of K a very complex and involved one.

Schneier writes in Applied Cryptography:

Confusion serves to hide any relationship between the plaintext, the ciphertext and the key. Remember how linear and differential cryptanalysis can exploit even a slight relationship between these three things? Good confusion makes the relationship statistics so complicated that even these powerful cryptanalytic tools won't work.
Diffusion spreads the influence of individual plaintext or key bits over as much of the ciphertext as possible. This also hides statistical relationships and makes cryptanalysis more difficult.

— Matt 08:04, 11 Jun 2004 (UTC)

...and Menezes, van Oorschot, Vanstone write (Handbook of Applied Cryptography):

p.20 1.36: Confusion is intended to make the relationship between the key and ciphertext as complex as possible. Diffusion refers to rearranging or spreading out the bits in the message so that any redundancy in the plaintext is spread out over the ciphertext.

— OT Fri Jun 11 10:29:59 CEST 2004

William Stallings, Cryptography and Network Security (3rd ed. p66-67):

In diffusion, the statistical structure of the plaintext is dissipated into long range statistics of the ciphertext. this is achieved by having each plaintext digit affect the value of many ciphertext digits, which is equivalent to saying that each ciphertext digit is affected by many plaintext digits...
The mechanism of diffusion seeks to make the statistical relationship between the plaintext and ciphertext as complex as possible to thwart attempts to deduce the key. On the other hand, confusion seeks to make the relationship between the statistics of the ciphertext and the value of the encryption key as complex as possible, again to thwart attempts to recover the key. Thus, even if the attacker can get some handle on the statistics of the ciphertext, the way in which the key was used to produce that ciphertext is so complex as to make it difficult to deduce the key.

— Matt 09:04, 11 Jun 2004 (UTC)

[edit] Clarification: First try

OK, I try to offer an explanation for both terms in my own words, and try to bit more precise than in my first uttering:

"Diffusion" is the spreading the redundancy of a plaintext over the ciphertext. What does that mean? In the set of all clear text messages, "sensible" clear text messages have a certain correlation between two-, three- and more-letter combinations (e.g. a "q" is always followd by a "u"). A cipher with good diffusion eliminates this correlation. Changes in one bit or letter affect every bit in the ciphertext. No correlation then means: the chances for a change in any ciphertext bit is 50 per cent.

I think we both agree with this definition.

Yes, I think I broadly agree, though I might have a minor quibble: I think the property you mention is also termed the Strict Avalanche Criterion (SAC). I think diffusion is a more general concept than SAC that covers properties such as completeness, avalanche, higher-order SAC, that kind of thing; general dependancy of output digits on input digits, which can be measured using a variety of criteria. — Matt 10:48, 11 Jun 2004 (UTC)
Right, then let us put it this way: if diffusion (i.e. the spreading of correlation) is taken to the one extreme so that a change in a single bit of plaintext leads to a 50 pc chance for any single bit in the ciphertext to change, then the cipher meets the SAC. Diffusion alone also includes stages in-between where the SAC is not met.
Diffusion also (IMO) includes stronger criteria, such as SAC of order n... etc (fixing any n or fewer bits to a fixed constant, the resulting function is still SAC). — Matt 11:04, 11 Jun 2004 (UTC)

Now to "confusion": Shannon's view is to have the influence of the key bits on the ciperhtext such that a statistical analysis of the ciphertext does not help to cut out a simply-structured subset which the correct key most probably is situated in. Tying down the key closer and closer shall be practically impossible because the subset is too folded (in a geometric view) inside the set of all keys.

It should therefore be as difficult as possible to infer the key by a known-plaintext attack, for example.

It is generally stated that a complex polyalphabetic substitution rule does just that.

Now to the often-stated relation between diffusion/confusion and transposition/substitution: clearly, transposition adds to the diffusion of the cipher, but only if additional steps are taken. A simple transposition does not spread the influence of a bit change. It simply transposes it.

Similarly, a polyalphabetic cipher alone does not lead to confusion, because a simple key change only leads to a simple ciphertext change (Take Vigenere: if you just change the first key letter of a say ten-letter Vigenere key, then cipher text letters 1,11,21,31 etc are changed, none else.)

So even though transposition is the ingredient for diffusion, and substitution is the ingredient for confusion, it is an iterated combination of both, which really provides both confusion and diffusion.