Talk:Information theory

From Wikipedia, the free encyclopedia

This article is within the scope of the following WikiProjects:
This article has an assessment summary page.


This article may be too technical for a general audience.
Please help improve this article by providing more context and better explanations of technical details to make it more accessible, without removing technical details.

}

To-do list for Information theory:
  • Restructure according to guidelines in Wikipedia:WikiProject Science, ie. :
    • start with 2 or 3 paragraphs for the general public, using daily life examples

Think I've done this Eweinber (talk) 05:18, 7 April 2008 (UTC)

    • explain the applications of information theory in a separate section
    • create a separate section on the theory, based on the existing material
    • present the history in the final section
  • merge page on channel capacity with the section onf channel capacity
  • citations

This article is also assessed within the mathematics field Probability and statistics.

Contents

[edit] Pending tasks (what do we want?)

Here are how I interpret the pending tasks:

  • Make sure intro and overview are understandable with daily life examples. Making sure the reader can subjectively tell what a "bit" is, for example, is a good start. Discussion of elephant trumpets, whale songs, and black holes, do not contribute in this manner, although their mere mention (with links, say, to the no hair theorem) may be useful to the curious.
  • Move applications up before mathematics (just after Overview) and make it independent. I'm not sure this is for the best, as Overview states some applications and, for the rest, it's useful to know some theory. Also, we need to be careful not to conflate security and information theory, since information knowledge is necessary for, yet not sufficient for, breaking ciphers. (For example, a Diffie-Hellman key exchange assumes that an adversary will have full knowledge, but the form of that knowledge assures that the adversary will not be able to use it within a reasonable amount of time. Thus, although secure in practice, Diffie-Hellman key exchange is information-theoretically insecure).
  • Move history down to end (just before References). Again, I'm not sure why this is best, and some "good" articles, like Thermodynamics, list history first. If moved, we need to check that this is just as readable, but the reader doesn't need to know scientific history to understand scientific phenomena. Clearly more post-1948 history is needed. Most applications of information theory, even elementary ones, are post-1948 (Huffman codes, arithmetic codes, LDPC codes, Turbo codes), and much of the theory is too, e.g., information divergence, information inequalities, Fisher sufficient statistics, Kolmogorov complexity, network information theory, etc. However, it might be best to move the history to its own article. The current section is far too in-depth.

There's still a lot to do and a lot to add, and we might reasonably ask if certain topics should be shortened or omitted, e.g., Kullback–Leibler divergence (which, although important, can generally be omitted from elementary explanation), differential entropy (which is most important for transinformation, which can be interpreted as a limit of discrete transinformation), gambling (which most information theoretists I know love but is a somewhat fringe topic). Again, the thermodynamics article is shorter and far less mathematical. Do we want this article to be long with full mathematical explanations or medium-sized with general explanations and links to the math? I don't know the right answer, but it might be nice to get some ideas of what people would like out of this.

A few final things to keep in mind:

  • Diversions or examples absent of explanation confuse, not clarify.
  • Sentences should be concise as possible, but no more so, for easy reading.
  • Proofread before (or at least immediately after) changes.
  • Others might not read the same way as you.

Due to these factors, it's good to discuss significant changes on the talk page. I find my "peer edited" sections are a lot better than my first attempts. And too much unilateral change can result in an unstable, unreadable, and flat-out wrong article. Calbaer 01:44, 15 June 2006 (UTC)

I say keep the K–L divergence. Not only is it important, but it is really useful in understanding the mutual information. The treatment of other stuff like gambling, intelligence, and measure theory could be cut. The measure theory section shows a useful mnemonic device, but the analogy is incomplete due to the failure of the transinformation to remain non-negative in the case of three or more random variables. It could probably be cut out entirely. History of information theory could indeed have its own article, but keep a short summary of history in this article here. Also some mention of Gibbs' inequality would be nice as it shows the K–L divergence (and thus the bivariate mutual information) to be non-negative. The AEP could be mentioned in the source theory section, since it is a property of certain sources. Don't hesitate to add what you think should be added. --130.94.162.64 17:17, 20 June 2006 (UTC)
K-L divergence is certainly fundamental, so it should stay. Measure theory is also fundamental, but it is not necessary for a good understanding of information theory, so it could go. History could be summarized and moved. I'll get to this when I can, though if someone else wants to do so first, go ahead. Calbaer 18:36, 20 June 2006 (UTC)
the section that mentions measure theory should be kept, or moved somewhere else, rather than deleted. also, the following sentence in that section is misleading/incorrect: "it justifies, in a certain formal sense, the practice of calling Shannon's entropy a "measure" of information." entropy is not a measure, given an random variable f, one integrates f lnf against a suitable measure (the Lebesgue measure, the counting measure, etc) to get the entropy. Mct mht 18:52, 20 June 2006 (UTC)
I think that the section confuses more than it helps; most readers will not have a working knowledge of measure theory, and, as you say, the section needs at least a little reworking. I'll delete it and add a link to Information theory and measure theory. Calbaer 19:47, 20 June 2006 (UTC)

Seems to me a big priority should be to break out all the details into other articles. Information Theory is very much a science. If you look at the "Mathematics" Wikipedia article, you'll see that there are few actual mathematics concepts dicussed. Its more a history and categorization of the sub-sciences of mathematics, with links. I picture "Information Theory" as a fairly small article (only because the science is young) giving only a bit more than a dictionary definition of "information theory", and then a ton of links with brief descriptions to categories, related theories, practical applications, algorithms, coding methods, etc.. I think this article should be more of a starting point. Details should be elsewhere. Also, until this can be done, at least a set of links might be nice. I guess I'm saying here I'm surprised that a search of the "Information Theory" article in Wikipedia doesn't find the phrase "forward error correction"! qz27, 22 June 2006

Good point, and one that argues for the elimination of the following as all but links: Self-information, Kullback–Leibler divergence, differential entropy. We shouldn't be scared to have a little math in the article, but regular, joint, and conditional entropy can be defined together and mutual entropy in terms of them. That's enough for the idealizations of source and channel coding. If people want to know why the math is as it is, there could be a definitions in information theory article or something like that. Two things though: 1) "error correction" is mentioned three times in the article (and I don't think it's that important to specifically state FEC in a "starting point" article) and 2) are you volunteering to do this? Calbaer 06:13, 23 July 2006 (UTC)
I went through much of the article and tried to make it more approachable with links to unnecessary math. Even now it may have too much math, and someone should probably reorganize it accordingly. Calbaer 22:45, 24 July 2006 (UTC)

The "Intelligence and Secrecy" section looks like someone's weird pet theory disguised as encyclopedic text. What does the Shannon-Hartley theorem have to do with keeping secrets? Even if this connection exists, it is far from clear here. kraemer 15:01, 11 June 2007 (UTC)

Information theory is everything in crypto -- entropy is its fundamental concept. The real questions are whether cryptography merits the mention (probably) and whether the subsection is written well enough to keep (probably not). CRGreathouse (t | c) 15:09, 11 June 2007 (UTC)
The text has been there since December 2005 and was likely written by User:130.94.162.64, who wrote the bulk of the first draft of this article. I've criticized him (or her) for a mistake or two, but the truth is that, without that user, this article would be far less rich. Unfortunately, there's been no sign of that IP since June 21, 2006, but still, I'd say that the text should be modified to something a bit better rather than simply removed, especially since it has, in some sense, "stood the test of time." But I agree it should be improved. It's important to express the fundamental idea of information theory in cryptography: That methods with keys shorter than their plaintext are theoretically breakable, and that modern crypto pretty much counts on the hypothesis — neither proven or disproven — that such methods require enough steps to break as to make them secure for any reasonable amount of time (e.g., the timespan of the universe). Calbaer 16:40, 11 June 2007 (UTC)
The text in question is the material surrounding the assertion, "It is extremely hard to contain the flow of information that has high redundancy." This has nothing to do with the Shannon-Hartley Theorem, which is about the relationship between channel noise and channel capacity. The difficulty in keeping a secret that is known to multiple people is not a function of the signal-to-noise ratio of any informational vector in any obvious way. If this is clear to someone else then the connection should be made explicit. If these are, as I suspect, unrelated ideas, then this section should be rewritten. The relationship between cryptography and information theory is indeed important -- far too important to be represented by some vanished author's unsupported ranting. kraemer 18:48, 18 June 2007 (UTC)
To be fair, the text has degraded, so we shouldn't call it a "vanished author's unsupported ranting." Anyway, here's a replacement that I'll do if people like it:
Information theoretic concepts apply to making and breaking cryptographic systems. Such concepts were used in breaking the German Enigma machine and hastening the end of WWII in Europe. Shannon himself defined an important concept now called the unicity distance. Based on the redundancy of the plaintext, it attempts to give a minimum amount of ciphertext necessary to ensure unique decipherability.
Information theory leads us to believe it is much more difficult to keep secrets than it might first appear. A brute force attack can break systems based on public-key cryptography or on most commonly used methods of private-key cryptography, such as block ciphers. The security of such methods comes from the assumption that no known attack can break them in a practical amount of time, e.g., before the universe meets its ultimate fate. Information theoretic security refers to methods such as the one-time pad which are not vulnerable to such brute force attacks. However, as in any other cryptographic system, one must be careful to correctly apply even information-theoretically secure methods; the Venona project was able to crack the one-time pads of the Soviet Union due to their improper reuse.
Calbaer 21:00, 18 June 2007 (UTC)
This is all true, though the connection between cryptographic attack and information theory could be drawn more explicitly. Big improvement though! kraemer 21:22, 18 June 2007 (UTC)
Well, if you have any modifications, you can make them and add them. Also, the PRNG subsection should be modified; for such generators, e.g., extractors, min-entropy, not the more common and fundamental information theoretic quantity of entropy, is the correct measurement. Calbaer 16:14, 19 June 2007 (UTC)

[edit] Game theory as a template?

Game theory was recently a featured article. Perhaps information theory should look more like that? Calbaer 03:52, 2 August 2006 (UTC)

[edit] Removed tag, added poached images, reworked header

The article has been largely dormant and those who have tagged it — not to mention have made many contributions — are no longer in the Wikipedia community with the same aliases. (One was kicked off in September; the other stopped contributing and/or changed IP address in June.) It's not a perfect article, but it's better, so I removed the tag. Any advice for how to improve it — by novice or by expert — would be appreciated, especially on this talk page. I feel some of the article is a bit too redundant, but it's better to be redundant than to omit or underplay basic facts. I added some images from other articles to visually illustrate the concepts, which will hopefully get people more interested in the subject and more quickly clear what it's all about. Finally, I made the disambiguation header shorter, updated (to reflect the library/information science split), and added informatics. By the way, my apologies for misstating that I thought "K-L divergence should stay." I meant that mutual information should stay. K-L divergence may be a bit too confusing for those who don't want to explore it as a separate topic. Calbaer 21:59, 25 October 2006 (UTC)

The illustration for Information theory#Channel capacity is missing labels for X and Y, which are referred to in the text. 198.145.196.71 23:28, 13 September 2007 (UTC)
I changed to one what has x and y marked (these are elements of the spaces X and Y). Dicklyon 00:33, 14 September 2007 (UTC)
Perhaps the "Channel capacity" and "Source theory" sections could be swapped, since source theory talks about rate, which is a prerequisite for channel capacity. 198.145.196.71 23:31, 14 September 2007 (UTC)
That's fine, though "coding theory" shouldn't be after the two and hierarchically siblings to them (a change made here). It should come first and/or be under something different, preferably both. The idea for this was to explain source and channel coding, then mention their applications under "coding theory." (In fact, calling these "applications" is rather loose; maybe I'll change it around.) Calbaer 00:22, 15 September 2007 (UTC)
Coding theory is really important. It is the main application of information theory, its raison d'être, so to speak. It really shouldn't look like just another "application" among several others. Just go ahead and make your changes, and let's look at it from there. Let's make sure all the section and subsection titles are accurate, too. 198.145.196.71 15:28, 15 September 2007 (UTC)
You could make similar arguments for channel capacity and source theory. Maybe that section needs to be split into two. Think up a good heading and go for it. Dicklyon 16:11, 15 September 2007 (UTC)
Coding theory, source theory, and channel capacity have been rearranged now. A little blurb on source coding could go into the source theory section, and something about channel coding could be said in the channel capacity section, possibly with another diagram somewhat along this line:
Source --> Encoder --> Noisy Channel --> Decoder --> Receiver.
198.145.196.71 01:32, 21 September 2007 (UTC)

[edit] Entropy

In the entropy section the article says:

If \mathbb{M}\, is the set of all messages m that M could be, and p(m) = Pr(M = m), then M has

 H(M) = \mathbb{E}_{M} [-\log p(m)] = -\sum_{m \in \mathbb{M}} p(m) \log p(m)

bits of entropy. An important property of entropy is that it is maximized when all the messages in the message space are equiprobable — i.e., most unpredictable — in which case H(M) = log | M | .

but if all states are equiprobable then p(m)=\frac{1}{|\mathbb{M}|} so

H(M)=-\sum_{m \in \mathbb{M}} p(m) \log p(m)=-\sum_{m \in \mathbb{M}} \frac{1}{|\mathbb{M}|} \log \frac{1}{|\mathbb{M}|}=\log|\mathbb{M}|

not log | M | . Thesm 08:52, 6 December 2006 (UTC)

Changed; it was a slight abuse of notation. Though for something like this, you can Wikipedia:Be bold. Worst thing that can happen is that you'd be reverted. Calbaer 17:26, 6 December 2006 (UTC)

[edit] Boltzmann's entropy and von Neumann anecdote

No history of information theory would be complete without this famous anecdote: Claude Shannon asked John von Neumann which name he should give to this cool new concept he discovered: - \sum p_i \log_2 p_i \!. Von Neumann replied: "Call it H." Shannon: "H? Why H?" Von Neumann: "Because that's what Boltzmann called it."

Ludwig Boltzmann introduced the concept in 1870. Compare: Boltzmann, Ludwig (1896, 1898). Vorlesungen über Gastheorie : 2 Volumes - Leipzig 1895/98 UB: O 5262-6. English version: Lectures on gas theory. Translated by Stephen G. Brush (1964) Berkeley: University of California Press; (1995) New York: Dover ISBN 0-486-68455-5

Algorithms 19:51, 7 June 2007 (UTC)

If no history would be complete without it, add it to the History of information theory article, not a section of an article focusing on the idea, not the history. You might also want to find a source for it, since I've been in information theory for ten years and have never heard the "famous anecdote." Googling, I only encounter one webpage with the story, told secondhand without attribution. (The reference you give is for "H", not your "H story," I'm assuming.) For both these reasons, I don't think it should be in this article. Calbaer 00:13, 8 June 2007 (UTC)
Some of these books support the idea that von Neumann suggested Shannon use entropy after Boltzmann. This one has it as a quote from Shannon, not quite in that form. It's already quoted in History of information theory. Dicklyon 00:29, 8 June 2007 (UTC)
Calbaer reversed and wrote: "Anecdote doesn't belong; order is by relevance, not strictly chronological." But what could possibly be more relevant than the original entropy formulation of Boltzmann, providing the very foundations of information theory? Surprisingly, Boltzmann's work is even omitted from the references. This history seems a bit one-sided. Algorithms 19:47, 9 June 2007 (UTC)
That's because Boltzmann wasn't an information theorist. You would no more expect to find him on this page than Newton on the aerospace engineering page or Euler on the cryptography page. Yes, they did the math and statistics that made those fields possible — and Boltzmann merits a mention on the history page, but to give him a good portion of the short history section overstates his role in information theory. And random anecdotes also don't belong. I wouldn't want the history to get too dry, but overstating the relations between information theory and thermodynamics — which, although many, are largely unimportant — would not be beneficial to the article, especially in trying to make it accessable to those who may already be daunted by information theory being a mix of engineering, CS, stat, and math. Does anyone else feel the Boltzmann connection merits this much attention? Calbaer 00:39, 10 June 2007 (UTC)
I'm not so sure, either, how much of the history section should be devoted to Boltzmann, but I disagree on the importance of the relations between information theory and thermodynamics. The connection between the two is fundamental and deep, so much so, in fact, that I can say that one bit of (information-theoretic) entropy is equal to Boltzmann's constant times the natural logarithm of 2. I'm not sure, though, how much of that belongs in this article, which is already quite long. (My old IP address was 130.94.162.64, by the way.) -- 198.145.196.71 18:27, 25 August 2007 (UTC)
Welcome back. I guess I meant "unimportant" in the context of the article or your average textbook or paper on information theory. I didn't meant to trivialize the connection, which is indeed deep, but is usually not important for someone looking for either a reference or tutorial on information theory. Calbaer 20:09, 25 August 2007 (UTC)

[edit] Source theory and rate

I still don't like the expression

r = \lim_{n \to \infty} \mathbb E H(X_n|X_{n-1},X_{n-2},X_{n-3}, \ldots);

because it isn't clear what probability distribution the expectation is taken over. There is already a limit in this expression on n, and an expectation is implicit in the conditional entropy. 198.145.196.71 22:08, 15 September 2007 (UTC)

I don't care for it much either, though I wrote it. The formula previously didn't have the limit, which made sense only if the conditional entropy was independent of n (previously called t here); but it did have the expectation, which seems to me is needed if the conditional entropy varies with the identity of the previous symbols. I think the distribution is over those previous values. Maybe we can find a source about this. Dicklyon 22:44, 15 September 2007 (UTC)
OK, you're right, according to this book, eq. 2.34. Dicklyon 22:47, 15 September 2007 (UTC)
Actually the expression (without the expectation) is correct for a strictly stationary process, as you noted in the article. For a non-stationary process, the more general expression I added to the article (which I stole from Entropy rate) previously is needed (as you also noted apparently). 198.145.196.71 23:28, 15 September 2007 (UTC)

[edit] Help request

How about some help over on the article Information theory and measure theory? I wrote most of that article over a year ago, and it's still complete mess. 198.145.196.71 20:32, 25 August 2007 (UTC)

See these books for some good ideas. The online ref mentioned in that article doesn't even mention measure theory, so no wonder it's a mess. Dicklyon 05:51, 28 August 2007 (UTC)
I meant to be bold and help edit the article, and maybe take the expert tag off if that scares people from editing. I don't mean to claim ownership of it. More at Talk:Information theory and measure theory. 198.145.196.71 17:06, 7 September 2007 (UTC)

[edit] Variety

I had never heard of Ashby's concept of variety (cybernetics) until it was added to, and removed from, the see-also list in this article. So I looked in Google Books, and found all kinds of stuff about it. Some books on information theory, such as Kullback's Information Theory and Statistics, refer to Ashby's 1956 book Introduction to Cybernetics as part of the literature on information theory, and a big chunk of this book is about "variety". And his "requisite variety" paper is referenced in Uncertainty-Based Information: Elements of Generalized Information Theory. I don't see any reason to exclude this concept from the see-also list here on the basis of it being not information theory. Someone is being too picky, methinks. Dicklyon 03:41, 17 September 2007 (UTC)

I think it's helpful and just fine to mention something like this in the see-also list. 198.145.196.71 21:18, 22 September 2007 (UTC)

My main problem is that the article, as it stands, doesn't seem to have much to do with information theory. While variety itself may be relevant, the article reads more like one on social sciences. ⇌Elektron 12:27, 24 September 2007 (UTC)

These guys are doing a lot of "cybernetics" articles, which are topics that sort of bridge information theory's very mathematical style to the style of the social sciences. It's not what you're used to, but shouldn't be excluded from a see-also list on that basis. It may be more like what some readers are looking for. Dicklyon 14:34, 24 September 2007 (UTC)

[edit] References

The "References" section is starting to get disorganized. A footnote was added in conflict with the style of the other references, and something about Boltzmann that is not very relevant was added to "The Classic Work" subsection, which is supposed to contain just that: Shannon's classic work, "A Mathematical Theory of Communication." Perhaps there is a better way to organize the references, but it should be discussed on this talk page first, and in any case, the references should not be left in such disarray. This being the main information theory article, it contains a lot of material that is general knowledge in the field. Therefore several textbooks are listed, any of which will do for a reference for some of the "common knowledge" material (which should perhaps be more explicitly referenced in the body of the article). I would like some input and help here to better organize the references in this article. 198.145.196.71 20:45, 22 September 2007 (UTC)

Dicklyon just tagged the article as needing improved references, which I agree with. The footnote that was added is probably a good start. Should all the references be footnotes? I have several of the textbooks that are mentioned in the article, so I could go through and reference some of the material, but I am still somewhat unsure about some of the details, such as how best to deal with multiple references to the same source (in the footnote style). Perhaps I can add some references in the article, and then they can be changed to a more consistent style. 198.145.196.71 19:25, 28 September 2007 (UTC)

You can easily add multiple ref tags after each other; it's pointless to list more than 2 or 3 in support of a point, though, and usually 1 will do. Thanks for working on this. I'll try to help. Dicklyon 19:29, 28 September 2007 (UTC)
What I meant was referring to the same source at multiple points in the article without being redundant, as in "ibid." You can help me, too, by pointing out what specific statements in the article need sources to back them up. A lot of good sources are already listed at the end of the article; they just need to be referenced in the body. Some of them might be irrelevant because the material that referenced them has been moved to other articles, in which case we should make sure those articles have the necessary sources before deleting them entirely here. Very little content has actually been deleted from this article without moving it somewhere else. 198.145.196.71 22:11, 28 September 2007 (UTC)
Instead of ibid, use a named ref, like <ref name=foo>{{cite book | ...}}</ref> the first time, and just <ref name=foo/> for subsequent uses. Dicklyon 23:20, 28 September 2007 (UTC)

[edit] Page is too long

I suggest removing all technical and specialized material and instead placing it on the appropriate separate pages. I do not see why coding theory needs more than a paragraph or two on this page. Coding theory is a rich field in its own right...it is connected to information theory but I don't think we need to have a full exposition of it duplicated on this page. The same goes for other sections. I think this page should remain general-audience. As such, I think more discussion of information theory's applications and usefulness would be a good addition to the page. Cazort (talk) 20:37, 26 November 2007 (UTC)

I am the one originally responsible for much of this article's length that you consider excessive. It is admittedly a long article, but on the other hand I like having a collection of the basic formulas on this page. This page has by no means a full exposition of coding theory, only its very basic rudiments. I do agree that more discussion of applications and usefulness would be helpful, although that might make the article even longer. I would rather have an article that is a little bit too long than one that is too short and scanty on details. 198.145.196.71 (talk) 02:38, 20 December 2007 (UTC)
I think Czort's objection is apposite. This is a field with multitudes of connections elsewhere, and any article which addresses these even in passing will not be very small. In addition, the suggestion that all technical content be removed to other articles is ill advised. This is a tricky and technical field. The proper remedy for unclarity caused by technical material is better writing or a better introduction to technical sections.
Our Gentle Readers should not be written down to by hiding complexity of complex subjects. An article on a technical subject can never be made sufficiently innocuous that every Reader will not be put off. We should, instead, write very clearly, distinguishing between necessary technical material and supplemental technical material which could indeed be more fully explained in some other article(s). And, banishing technical content has the very considerable disadvantage of requireing tour Gentle Readers to chase links and assemble an adequate picture of a subject from them. This is a skill held by few and by very few of our presumed Gentle Readers.
There are numerous reasons why Czort's suggestion should not be followed. Concur, mostly, with 198.145, ww (talk) 21:41, 2 June 2008 (UTC)