Talk:Differential entropy
unnecessary treatment
I am not sure how familiar the average information theorist is with measure theory and integration. The article does use language such as "almost everywhere", "random variable", etc., so it must not be foreign. In that case, the division between "differential" and "non-differential" entropy is unnecessary, and quite misleading. One can simply use a single definition of entropy, for any random variable on an arbitrary measure space, and surely it has been done somewhere. Mct mht 03:53, 18 June 2006 (UTC)
- While not being an expert on the subject, I do not agree; see Information entropy#Extending_discrete_entropy_to_the_continuous_case:_differential_entropy. It is probably possible to give such a unified definition, but it would add unnecessary complication to the article; say, a Computer Science student does not normally need to study measure theory (I have a certain informal knowledge of it), just as not every probability student needs to know it. Beyond that, Shannon entropy is almost always used in the discrete version, because that is the natural application, and the two versions have different properties (the other article dismisses differential entropy as being of little use).--Blaisorblade 18:34, 10 September 2007 (UTC)
I am an average CS theory student, far from an expert. I was trying to find a general definition of entropy but could not find anything. After a bit of thinking, I realized that the entropy (differential or not) of a probability distribution (equivalently, of a random variable) must be defined with respect to a base measure. For the most common, discrete, entropy the base measure is the counting measure. For "continuous" probability distributions the entropy is computed with respect to the Lebesgue measure on the real line. In principle, however, you can choose any base measure, and if you change the base measure, the entropy changes as well. (This seems counter-intuitive, since most people think of entropy as some inherent quantity.) In this respect the article is sloppy: it does not say with respect to which base measure the integral is computed, while at the same time pretending that the measure space can be arbitrary. Either stick to the Lebesgue measure on the real line or do it in full generality. —Preceding unsigned comment added by 129.97.84.19 (talk) 22:22, 29 February 2008 (UTC)
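For what it's worth, here is a sketch of the general definition the comment above seems to have in mind (notation is mine, not from the article): if the distribution P has a density dP/dμ with respect to a chosen base measure μ, one can set
<math> H_\mu(P) = -\int \frac{dP}{d\mu}(x)\,\log\frac{dP}{d\mu}(x)\,d\mu(x). </math>
Taking μ to be the counting measure gives the usual discrete Shannon entropy, and taking μ to be Lebesgue measure on the real line gives the differential entropy of this article; changing μ changes the value, which is exactly the base-measure dependence described above.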
I am a practicing statistician with a Ph.D. in statistics, and I think there is a need for the definitions as the original developer laid out the thought processes about problems, concepts, and solutions. Academically I have been trained in measure theory. A single mathematical definition of information is not what matters; it is more important that all kinds of people with different backgrounds understand the problems, concepts, and solutions. My vote is to keep it simple (this is, after all, Wikipedia, not a contest in how precise one can be with measure theory). The number of pages allowed on the internet is unlimited, so please use a separate section for measure-theory-based definitions. In fact, by separating it out this way, one will appreciate the beauty of measure theory for conceptualizing problems and solving them elegantly.
It is not an "unnecessary treatment". It is very much a necessary treatment. Thanks —Preceding unsigned comment added by 198.160.96.25 (talk) 06:00, 19 May 2008 (UTC)
Error in Text
I believe there is an error in the example of the uniform(0,1/2) distribution: the integral should evaluate to log(1/2). I do not have access to TeX to correct it. —The preceding unsigned comment was added by 141.149.218.145 (talk) 07:46, 30 March 2007 (UTC).
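For reference, the computation being described appears to be the following, assuming the example uses the uniform density f(x) = 2 on (0, 1/2):
<math> h(X) = -\int_0^{1/2} 2\log 2\,dx = -\log 2 = \log\tfrac{1}{2}, </math>
which is negative, as differential entropy is allowed to be.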
Merge proposal
This article talks about the same thing as Information entropy#Extending discrete entropy to the continuous case: differential entropy, but the two give quite different (though not contradictory) results and information. Since differential entropy is of little use in Computer Science (at least so I suppose), it can be seen as an off-topic section in Information entropy, i.e. something deserving just a mention in "See also". --Blaisorblade 18:41, 10 September 2007 (UTC)
- I think it would be too quick to call differential entropy "of little use" in information theory. There are some hints to the contrary:
- * Brand [1] suggested minimization of posterior differential entropy as a criterion for model selection.
- * Neumann [2] shows that maximization of differential entropy under a constraint on expected model entropy is equivalent to maximization of relative entropy with a particular reference measure. That reference measure satisfies the demand from Jaynes [3] that the reference measure should (up to a multiplicative factor) also be the "total ignorance" prior.
- * Last but not least, in physics there are recurring claims that differential entropy is useful, or in some settings even more powerful than relative entropy (e.g. Garbaczewski [4]). It would seem strange to me if such things did not have their counterparts in probability and information theory.
- Webtier (talk) 12:52, 20 December 2007 (UTC)
- [1] Matthew Brand: "Structure Learning in Conditional Probability Models via an Entropic Prior and Parameter Extinction". Neural Computation, vol. 11 (1999), pp. 1155-1182. Preprint: http://citeseer.ist.psu.edu/247075.html
- [2] Tilman Neumann: "Bayesian Inference Featuring Entropic Priors". Proceedings of the 27th International Workshop on Bayesian Inference and Maximum Entropy Methods in Science and Engineering, American Institute of Physics, vol. 954 (2007), pp. 283-292. Preprint: http://www.tilman-neumann.de/docs/BIEP.pdf
- [3] Edwin T. Jaynes: "Probability Theory: The Logic of Science". Corrected reprint, Cambridge University Press, 2004, p. 377.
- [4] Piotr Garbaczewski: "Differential Entropy and Dynamics of Uncertainty". Journal of Statistical Physics, vol. 123 (2006), no. 2, pp. 315-355. Preprint: http://arxiv.org/abs/quant-ph/0408192v3
WikiProject class rating
This article was automatically assessed because at least one WikiProject had rated the article as start, and the rating on other projects was brought up to start class. BetacommandBot 09:48, 10 November 2007 (UTC)
Continuous mutual information
Note that the continuous mutual information I(X;Y) has the distinction of retaining its fundamental significance as a measure of discrete information since it is actually the limit of the discrete mutual information of partitions of X and Y as these partitions become finer and finer. Thus it is invariant under quite general transformations of X and Y, and still represents the amount of discrete information that can be transmitted over a channel that admits a continuous space of values.
I actually added this statement some time ago to the article (anonymously under my IP address), and it was recently marked as needing a citation or reference. I made the statement a little weaker by changing "quite general transformations" to "linear transformations", and added Reza's book as a reference. However, it is still true that I(X;Y) is invariant under any bijective, continuous (and thus monotonic) transformations of the continuous spaces X and Y. This fact needs a reference, though. Deepmath (talk) 06:57, 12 March 2008 (UTC)
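For anyone looking for the claim in symbols, here is a sketch (my own notation, not taken from the article): with joint density f(x,y) and marginals f(x), f(y), the continuous mutual information is
<math> I(X;Y) = \iint f(x,y)\,\log\frac{f(x,y)}{f(x)\,f(y)}\,dx\,dy, </math>
and under smooth bijections u = g(x), v = h(y) the Jacobian factors appearing in the transformed densities cancel inside the logarithm and against the change of variables in the integral, which is why I(X;Y) is unchanged even though the individual differential entropies h(X) and h(Y) are not.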