Talk:Information entropy

log probability

The article Perplexity says that information entropy is "also called" log probability. Is it true that they're the same thing? If so, a mention or brief discussion of this in the article might be appropriate. dbtfztalk 01:34, 20 April 2006 (UTC)
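
They are not quite the same thing: the entropy is the expected negative log probability of the outcome (the expected self-information), and the perplexity discussed in that article is 2 raised to the entropy when the logarithm is base 2. A minimal Python sketch, using an invented distribution p:

import math

p = [0.5, 0.25, 0.125, 0.125]   # an invented example distribution

# Entropy in bits: the expected value of the negative log probability, -log2 p(x)
H = -sum(pi * math.log2(pi) for pi in p)

# Perplexity, as defined in the Perplexity article, is 2 raised to the entropy
perplexity = 2 ** H

print(H)           # 1.75 bits
print(perplexity)  # 2 ** 1.75, about 3.36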

Units and the Continuous Case

The extension to the continuous case has a subtle problem: the distribution f(x) has units of inverse length and the integral contains "log f(x)" in it. Logarithms should be taken on dimensionless quantities (quantities without units). Thus, the logarithm should be of the ratio of f(x) to some characteristic length L. Something like log [ f(x) / L ] would be more proper.

The problem with applying a transcendental function to a quantity with units arises from the way arithmetic is defined for such quantities. 5 m + 2 m is defined (5 m + 2 m = 7 m), but 5 m + 2 kg is not, because the units of the addends differ. A transcendental function (such as a logarithm) of a variable x that carries units therefore has no well-defined units for its result. This is why scientists and engineers form ratios in which all the units cancel, and then apply transcendental functions to these ratios rather than to the original quantities. For example, in exp[-E/(kT)] the constant k has exactly the units needed to cancel those of the energy E and the temperature T, so the quantity E/(kT) is dimensionless, and the result of applying the function to this dimensionless argument is dimensionless as well.
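
(A trivial numerical rendering of the exp[-E/(kT)] example, with arbitrary values chosen for E and T:)

import math
from scipy.constants import Boltzmann as k   # Boltzmann constant, in J/K

E = 4.0e-21    # an arbitrary energy, in joules
T = 300.0      # an arbitrary temperature, in kelvin

ratio = E / (k * T)        # J / ((J/K) * K): the units cancel, leaving a pure number
print(math.exp(-ratio))    # the Boltzmann factor, itself dimensionless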

My suggested solution to the problem with the units raises another question: what choice of length L should be used in the expression log [ f(x) / L ]? I think any choice can work. —The preceding unsigned comment was added by 75.85.88.234 (talk) 18:06, 17 December 2006 (UTC).
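
One way to see what is at stake: rescaling x (which is what a different unit, or a different choice of L, amounts to) shifts the value of -Integral[ f(x) log f(x) dx ] by the logarithm of the scale factor. A rough Python sketch, using a Gaussian density as an arbitrary example:

import numpy as np

def diff_entropy(pdf, x):
    # Crude Riemann-sum estimate of h = -integral f(x) log f(x) dx (natural log, nats)
    dx = x[1] - x[0]
    fx = pdf(x)
    return -np.sum(fx * np.log(fx)) * dx

gauss = lambda x, s: np.exp(-x**2 / (2 * s**2)) / (s * np.sqrt(2 * np.pi))

# The same physical distribution, with x expressed first in metres, then in centimetres:
h_m  = diff_entropy(lambda x: gauss(x, 1.0),   np.linspace(-10, 10, 200001))      # sigma = 1 m
h_cm = diff_entropy(lambda x: gauss(x, 100.0), np.linspace(-1000, 1000, 200001))  # sigma = 100 cm

print(h_m)         # about 1.42 nats
print(h_cm - h_m)  # about log(100): the value depends on the unit chosen for x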

To cancel the inverse unit of length (actually the inverse unit of x), a product of f(x) and a length L should appear under the logarithm, i.e. log [ f(x) L ]. This would indeed be bizarre, as any length L would work - unless we are in the framework of quantum mechanics. In that case, we would simply use the smallest quantum-mechanically distinguishable value for L. If x is truly a length, then L could be the Planck length. But this is already too obfuscating for me. I would rather recommend concentrating on the discrete formula for the entropy: S = -Sum [ p(i) log p(i) ]. Now, in the continuous case, the probability is infinitesimal: dP = f(x) dx. Thus, the exact transcription of the above formula with this probability would give S = -Sum [ f(x) dx log ( f(x) dx ) ]. The Sum would become an Integral, and log ( f(x) dx ) is a functional which must take the form L(x) dx. The worst problem now is that there are two dx under one integral. This problem appears in the modified formula for S above and must be worked out somehow. Its source is the product inside the original Shannon entropy.
If you want to work with continuous variables, you're on much stronger ground if you work with the relative entropy, ie the Kullback-Leibler distance from some prior distribution, rather than the Shannon entropy. This avoids all the problems of the infinities and the physical dimensionality; and often, when you think it through, you'll find that it may make a lot more sense philosophically in the context of your application, too. Jheald 19:32, 7 February 2007 (UTC)
Of course, the relative entropy is very good for the continuous case, but, unlike Shannon entropy, it is relative, as it needs a second distribution from which to depart. I was thinking of a formula that would give a good absolute entropy, similar to the Shannon entropy, for the continuous case. This is purely speculative, though. —The preceding unsigned comment was added by 193.254.231.71 (talk) 13:52, 8 February 2007 (UTC).
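
To illustrate Jheald's point numerically, here is a rough Python sketch (the two Gaussian densities are arbitrary choices) showing that the Kullback-Leibler divergence is unchanged when x is rescaled, so it has no unit problem:

import numpy as np

gauss = lambda x, mu, s: np.exp(-(x - mu)**2 / (2 * s**2)) / (s * np.sqrt(2 * np.pi))

def kl(p, q, x):
    # Riemann-sum estimate of D(p || q) = integral p(x) log( p(x)/q(x) ) dx
    dx = x[1] - x[0]
    return np.sum(p * np.log(p / q)) * dx

x = np.linspace(-20, 20, 400001)
p = gauss(x, 0.0, 1.0)
q = gauss(x, 1.0, 2.0)

# Rescale x by a factor of 1000 (say, metres -> millimetres); the densities pick up a 1/1000 factor.
x2 = 1000.0 * x
p2 = gauss(x2, 0.0, 1000.0)
q2 = gauss(x2, 1000.0, 2000.0)

print(kl(p, q, x))     # about 0.443 (analytically, log 2 + 1/4 - 1/2)
print(kl(p2, q2, x2))  # essentially the same value, despite the change of scale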

Extending discrete entropy to the continuous case: differential entropy

Q —The preceding unsigned comment was added by 193.254.231.71 (talk) 10:18, 12 February 2007 (UTC).

The last definition of the differential entropy (the second-to-last formula) seems to be wrong. It should actually read

h[f] = \lim_{\Delta \to 0} \left[ H^{\Delta} + \log \Delta \sum_i f(x_i) \, \Delta \right]

This would ensure that the second sum in H^Δ cancels completely. With the current formula, a non-vanishing term remains:

h[f] = \lim_{\Delta \to 0} \left[ H^{\Delta} + \log \Delta \right] = - \int f(x) \log f(x) \, dx - \lim_{\Delta \to 0} \left[ \log \Delta \left( \sum_i f(x_i) \, \Delta - 1 \right) \right] .

The last limit does not go to zero. Actually, applying l'Hôpital's rule to (1 - Σ f(xi)Δ) / (1 / log Δ), it would go to

- \lim_{\Delta \to 0} \left[ \Delta \, (\log \Delta)^2 \sum_i f(x_i) \right] ,

and, as Δ → 0, Σ f(xi) goes to infinity like 1/Δ (since Σ f(xi)Δ → 1), so it cancels the factor Δ in the limit above, leaving only

- \lim_{\Delta \to 0} (\log \Delta)^2 = - \infty .

Thus, the last definition of h[f] could not even be used. I recommend checking this against a reliable source and then, if the formula is indeed wrong, removing it. Unfortunately, I have no knowledge of the way formulas are written in Wikipedia (yet).
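
For what it is worth, the size of the disputed leftover term can be checked numerically. A rough Python sketch that discretises a standard normal density (an arbitrary test case) into bins of width Δ and prints the difference between H^Δ + log Δ and the analytic differential entropy as Δ shrinks:

import numpy as np

def H_delta(delta, lo=-12.0, hi=12.0):
    # Discretise a standard normal density into bins of width delta and compute
    # the discrete entropy H^Delta = -sum_i f(x_i) delta * log( f(x_i) delta )
    x = np.arange(lo, hi, delta) + delta / 2.0       # bin midpoints
    f = np.exp(-x**2 / 2.0) / np.sqrt(2.0 * np.pi)   # density at the midpoints
    p = f * delta                                    # approximate bin probabilities
    return -np.sum(p * np.log(p))

h_exact = 0.5 * np.log(2.0 * np.pi * np.e)           # analytic value, about 1.42 nats

for delta in [0.5, 0.1, 0.01, 0.001]:
    print(delta, H_delta(delta) + np.log(delta) - h_exact)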

Roulette Example

In the roulette example, the entropy of a combination of numbers hit over P spins is defined as Omega/T, but the entropy is given as lg(Omega), which then calculates to the Shannon definition. Why is lg(Omega) used? (Note: I'm using the notation "lg" to denote "log base 2") 66.151.13.191 20:41, 31 March 2006 (UTC)

moved to talk page because wikipedia is not a textbook

Derivation of Shannon's entropy

Since the entropy was given as a definition, it does not need to be derived. On the other hand, a "derivation" can be given which gives a sense of the motivation for the definition as well as the link to thermodynamic entropy.

Q. Given a roulette wheel with n pockets, all equally likely to be landed on by the ball, what is the probability of obtaining a distribution (A1, A2, …, An), where Ai is the number of times pocket i was landed on and

P = \sum_{i=1}^n A_i \,\!

is the total number of ball-landing events?

A. The probability is a multinomial distribution, viz.

p = {\Omega \over T} = {P! \over A_1! \ A_2! \ A_3! \ \cdots \ A_n!} \left(\frac1n\right)^P \,\!

where

\Omega = {P! \over A_1! \ A_2! \ A_3! \ \cdots \ A_n!} \,\!

is the number of possible combinations of outcomes (for the events) which fit the given distribution, and

T = n^P \,\!

is the number of all possible combinations of outcomes for the set of P events.
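
For concreteness, a small Python sketch that evaluates Ω, T and p for an invented three-pocket example, using exact integer arithmetic:

from math import factorial, prod

A = [5, 3, 2]        # invented counts A_i for a wheel with n = 3 pockets
n = len(A)
P = sum(A)           # total number of ball-landing events, here 10

Omega = factorial(P) // prod(factorial(a) for a in A)   # number of outcome sequences fitting the counts
T = n ** P                                              # number of all possible outcome sequences

p = Omega / T        # probability of obtaining exactly this distribution
print(Omega, T, p)   # 2520, 59049, about 0.0427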

Q. And what is the entropy?

A. The entropy of the distribution is obtained from the logarithm of Ω:

H = \log \Omega = \log \frac{P!}{A_1! \ A_2! \ A_3! \cdots \ A_n!} \,\!
= \log P! - \log A_1! - \log A_2! - \log A_3! - \cdots - \log A_n! \,\!
= \sum_{i=1}^P \log i - \sum_{i=1}^{A_1} \log i - \sum_{i=1}^{A_2} \log i - \cdots - \sum_{i=1}^{A_n} \log i \,\!

The summations can be closely approximated by integrals:

H = \int_1^P \log x \, dx - \int_1^{A_1} \log x \, dx - \int_1^{A_2} \log x \, dx - \cdots - \int_1^{A_n} \log x \, dx. \,\!

The integral of the logarithm is

\int \log x \, dx = x \log x - \int x \, {dx \over x} = x \log x - x. \,\!

So the entropy is

H = (P \log P - P + 1) - (A_1 \log A_1 - A_1 + 1) - (A_2 \log A_2 - A_2 + 1) - \cdots - (A_n \log A_n - A_n + 1)
= (P \log P + 1) - (A_1 \log A_1 + 1) - (A_2 \log A_2 + 1) - \cdots - (A_n  \log A_n + 1)
= P \log P - \sum_{x=1}^n A_x \log A_x + (1 - n) \,\!
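
As a numerical aside, the integral approximation can be checked directly. A Python sketch with invented counts, comparing the exact value of log Ω (computed via log-Gamma) with the approximation P log P - Σ A_x log A_x + (1 - n):

import math

A = [400, 300, 200, 100]   # invented counts
n = len(A)
P = sum(A)

# Exact log Omega, using log(k!) = lgamma(k + 1)
log_omega = math.lgamma(P + 1) - sum(math.lgamma(a + 1) for a in A)

# The approximation obtained above by replacing the sums with integrals
approx = P * math.log(P) - sum(a * math.log(a) for a in A) + (1 - n)

print(log_omega, approx)   # the relative difference is small for counts this large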

By letting px = Ax/P, so that Ax = px P and the px sum to 1, we obtain:

H = (1 - n) - P \sum_{x=1}^n p_x \log p_x \,\!

The term (1 − n) can be dropped since it is a constant, independent of the px distribution, and dividing by P gives the entropy per ball-landing event:

H = - \sum_{x=1}^n p_x \log p_x \,\!.

Thus, the Shannon entropy is a consequence of the equation

H = \log \Omega \

which relates to Boltzmann's definition,

\mathcal{S} = k \ln \Omega,

of thermodynamic entropy, where k is the Boltzmann constant.
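
Finally, a short Python sketch (again with invented counts) of the per-event reading of H = log Ω: as P grows with the relative frequencies held fixed, log Ω divided by P approaches the Shannon entropy of those frequencies:

import math

def log_omega(A):
    # Exact log of the multinomial coefficient, via log-Gamma
    return math.lgamma(sum(A) + 1) - sum(math.lgamma(a + 1) for a in A)

def shannon(p):
    # Shannon entropy in nats
    return -sum(pi * math.log(pi) for pi in p if pi > 0)

p = [0.5, 0.3, 0.2]                        # an invented frequency distribution

for scale in [10, 100, 10000]:
    A = [round(pi * scale) for pi in p]    # counts with the same relative frequencies
    P = sum(A)
    print(P, log_omega(A) / P, shannon(p)) # log(Omega)/P approaches the Shannon entropy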

H(X), H(Ω), and the word 'outcome'

Recent edits to this page now stress the word "outcome" in the opening sentence:

information entropy is a measure of the average information content associated with the outcome [emphasised] of a random variable.

and have changed formulas like

H(X)=-\sum_{i=1}^np(x_i)\log_2 p(x_i),\,\!

to

H(X) = -\sum_{\omega \in \Omega}p(\omega)\log_2 p(\omega)

There appears to have been a confusion between two meanings of the word "outcome". Previously, the word was being used on these pages in a loose, informal, everyday sense to mean "the range of the random variable X" -- i.e. the set of values {x1, x2, x3, ...} that might be revealed for X.

But "outcome" also has a technical meaning in probability, meaning the possible states of the universe {ω1, ω2, ω3 ...), which are then mapped down onto the states {x1, x2, x3 ...) by the random variable X (considered to be a function mapping Ω -> R).

It is important to note that the mapping X may in general be many-to-one, so H(X) and H(Ω) are not in general the same. In fact we can say definitely that H(X) <= H(Ω), with equality holding only if the mapping is one-to-one over all subsets of Ω with non-zero measure (the "data processing theorem").

The correct equations are therefore

H(X)=-\sum_{i=1}^np(x_i)\log_2 p(x_i),\,\!

or

H(\Omega) = -\sum_{\omega \in \Omega}p(\omega)\log_2 p(\omega)

But in general the two are not the same. -- Jheald 11:37, 4 March 2007 (UTC).
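
To make the distinction concrete, a small Python sketch (the sample space and the mapping are invented): four equally likely outcomes ω are mapped many-to-one onto three values of X, and H(X) comes out strictly smaller than H(Ω).

import math

def entropy(probs):
    # Shannon entropy in bits
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Four equally likely "states of the universe"
p_omega = {"w1": 0.25, "w2": 0.25, "w3": 0.25, "w4": 0.25}

# A many-to-one random variable X : Omega -> R
X = {"w1": 0.0, "w2": 0.0, "w3": 1.0, "w4": 2.0}

# Push the probabilities forward through X to get the distribution of X
p_x = {}
for w, prob in p_omega.items():
    p_x[X[w]] = p_x.get(X[w], 0.0) + prob

print(entropy(p_omega.values()))   # H(Omega) = 2 bits
print(entropy(p_x.values()))       # H(X) = 1.5 bits <= H(Omega)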

Sorry, I don't get it

Self-information of an event is a number, right? Not a random variable. Yes?

So how can entropy be the expectation of self-information? I sort of understand where the formula is coming from, but it doesn't look theoretically sound... Thanks. 83.67.217.254 13:19, 4 March 2007 (UTC)

Ok, maybe I understand. I(omega) is a number, but I(X) is itself a random variable. I have fixed the formula. 83.67.217.254 13:27, 4 March 2007 (UTC)

Uh-oh, what have I done? "Failed to parse (Missing texvc executable; please see math/README to configure.)" Could you please fix? Thank you. 83.67.217.254 13:30, 4 March 2007 (UTC)
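
(Regarding the thread above: the point that I(ω) is a number while I(X) is a random variable can be made concrete with a toy distribution. A small Python sketch, with an invented distribution p, in which the expectation of I(X) gives the entropy and the sample average of I over draws of X converges to it:)

import math
import random

p = {"a": 0.5, "b": 0.25, "c": 0.25}     # an invented distribution for X

def I(x):
    # Self-information of a single outcome, in bits: just a number
    return -math.log2(p[x])

# Entropy as the expectation of the random variable I(X)
H = sum(p[x] * I(x) for x in p)
print(H)                                  # 1.5 bits

# Empirical check: the average self-information over many draws of X approaches H
random.seed(0)
samples = random.choices(list(p), weights=list(p.values()), k=100000)
print(sum(I(x) for x in samples) / len(samples))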