Talk:Mutual information

WikiProject Mathematics
This article is within the scope of WikiProject Mathematics, which collaborates on articles related to mathematics.
Mathematics rating: B-Class, Mid-priority. Field: Probability and statistics.
Please update this rating as the article progresses, or if the rating is inaccurate. Please also add comments to suggest improvements to the article.

Unit of information?

Instead of: "It should be noted that these definitions are ambiguous because the base of the log function is not specified. To disambiguate, the function I could be parameterized as I(X,Y,b), where b is the base. Alternatively, since the most common unit of measurement of mutual information is the bit, a base of 2 could be specified."

how about: "The unit of information depends on the base of the log function. The most common bases are 2, e, and 10, resulting in units of bits, nats, and digits, respectively."

Internetexploder 08:15, 29 April 2007 (UTC)
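
For illustration, here is a small Python sketch of that point (a made-up example, not from the article; the value 0.531 bits is arbitrary). Switching the base of the logarithm only rescales the same mutual-information value into bits, nats, or decimal digits:

import math

# Minimal sketch: the same mutual-information value expressed in
# different units, depending on the base of the logarithm.
# The value 0.531 bits is a made-up example.
i_bits = 0.531                     # log base 2  -> bits
i_nats = i_bits * math.log(2)      # log base e  -> nats
i_digits = i_bits * math.log10(2)  # log base 10 -> decimal digits
print(i_bits, i_nats, i_digits)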


I didn't initiate the notice, but the guidelines state that this notice is internal to Wikipedia and is not really for the casual reader's consumption. Any attention that a qualified contributor can give is welcome. Ancheta Wis 23:55, 23 Oct 2004 (UTC)

Noting Category:Pages needing attention, I would say that, while someone may have thought that a good guideline, it is de facto incorrect (and not policy). I, for one, do not agree with that guideline, because hiding the notice conceals from everyone who could edit the article the fact that it needs attention, and the notice tells newcomers that we know the article isn't as good as it could be. — 131.230.133.185 5 July 2005 19:23 (UTC)

[This article is] poorly explained. --Eequor 03:39, 22 Aug 2004 (UTC)

Simplify eq?

why not just say:

 I(X,Y) = \sum_{x,y} p(x,y) \times \log_2 \frac{p(x,y)}{p(x)\,p(y)}.

instead of all the confusing talk about what f and g are? Please elaborate if there is a specific reason why it is done this way. -- BAxelrod 02:08, 19 October 2005 (UTC)

The definitions given in the article are correct. They just happen to be highly formal. Less formal definitions are given in the article on information theory (recently added by me, but I called it transinformation). Whether this level of formality is appropriate for this article is a matter for debate. I tend to think not, because in general, someone who is working at that level of formality is not going to be looking in Wikipedia for a definition, but on the other hand, it "simplifies" matters because then one definition suffices for both the discrete and continuous cases. (i.e. integration over the counting measure is simply ordinary discrete summation.) -- 130.94.162.64 22:53, 2 December 2005 (UTC)
O.K. Simplified the formula. -- 130.94.162.64 05:24, 3 December 2005 (UTC)
Another note: I(X,Y) is incorrect; I(X;Y) is the accepted usage. Use a semicolon. -- 130.94.162.64 11:35, 4 December 2005 (UTC)
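
As a worked illustration of the simplified discrete formula (and of the semicolon notation), here is a short Python sketch; the joint pmf is made up for illustration and is not from the article:

import math

# Made-up joint pmf p(x, y) for illustration.
joint = {
    (0, 0): 0.30, (0, 1): 0.20,
    (1, 0): 0.10, (1, 1): 0.40,
}

# Marginals p(x) and p(y).
px, py = {}, {}
for (x, y), p in joint.items():
    px[x] = px.get(x, 0.0) + p
    py[y] = py.get(y, 0.0) + p

# Simplified formula: I(X;Y) = sum_{x,y} p(x,y) log2 p(x,y) / (p(x) p(y)).
I_xy = sum(p * math.log2(p / (px[x] * py[y]))
           for (x, y), p in joint.items() if p > 0)

# Cross-check against the entropy form H(X) + H(Y) - H(X,Y).
def H(probs):
    return -sum(p * math.log2(p) for p in probs if p > 0)

assert abs(I_xy - (H(px.values()) + H(py.values()) - H(joint.values()))) < 1e-12
print(I_xy)  # mutual information in bits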

Mutual information between m random variables

How about adding the mutual information among multiple scalar random variables:

I(y_1; \ldots; y_m) = \sum_{i=1}^m H(y_i) - H(\mathbf{y})

(In reply to unsigned comment above:) Apparently there isn't a single well-defined mutual information for three or more random variables. It is sometimes defined recursively:
I(Y_1; Y_2) = H(Y_1) - H(Y_1 | Y_2),
I(Y_1; \ldots ; Y_m) = I(Y_1; \ldots ; Y_{m-1}) - I(Y_1; \ldots ; Y_{m-1} | Y_m), \quad m \geq 3,
where I(Y_1; \ldots ; Y_{m-1} | Y_m) = \mathbb{E}_{Y_m}\{ I((Y_1 | y_m); \ldots; (Y_{m-1} | y_m)) \}.
This definition fits better with the interpretation of mutual information as the measure of an intersection of sets, but for three or more random variables it can be negative as well as positive (in contrast to the definition in the comment above, which is always non-negative).
--130.94.162.64 23:15, 19 May 2006 (UTC)
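
To make the contrast concrete, here is a small Python sketch (my own example, not from the article): X and Y are independent fair coin flips and Z = X XOR Y. The sum-of-marginal-entropies definition above gives +1 bit, while the recursive definition gives -1 bit for these three variables:

import math
from itertools import product

def H(pmf):
    # Entropy in bits of a pmf given as a dict {outcome: probability}.
    return -sum(p * math.log2(p) for p in pmf.values() if p > 0)

# Joint pmf p(x, y, z) with Z = X XOR Y; only four outcomes have mass.
joint = {(x, y, x ^ y): 0.25 for x, y in product((0, 1), repeat=2)}

def marginal(joint, keep):
    # Marginal pmf over the coordinates listed in `keep`.
    pmf = {}
    for outcome, p in joint.items():
        key = tuple(outcome[i] for i in keep)
        pmf[key] = pmf.get(key, 0.0) + p
    return pmf

# First definition above: sum of marginal entropies minus the joint
# entropy; this quantity is always non-negative.
total = sum(H(marginal(joint, [i])) for i in range(3)) - H(joint)

# Recursive definition for m = 3: I(X;Y;Z) = I(X;Y) - I(X;Y|Z), with
# I(X;Y)   = H(X) + H(Y) - H(X,Y)
# I(X;Y|Z) = H(X,Z) + H(Y,Z) - H(X,Y,Z) - H(Z).
i_xy = H(marginal(joint, [0])) + H(marginal(joint, [1])) - H(marginal(joint, [0, 1]))
i_xy_given_z = (H(marginal(joint, [0, 2])) + H(marginal(joint, [1, 2]))
                - H(joint) - H(marginal(joint, [2])))
interaction = i_xy - i_xy_given_z

print(total)        # 1.0 bit  (non-negative definition)
print(interaction)  # -1.0 bit (recursive definition can be negative)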

Source

The formula is from Shannon (1948). This should be stated in the article.
Who coined the term "mutual information"? --Henri de Solages 18:41, 7 November 2005 (UTC)

Remove irrelevant reference?

The first reference, Cilibrasi and Vitanyi (2005), contains only two mentions of mutual information:

"Another recent offshoot based on our work is hierarchical clustering based on mutual information, [23]."

"[23] A. Kraskov, H. St¨ogbauer, R.G. Adrsejak, P. Grassberger, Hierarchical clustering based on mutual information, 2003, http://arxiv.org/abs/qbio/0311039"

I suggest this reference be removed as it's not helpful.

--84.9.75.186 10:57, 3 September 2007 (UTC)

The Kraskov & Stögbauer paper is an interesting one. Is that the one you are referring to? —Dfass 11:25, 3 September 2007 (UTC)