Talk:Fisher information


This article is within the scope of WikiProject Statistics, which collaborates to improve Wikipedia's coverage of statistics. If you would like to participate, please visit the project page.


This article may be too technical for a general audience.
Please help improve this article by providing more context and better explanations of technical details to make it more accessible, without removing technical details.

The first line in 'Example' is missing a closing parenthesis ")". Thank you for a nice article!

There seem to be some superfluous brackets in the expectation notation: Neither \mathbb E X ^2 \, nor \left[\mathbb E X\right]^2\, is ambiguous, and in fact their difference should be the variance of X if I remember right. \mathbb E X ^2 \, does not need to be disambiguated according to standard order of operations. -- 130.94.162.61 04:45, 10 February 2006 (UTC)
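For what it's worth, the identity that comment appeals to, Var(X) = E[X²] − (E[X])², is easy to check numerically (a quick sketch using NumPy; the choice of distribution is arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.exponential(scale=2.0, size=1_000_000)  # any distribution works here

e_x2 = np.mean(x**2)      # E[X^2], i.e. \mathbb E X^2 under standard order of operations
e_x_sq = np.mean(x)**2    # (E[X])^2
var = np.var(x)           # Var(X)

# Var(X) = E[X^2] - (E[X])^2, so the two readings of the notation differ
# by exactly the variance.
assert abs((e_x2 - e_x_sq) - var) < 1e-6
```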
Also there are some philosophical issues not discussed in the article. Without some prior probability distribution on θ, how can we ever hope to extract information about it? For example, take a person's height. We usually start with some a priori idea or expectation of what a person's height ought to be before taking any measurements at all. If we measure a person's height to be 13 feet, we would normally assume the measurement was wrong and probably discard it (as a so-called "outlier"). But if more and more measurements gave a result in the vicinity of 13 feet, it might dawn on us that we are measuring a giant. On the other hand, a single measurement of 5 feet 5 inches would probably convince us of someone else's height to a reasonable degree of accuracy. Fisher information doesn't say anything about a priori probability distributions on θ. A maximum likelihood estimator which assumes a "uniform" distribution over all the reals (w.r.t. the Lebesgue measure) is an absurdity. I'm not sure I'm making any sense (and feel free to delete this comment if I'm not), but I don't believe any information can be extracted about an unknown parameter without having beforehand some rough estimate of the a priori probability distribution of that parameter. -- 130.94.162.61 13:54, 10 February 2006 (UTC)


[edit] Added Regularity condition

The above comment is specious. The writer brings up a point that Fisher Information does not speak to. Fisher information assumes that one is estimating a parameter and that there is no a priori distribution of that parameter. This is one of the weaknesses of Fisher Information. However, it is not relevant to an article about Fisher information except in the context of "Other formulations." There is, however, an important error in this article. The second derivative version of the definition of Fisher Information is only valid if the proper regularity condition is met. I added the condition, though this may not be the best representation of it. The formula looks rather ugly to me, but I don't have time to make it pretty. Sorry! --67.85.203.239 22:15, 12 February 2006 (UTC)

My comment above was somewhat specious, but when I carry out the differentiation of the second derivative version of the Fisher information, I get a term
 \mathbb E \left[ \frac { \frac {\partial^2} {\partial\theta^2} f(X|\theta) } {f(X|\theta)} \right] \mathrm{\ or\ } \int_X \frac {\partial^2} {\partial\theta^2} f(x|\theta)\,dx
that must be equal to zero. Is this valid for a regularity condition or at all what is wanted here? The regularity condition that was added to the article doesn't make much sense to me, since it contains a capital X and no expectation taken over it. Please excuse my ignorance. As to my comment above, I still think something belongs in the article (in the way of introduction) to tell someone like me what Fisher information is used for as well as when or why it should or shouldn't be used. As the article stands, it's just a bunch of mathematical formulae without much context or discussion. -- 130.94.162.61 22:06, 8 March 2006 (UTC)
There should be a little more discussion of the Cramér-Rao inequality, too. -- 130.94.162.61 22:31, 8 March 2006 (UTC)


But isn't it generally going to be the case (assuming the 2nd derivative exists)
\int \frac{\partial^2}{\partial \theta^2}f(X ; \theta ) \, dx = \frac{\partial^2}{\partial \theta^2} \int f(X ; \theta ) \, dx = \frac{\partial^2}{\partial \theta^2} 1 = 0
71.221.255.155 07:35, 8 December 2006 (UTC)
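The interchange of the integral and the θ-derivatives above can also be checked numerically (a sketch, assuming a Gaussian density with θ as its mean; the grid, step size, and θ value are arbitrary choices):

```python
import numpy as np

# Hypothetical check: f(x; theta) = normal density with mean theta, sd 1.
def f(x, theta):
    return np.exp(-0.5 * (x - theta)**2) / np.sqrt(2 * np.pi)

x = np.linspace(-20.0, 20.0, 200_001)  # grid wide enough that f ~ 0 at the ends
dx = x[1] - x[0]
h = 1e-3
theta = 0.7

# second derivative in theta by central differences, then integrate over x
d2f = (f(x, theta + h) - 2 * f(x, theta) + f(x, theta - h)) / h**2
integral = np.sum(d2f) * dx

# consistent with d^2/dtheta^2 of the normalization constant 1, i.e. zero
assert abs(integral) < 1e-6
```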


[edit] Some things unclear(/wrong?)

In the expression

\int \frac{\partial^2}{\partial \theta^2}f(X ; \theta ) \, dx  = 0,

might it be f(x;θ)?

Also, it is unclear whether the θ's must cover the whole parameter space, or could cover some subspace. In discussing the N-variate Gaussian, it is said that the information matrix has indices running from 1 to N, but there are N(N + 3)/2 parameters (N means plus N(N + 1)/2 distinct covariance entries) to describe a Gaussian. This is probably a mistake. PhysPhD
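As a quick sanity check on that count (a sketch; the function name is made up), the parameters of an N-variate Gaussian are the N mean components plus the N(N + 1)/2 distinct entries of the symmetric covariance matrix:

```python
# Parameter count for an N-variate Gaussian: N mean components plus
# N*(N+1)/2 distinct entries of the symmetric covariance matrix,
# which sums to N*(N+3)/2.
def gaussian_param_count(n):
    return n + n * (n + 1) // 2

assert gaussian_param_count(1) == 2   # mean and variance
assert gaussian_param_count(2) == 5
assert gaussian_param_count(3) == 9
```

So a full information matrix for all the parameters would be larger than N × N; presumably the article's matrix covers only the mean vector.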

[edit] Say more about Roy Frieden's work

I should admit that I have studied mathematical statistics. Even so, by Wiki standards, this entry is not unduly technical. I've added some links (and am sure more could be added) that should help the novice reader along. The first person to contribute to this talk page is an unwitting Bayesian, when (s)he calls for a "prior distribution" on θ. Information measures and entropy are bridges connecting classical and Bayesian statistics. This entry should sketch bits of those bridges, if only by including a few links. This entry should say more comparing and contrasting Fisher information with the measures of Shannon, Kullback-Leibler, and possibly others.

Wiki should also say more, somewhere, about the extraordinary work of Roy Frieden. Frieden, a respectable physicist, has written a nearly 500-page book arguing that a great deal of theoretical physics can be grounded in Fisher information and the calculus of variations. This should not come as a complete surprise to anyone who has mastered Hamiltonian mechanics and has thought about the principle of least action, but even so, Frieden's book is a breathtaking high wire act. It appears that classical mechanics, electromagnetism, thermodynamics, general relativity, and quantum electrodynamics are all merely different applications of a few core information-theoretic and variational principles. Frieden (2004) also includes a chapter on what he thinks his EPI approach could contribute to unsolved problems, such as quantum gravitation, turbulence, and topics in particle physics. Could EPI even prove to be the eventual gateway to that Holy Grail of contemporary science, the unification of the three fundamental forces, electroweak, strong, and gravitation? I should grant that EPI doesn't answer everything; for example, it sheds no light on why the fundamental dimensionless constants take on the values that they do. Curiously, Frieden says little about optics even though that was his professional specialty. 202.36.179.65 13:19, 11 April 2006 (UTC)

A number of links to articles about Frieden and his work are already in this article. Michael Hardy 20:31, 11 April 2006 (UTC)
The physical and mathematical correctness of Frieden's ideas has been characterized as highly dubious by several knowledgeable observers; see, for example, Ralph F. Streater's "Lost Causes in Theoretical Physics: Physics from Fisher Information", and Cosma Shalizi's review of Physics from Fisher Information. QuispQuake 14:55, 12 July 2006 (UTC)

[edit] B. Roy Frieden's anonymous POV-pushing edits

B. Roy Frieden claims to have developed a "universal method" in physics, based upon Fisher information. He has written a book about this. Unfortunately, while Frieden's ideas initially appear interesting, his claimed method has been characterized as highly dubious by knowledgeable observers (Google for a long discussion in sci.physics.research from some years ago.)

Note that Frieden is Prof. Em. of Optical Sciences at the University of Arizona. The data.optics.arizona.edu anon has used the following IPs to make a number of questionable edits:

  1. 150.135.248.180 (talk · contribs)
    1. 20 May 2005 confesses to being Roy Frieden in real life
    2. 6 June 2006: adds cites of his papers to Extreme physical information
    3. 23 May 2006 adds uncritical description of his own work in Lagrangian and uncritically cites his own controversial book
    4. 22 October 2004 attributes uncertainty principle to Cramér-Rao inequality in Uncertainty Principle, which is potentially misleading
    5. 21 October 2004 adds uncritical mention of his controversial claim that Maxwell-Boltzmann distribution can be obtained via his "method"
    6. 21 October 2004 adds uncritical mention of his controversial claim that the Klein-Gordon equation can be "derived" via his "method"
  2. 150.135.248.126 (talk · contribs)
    1. 9 September 2004 adds uncritical description of his work to Fisher information
    2. 8 September 2004 adds uncritical description of his highly dubious claim that EPI is a general approach to physics to Physical information
    3. 16 August 2004 confesses IRL identity
    4. 13 August 2004 creates uncritical account of his work in new article, Extreme physical information

These POV-pushing edits should be modified to more accurately describe the status of Frieden's work.---CH 21:54, 16 June 2006 (UTC)

[edit] Graphs to improve technical accessibility

In addressing the technical accessibility tag above, I would recommend the addition of some graphs. For example, this concept could be related to the widely understood concept of the Gaussian bell curve. -- Beland 21:35, 4 November 2006 (UTC)
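One concrete tie-in to the bell curve, for whatever it helps: for a normal model with known variance, the Fisher information about the mean is 1/σ², so a narrower curve carries more information about its centre. A Monte Carlo sketch of this (sample sizes, seed, and function name are arbitrary choices), using the variance-of-the-score form of the definition:

```python
import numpy as np

rng = np.random.default_rng(1)

def estimated_fisher_info_mean(sigma, n_samples=200_000):
    """Monte Carlo estimate of I(mu) for N(mu, sigma^2) as Var(score), at mu = 0."""
    x = rng.normal(loc=0.0, scale=sigma, size=n_samples)
    score = (x - 0.0) / sigma**2   # d/dmu of log f(x; mu) evaluated at mu = 0
    return np.var(score)

# Analytically I(mu) = 1 / sigma^2: the narrower the bell curve,
# the larger the Fisher information about its centre.
for sigma in (0.5, 1.0, 2.0):
    assert abs(estimated_fisher_info_mean(sigma) - 1 / sigma**2) < 0.05
```

A plot of these densities side by side, annotated with their information values, might be exactly the sort of graph suggested above.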

[edit] Minus sign missing?

In the one-dimensional equation, there is a minus sign in the equation linking the second derivative of the log likelihood to the variance of theta. This stands to reason, as we want maximum, not minimum likelihood, so the second derivative becomes negative. In the matrix formulation below, there is no minus sign. Should it not be there, too? In practice, of course, one often minimizes sums of squares, or other "loss" functions, instead. This already is akin to -log(L). I am not a professional statistician, but I use statistics a lot in my profession, microbiology. I did not find the article too technical. After all, the subject itself is somewhat technical. Wikipedia does a great job of making gems such as this accessible. 82.73.149.14 19:51, 30 December 2006 (UTC)Bart Meijer
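The minus sign does belong in the matrix form as well: under the usual regularity conditions each entry is −E[∂² log f / ∂θᵢ∂θⱼ], for exactly the reason given above. A one-parameter numerical sketch with a Bernoulli(p) model, where both forms should agree with the analytic value 1/(p(1 − p)) (sample size, seed, and p are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(2)
p = 0.3
x = rng.binomial(1, p, size=500_000).astype(float)

# For Bernoulli(p): log f(x; p) = x log p + (1 - x) log(1 - p)
score = x / p - (1 - x) / (1 - p)                 # first derivative in p
d2_loglik = -x / p**2 - (1 - x) / (1 - p)**2      # second derivative in p

var_score = np.var(score)          # I(p) as the variance of the score
neg_exp_d2 = -np.mean(d2_loglik)   # I(p) via the second derivative -- note the minus sign
exact = 1 / (p * (1 - p))          # analytic Fisher information

assert abs(var_score - exact) < 0.1
assert abs(neg_exp_d2 - exact) < 0.1
```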

[edit] Style

I think that the style in which parts of this article are written is more appropriate for a textbook than for an encyclopedia article. For example: "To informally derive the Fisher Information, we follow the approach described by Van Trees (1968) and Frieden (2004)" This type of comment is only really appropriate in a textbook where a single author or a few authors are writing a book with a coherent theme. An encyclopedia article ought to adopt a different style: in particular, I object to the use of the term "we", as on wikipedia, with so many authors and with anonymous authors, it is not clear who the word "we" refers to. Instead, I think we should word things "Van Trees (1968) and Frieden (2004) provide the following method of deriving the Fisher information informally:". I am going to rewrite this to try to eliminate these sorts of comments. But...I think this style problem goes beyond just the use of the word "we"...it's pretty pervasive and it needs deep changes. Cazort (talk) 18:14, 10 January 2008 (UTC)

[edit] Informal Derivation & Definition

This derivation doesn't seem to be a derivation of the Fisher information, but rather, a derivation of the relationship between Fisher information and the bound on the variance of an estimator. Does everyone agree with me that this should be renamed? Also, this remark relates to the definition of Fisher information. For example, the comment "The Fisher information is the amount of information" is loaded, because it is not defined what information means. I am going to weaken this statement accordingly. If we can come up with a more rigorous and more precise definition then we should include it! Cazort (talk) 18:22, 10 January 2008 (UTC)

[edit] How about putting in 'Mutual Information' and 'Joint Information' discussion

I've heard mention of "mutual information" and "joint information" (bivariate discrete random variables); shouldn't these terms be discussed? 199.196.144.13 (talk) 21:08, 29 May 2008 (UTC)
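Those are Shannon-theoretic quantities rather than Fisher information, so they probably belong mainly in the information theory articles, with at most a link from here. For reference, the mutual information of a discrete bivariate distribution is I(X;Y) = Σ p(x,y) log[p(x,y) / (p(x)p(y))]; a sketch with made-up joint tables:

```python
import numpy as np

def mutual_information(p_xy):
    """I(X;Y) in bits for a discrete joint distribution given as a 2-D table."""
    p_x = p_xy.sum(axis=1, keepdims=True)   # marginal of X (rows)
    p_y = p_xy.sum(axis=0, keepdims=True)   # marginal of Y (columns)
    mask = p_xy > 0                         # skip zero-probability cells
    return float(np.sum(p_xy[mask] * np.log2((p_xy / (p_x * p_y))[mask])))

independent = np.array([[0.25, 0.25],
                        [0.25, 0.25]])      # two independent fair coins
assert abs(mutual_information(independent)) < 1e-12   # independence gives 0 bits

perfectly_correlated = np.array([[0.5, 0.0],
                                 [0.0, 0.5]])
assert abs(mutual_information(perfectly_correlated) - 1.0) < 1e-12  # exactly 1 bit
```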