Talk:Maximum likelihood
Great article! Very, very clear. 18.243.6.178 04:08, 13 November 2006 (UTC)
I removed this from the article, until it can be made more NPOV and more encyclopedic. Currently reads more like a list of observations than true encyclopedic content and needs more explanation. --Lexor|Talk 07:25, 5 Aug 2004 (UTC)
- Maximum likelihood is one of the main methods used by frequentist (i.e. non-Bayesian) statisticians. Bayesian arguments against ML and other point estimation methods are that:
- all the information contained in the data is in the likelihood function, so why use just the maximum? Bayesian methods use ALL of the likelihood function and this is why they are optimal.
- ML methods have good asymptotic properties (consistency and attainment of the Cramér-Rao lower bound) but there is nothing to recommend them for analysis of small samples.
- the method doesn't work so well with distributions that have many modes or unusual shapes. Apart from the practical difficulties of getting stuck in local modes, there is the difficulty of interpreting the output, which consists of a point estimate plus standard error. Suppose you have a distribution for a quantity that can only take positive values, and your ML estimate for the mean comes out at 1.0 with a standard error of 3? Bayesian methods give you the entire posterior distribution as output, so you can make sense of it and then decide what summaries are appropriate.
This whole article reads like an '80s high-school textbook. As a matter of fact, a lot of Wikipedia's articles on difficult-to-understand subjects read like they're out of an '80s high-school textbook, making them useful only to people who already know the subject back to front, making the entire Wikipedia project a failure.
- Clarity issues in a statistics article (a subject that is less than clear) should not be used to make the inference that "the entire wikipedia project a failure" Rschulz 23:56, 1 Mar 2005 (UTC)
- This article was never intended to be accessible to secondary school students generally (although certainly there are those among them who would understand it). I would consider this article successful if mathematicians who know probability theory but do not know statistics can understand it. And I think by that standard it is fairly successful, although more examples and more theory could certainly be added. If someone can make this particular topic comprehensible to most people who know high-school mathematics, through first-year calculus or perhaps through the prerequisites to first-year calculus, I would consider that a substantially greater achievement. But that would take more work. Michael Hardy 00:40, 10 Jan 2005 (UTC)
- I would consider this article a failure if only "mathematicians who know probability theory" can understand it. I got to the article via a link from Constellation diagram, and learned nothing from it that I didn't already know from the other article—not even what it is used for. 121a0012 05:41, 14 June 2006 (UTC)
- But most mathematicians who know probability theory do not know this material, so it should not be considered a failure if they understand it. That is not to say it should not be made accessible to a broader audience. But that will take more work, so be patient. Michael Hardy 17:42, 14 June 2006 (UTC)
- As a university student learning statistics, I think this article needs improvement. It would be good if a graph of the likelihood of different parameter values for p were added to the example, with the maximum pointed out. This addition would require adding some specific data to the example. Also, the example should be separated from the discussion about MLE, to make sure people understand that the binomial distribution is only used for this case. The reasons why it is good to take the log of the likelihood are not discussed. Further, the discussion about what makes a good estimator (and how MLE is related to other estimators) could be expanded. Rschulz 23:56, 1 Mar 2005 (UTC)
- The "left as an exercise to the reader" part is definitely gratuitous and needs to go. I came to this page to learn about the subject, not for homework problems.
- a user in CA Good article; don't be so hard on the author(s). Of course it could be better, but most of us have day jobs. I would change the notation, though, as this was confusing: "The value (lower-case) x/n observed in a particular case is an estimate; the random variable (Capital) X/n is an estimator." This seems to conflict with the excellent example at the end for finding the maximum likelihood x/n in a binomial distribution of x voters in a sample of n (without replacement). Now, next question: can anybody explain the Viterbi algorithm to a high-schooler? 01 March 2004
- I don't see the conflict. The lower-case x in the example at the end is not a random variable, but an observed realization. Michael Hardy 22:50, 2 Mar 2005 (UTC)
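For anyone following this thread, here is a rough sketch of the kind of illustration requested above: the binomial log-likelihood evaluated over a grid of values of p, with the maximum located numerically. The data (7 successes in 20 trials), the grid resolution, and the function name are assumptions made up for illustration; they are not taken from the article. Working with the log is convenient because it turns the product into a sum and, being monotone, leaves the location of the maximum unchanged.

```python
import math

def binomial_log_likelihood(p, x, n):
    """Log-likelihood of success probability p given x successes in n trials."""
    return (math.log(math.comb(n, x))
            + x * math.log(p)
            + (n - x) * math.log(1 - p))

# Hypothetical observed data: x is a fixed realization, not a random variable.
x, n = 7, 20

# Interior grid only; p = 0 and p = 1 are excluded to avoid log(0).
grid = [i / 1000 for i in range(1, 1000)]

# The grid point with the largest log-likelihood approximates the MLE.
p_hat = max(grid, key=lambda p: binomial_log_likelihood(p, x, n))

print(p_hat, x / n)  # the grid maximizer and the closed-form estimate x/n
```

Plotting `binomial_log_likelihood` over `grid` would give the graph suggested above, with its peak at the observed proportion x/n.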
Difference between likelihood function and probability density function
I think the line "we may compute the probability associated with our observed data" is not correct, because the probability that a continuous variable takes any given value is always zero. The correct statement would be "we may compute the likelihood associated with our observed data". For more argument please see [1]
- Your assertion is correct, but your header is not. That is NOT the difference between the likelihood function and the density function. One is a function of the parameter, with the data fixed; the other is a function of the data with the parameter fixed. Michael Hardy 02:16, 13 May 2006 (UTC)
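A small numerical sketch of the distinction drawn in the reply above, using an exponential density as an assumed example (the distribution, the values rate = 0.5 and x = 2.0, and the helper names are all chosen here for illustration only). The same formula is integrated once over the data with the parameter fixed, and once over the parameter with the data fixed.

```python
import math

def exp_density(x, rate):
    """Exponential density f(x; rate) = rate * exp(-rate * x) for x >= 0."""
    return rate * math.exp(-rate * x)

def integrate(f, a, b, steps=100_000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    h = (b - a) / steps
    return h * sum(f(a + (i + 0.5) * h) for i in range(steps))

# Density: parameter fixed (rate = 0.5), viewed as a function of the data x.
area_over_data = integrate(lambda x: exp_density(x, 0.5), 0.0, 100.0)

# Likelihood: data fixed (x = 2.0), viewed as a function of the parameter.
area_over_rate = integrate(lambda r: exp_density(2.0, r), 0.0, 100.0)

print(area_over_data)  # close to 1: a density integrates to 1 over the data
print(area_over_rate)  # close to 0.25 = 1 / x**2: the likelihood need not
```

The second integral differing from 1 is a reminder that the likelihood, as a function of the parameter, is not a probability density over the parameter.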
Where is the Spanish version?
Is it possible to get the text shown as it should read for people who don't know LaTeX code, instead of symbols such as x_1 for subscripts or x^1 for superscripts, etc.? I understand those but not the other symbols which are used in this article, and in any case they hamper reading. Thanks! Xinelo 14:47, 21 September 2006 (UTC)
- Maybe it was a temporary problem; they look OK to me now. Michael Hardy 15:06, 21 September 2006 (UTC)
What is MLE used for?
This article is useless. Great, now I understand all the theoretical underpinnings of MLE; what the hell is it *used* for?
- Thank you for your comments. Please be aware that constructive and polite criticism is more likely to help you achieve changes on Wikipedia. In answer to your question I quote from the article: "Maximum likelihood estimation (MLE) is a popular statistical method used to make inferences about parameters of the underlying probability distribution of a given data set." If this is not clear then perhaps you could help editors by describing why it is not clear and telling them what you would like to see in addition (or perhaps in its place). reetep 11:25, 20 October 2006 (UTC)
Thank You!!
This article is fantastic. It is more understandable than my class notes and has helped me greatly for my class. I also really appreciated the use of examples. Many, many thanks to the people who wrote it! Poyan 8:09, 6 December 2006. The preceding unsigned comment was added by 128.100.36.147 (talk) 13:09, 6 December 2006
Sloppiness
You really should say something about the second derivative test. You nonchalantly claimed you reached a maximum. You could very well have found a minimum. Don't reinforce bad habits. This is especially true for the Normal case. Just because the gradient is zero does not mean you have a local maximum, and it's not trivial enough to just brush off. (ZioX 18:06, 19 March 2007 (UTC))
- I don't know which "you" is addressed, but I agree with the spirit of the comment. But the details of the comment fall short. The second-derivative test may prove a local maximum, but here we need a global maximum. There are various ways to prove there is a global maximum, not all of them involving second derivatives. For example, suppose you show that L(θ) increases as θ goes from 0 to 3, and decreases as θ goes from 3 to ∞, and the parameter space is the interval from 0 to ∞. Then you've got a global maximum at 3, without benefit of second derivatives. Or suppose you've shown that L(θ) is differentiable everywhere, and because the parameter space is compact, there must be a global maximum somewhere, and furthermore L(θ) is 0 on the boundary and positive in the interior. Then the global maximum must be reached at a critical point in the interior. If next it turns out that there is only one critical point in the interior, then you've got it again, and again without second derivatives. This is not at all an unusual situation in elementary MLE problems. Michael Hardy 18:30, 19 March 2007 (UTC)
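To make the first argument above concrete, here is a sketch under assumed data: for a Poisson sample, the derivative of the log-likelihood is sum(x)/θ − n, which is positive for every θ below the sample mean and negative for every θ above it. That sign pattern alone gives a global maximum on the parameter space (0, ∞), with no second derivative needed. The Poisson model, the four counts (chosen so the mean is 3, matching the number used above), and the function name are illustrative assumptions; the grid check only spot-checks the sign, and the algebra is what actually proves it.

```python
def poisson_score(theta, data):
    """Derivative of the Poisson log-likelihood with respect to theta."""
    return sum(data) / theta - len(data)

# Hypothetical counts with sample mean 3.
data = [2, 4, 3, 3]
theta_hat = sum(data) / len(data)

# Spot-check the sign of the derivative on either side of theta_hat.
below = [0.01 * k for k in range(1, 300)]         # roughly 0.01 up to 2.99
above = [3.0 + 0.01 * k for k in range(1, 2000)]  # roughly 3.01 up to 22.99

increasing_below = all(poisson_score(t, data) > 0 for t in below)
decreasing_above = all(poisson_score(t, data) < 0 for t in above)

print(theta_hat, increasing_below, decreasing_above)
```

Since the likelihood rises all the way up to the sample mean and falls everywhere after it, the critical point is the global maximum rather than merely a local one.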