Talk:Maximum likelihood
== Untitled ==
Great article! Very, very clear. 18.243.6.178 04:08, 13 November 2006 (UTC)
== Removal ==
I removed this from the article, until it can be made more NPOV and more encyclopedic. Currently reads more like a list of observations than true encyclopedic content and needs more explanation. --Lexor|Talk 07:25, 5 Aug 2004 (UTC)
- Maximum likelihood is one of the main methods used by frequentist (i.e. non-Bayesian) statisticians. Bayesian arguments against ML and other point-estimation methods are that:
- all the information contained in the data is in the likelihood function, so why use just the maximum? Bayesian methods use all of the likelihood function, and this is why they are optimal;
- ML methods have good asymptotic properties (consistency and attainment of the Cramér–Rao lower bound), but there is nothing to recommend them for the analysis of small samples;
- the method doesn't work so well with distributions that have many modes or unusual shapes. Apart from the practical difficulties of getting stuck in local modes, there is the difficulty of interpreting the output, which consists of a point estimate plus a standard error. Suppose you have a distribution for a quantity that can only take positive values, and your ML estimate for the mean comes out at 1.0 with a standard error of 3. What do you make of that? Bayesian methods give you the entire posterior distribution as output, so you can make sense of it and then decide what summaries are appropriate.
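To make the last point concrete, here is a minimal sketch with my own illustrative data, assuming the mean is estimated by the sample mean with the usual Wald standard error: a small sample of a strictly positive quantity where the estimate-plus-standard-error summary yields an interval dipping below zero.

import numpy as np

# Illustrative data for a quantity that can only be positive
# (made up for this sketch, not taken from the article).
x = np.array([0.2, 0.3, 0.5, 0.4, 8.6])

mle_mean = x.mean()                    # ML estimate of the mean (under a normal model)
se = x.std(ddof=1) / np.sqrt(len(x))   # the usual standard error of the mean

lo, hi = mle_mean - 1.96 * se, mle_mean + 1.96 * se
print(f"estimate = {mle_mean:.2f}, SE = {se:.2f}")
print(f"Wald 95% interval: ({lo:.2f}, {hi:.2f})")
# The lower endpoint is negative even though the quantity cannot be,
# which is exactly the interpretive difficulty described above.

A Bayesian analysis with a prior supported on positive values would return a posterior supported entirely on (0, ∞), avoiding the problem.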
== Style ==
This whole article reads like an '80s high-school textbook. As a matter of fact, a lot of Wikipedia's articles on difficult-to-understand subjects read like they're out of an '80s high-school textbook, making them useful only to people who already know the subject back to front, and making the entire Wikipedia project a failure. —Preceding unsigned comment added by 4.232.174.170 (talk • contribs)
- Clarity issues in a statistics article (a subject that is less than clear) should not be used to support the inference that "the entire [W]ikipedia project [is] a failure". Rschulz 23:56, 1 Mar 2005 (UTC)
- This article was never intended to be accessible to secondary school students generally (although certainly there are those among them who would understand it). I would consider this article successful if mathematicians who know probability theory but do not know statistics can understand it, and I think by that standard it is fairly successful, although more examples and more theory could certainly be added. If someone can make this particular topic comprehensible to most people who know high-school mathematics, through first-year calculus or perhaps through the prerequisites to first-year calculus, I would consider that a substantially greater achievement. But that would take more work. Michael Hardy 00:40, 10 Jan 2005 (UTC)
- I would consider this article a failure if only "mathematicians who know probability theory" can understand it. I got to the article via a link from Constellation diagram, and learned nothing from it that I didn't already know from the other article—not even what it is used for. 121a0012 05:41, 14 June 2006 (UTC)
- I think there is a kind of style struggle in mathematics, where some people prefer math text to say what it means and discuss itself introspectively, and others prefer a terse, no-nonsense style. I find the discussion-based approach to be healthier. One of the problems, though, is that it isn't necessarily encyclopedic to use that tone. I mean, most important math can be stated in one or two sentences. That doesn't mean the sentences will be approachable, but they will be factually complete, saying all there is to say about the subject. So an encyclopedia editor struggles with the fact that you have to say more to make it approachable, while at the same time you can say less and still say all there is to say. It is a problem unique to mathematics. Jeremiahrounds 13:01, 20 June 2007 (UTC)
- But most mathematicians who know probability theory do not know this material, so it should not be considered a failure if they understand it. That is not to say it should not be made accessible to a broader audience. But that will take more work, so be patient. Michael Hardy 17:42, 14 June 2006 (UTC)
- I totally agree with the original complaint that this, along with many other Wikipedia math articles, is too heavy going. For that reason I replaced the first paragraph with something more digestible. I see no reason to dive into math symbols right in the first paragraph. --Julian Brown 02:50, 30 August 2007 (UTC)
- As a university student learning statistics, I think this article needs improvement. It would be good if a graph of the likelihood for different values of the parameter p were added to the example, with the maximum pointed out; this would require adding some specific data to the example. Also, the example should be separated from the discussion of MLE, to make sure people understand that the binomial distribution is only used for this case. The reasons why it is good to take the log of the likelihood are not discussed. Further, the discussion of what makes a good estimator (and how MLE is related to other estimators) could be expanded. Rschulz 23:56, 1 Mar 2005 (UTC)
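Along the lines of the graph request, here is a rough sketch of how such a figure could be produced, using made-up data (49 successes in 80 trials, chosen arbitrarily) and assuming NumPy, SciPy and matplotlib are available:

import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import binom

n, x = 80, 49                          # hypothetical data: x successes in n trials

p_grid = np.linspace(0.001, 0.999, 999)
likelihood = binom.pmf(x, n, p_grid)   # L(p) = C(n, x) p^x (1 - p)^(n - x)

plt.plot(p_grid, likelihood)
plt.axvline(x / n, linestyle="--")     # the maximum sits at the analytic MLE p = x/n
plt.xlabel("p")
plt.ylabel("likelihood L(p)")
plt.show()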
- The "left as an exercise to the reader" part is definitely gratuitous and needs to go. I came to this page to learn about the subject, not for homework problems.
- Good article; don't be so hard on the author(s). Of course it could be better, but most of us have day jobs. I would change the notation, though, as this was confusing: "The value (lower-case) x/n observed in a particular case is an estimate; the random variable (capital) X/n is an estimator." It seems to conflict with the excellent example at the end of finding the maximum-likelihood estimate x/n in a binomial distribution of x voters in a sample of n (without replacement). Now, next question: can anybody explain the Viterbi algorithm to a high-schooler? — a user in CA, 01 March 2004
- I don't see the conflict. The lower-case x in the example at the end is not a random variable, but an observed realization. Michael Hardy 22:50, 2 Mar 2005 (UTC)
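For what it's worth, the estimate/estimator distinction is easy to see in a quick simulation (illustrative numbers only):

import numpy as np

rng = np.random.default_rng(0)
n, p = 80, 0.6                     # hypothetical true parameter

X = rng.binomial(n, p, size=5)     # five realisations of the random variable X
print(X / n)                       # the estimator X/n: varies from sample to sample
x = 49                             # one particular observed count
print(x / n)                       # the estimate x/n: a single fixed number

The capital-X quantity is random before the poll is taken; the lower-case x/n is the number you actually compute once the data are in.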
== Difference between likelihood function and probability density function ==
I guess the line "we may compute the probability associated with our observed data" is not correct, because the probability of a continuous variable at any given point is always zero. The correct statement would be "we may compute the likelihood associated with our observed data". For more argument, please see [1]
- Your assertion is correct, but your header is not. That is NOT the difference between the likelihood function and the density function. One is a function of the parameter, with the data fixed; the other is a function of the data with the parameter fixed. Michael Hardy 02:16, 13 May 2006 (UTC)
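Both points can be put on one line. For a continuous model with density f, a minimal sketch in LaTeX, with the data fixed and θ varying as Michael Hardy describes:

L(\theta \mid x_1, \dots, x_n) = \prod_{i=1}^{n} f(x_i \mid \theta),
\qquad\text{whereas}\qquad
P(X = x_i \mid \theta) = 0 \ \text{for every single point } x_i .

So for continuous data one computes (and maximises) the density at the observations, not a probability.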
Where is the Spanish version?
Is it possible to get the text shown as it should read, for people who don't know LaTeX code, instead of symbols such as x_1 for subscripts or x^1 for superscripts, etc.? I understand those, but not other symbols used in this article, and in any case they hamper reading. Thanks! Xinelo 14:47, 21 September 2006 (UTC)
- Maybe it was a temporary problem; they look OK to me now. Michael Hardy 15:06, 21 September 2006 (UTC)
== What is MLE used for? ==
This article is useless. Great, now I understand all the theoretical underpinnings of MLE; what the hell is it *used* for?
- Thank you for your comments. Please be aware that constructive and polite criticism is more likely to help you achieve changes on Wikipedia. In answer to your question, I quote from the article: Maximum likelihood estimation (MLE) is a popular statistical method used to make inferences about parameters of the underlying probability distribution of a given data set. If this is not clear, then perhaps you could help editors by describing why it is not clear and telling them what you would like to see in addition (or perhaps in its place). reetep 11:25, 20 October 2006 (UTC)
The article is completely explicit about what it's used for. With many articles that is not the case; with this one it could not be more clear. Some people delight in complaining bitterly to their benefactors, I guess. Michael Hardy 14:29, 20 June 2007 (UTC)
== Thank You!! ==
This article is fantastic. It is more understandable than my class notes and has helped me greatly in my class. I also really appreciated the use of examples. Many, many thanks to the people who wrote it! Poyan 8:09, 6 December 2006 —The preceding unsigned comment was added by 128.100.36.147 (talk) 13:09, 6 December 2006
== Sloppiness ==
You really should say something about the second-derivative test. You nonchalantly claimed you reached a maximum, when you could very well have found a minimum. Don't reinforce bad habits. This is especially true for the normal case: just because the gradient is zero does not mean you have a local maximum, and it's not especially trivial to just brush off. (ZioX 18:06, 19 March 2007 (UTC))
- I don't know which "you" is addressed, but I agree with the spirit of the comment. But the details of the comment fall short. The second-derivative test may prove a local maximum, but here we need a global maximum. There are various ways to prove there is a global maximum, not all of them involving second derivatives. For example, suppose you show that L(θ) increases as θ goes from 0 to 3, and decreases as θ goes from 3 to ∞, and the parameter space is the interval from 0 to ∞. Then you've got a global maximum at 3, without benefit of second derivatives. Or suppose you've shown that L(θ) is differentiable everywhere, and because the parameter space is compact, there must be a global maximum somewhere, and furthermore L(θ) is 0 on the boundary and positive in the interior. Then the global maximum must be reached at a critical point in the interior. If next it turns out that there is only one critical point in the interior, then you've got it again, and again without second derivatives. This is not at all an unusual situation in elementary MLE problems. Michael Hardy 18:30, 19 March 2007 (UTC)
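As a concrete instance of the increase-then-decrease argument, take the binomial log-likelihood with my own toy numbers: its derivative is positive for p below x/n and negative above it, and the second derivative is negative everywhere on (0, 1), so the single critical point is a global maximum. A quick numerical check:

import numpy as np

n, x = 80, 49                # toy data, not from the article
p_hat = x / n

def dl(p):                   # first derivative of x*log(p) + (n - x)*log(1 - p)
    return x / p - (n - x) / (1 - p)

def d2l(p):                  # second derivative, negative for all p in (0, 1)
    return -x / p**2 - (n - x) / (1 - p)**2

print(dl(p_hat))                  # ~ 0: p_hat is the unique critical point
print(d2l(p_hat) < 0)             # True: concave there, and indeed everywhere
print(dl(0.3) > 0, dl(0.9) < 0)   # increasing below x/n, decreasing above it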
== Is this true? ==
From the bias section: "we can only be certain that it is greater than or equal to the drawn ticket number." Is this true? Wouldn't it be less than or equal to? --Vince |Talk| 05:00, 15 May 2007 (UTC)
- disregard: It was the wording that confused me. The article Bias of an estimator phrases the problem in the manner I was thinking. I may try to make it clearer. --Vince |Talk| 06:37, 15 May 2007 (UTC)
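For anyone else tripped up by the same wording, a small simulation of the situation as I read it (one ticket drawn from tickets numbered 1 to N, with N an illustrative value of my own):

import numpy as np

rng = np.random.default_rng(1)
N = 1000                                    # hypothetical true number of tickets
draws = rng.integers(1, N + 1, size=10_000)
print(draws.mean())                         # about (N + 1)/2, far below N

The drawn number is the MLE of N and is always less than or equal to N (equivalently, N is greater than or equal to it), yet on average it badly underestimates N, which is the bias the section describes.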
== Some more applications please ==
Hi all, The article is very well written from a statistics POV.
MLE is used ubiquitously in phylogenetic analysis and cladistics in genetics and evolutionary biology. It would be great if someone could include a section on how MLE can actually be applied to such studies, with some examples.
Also, I, as an amateur biologist, know where MLE is used, but do not have an intuitive understanding of the technique. A section that would provide the layman with such a perspective (of what's actually happening) would be great...
Indiaman1 19:30, 30 June 2007 (UTC)
== Non-independent variables ==
I have added a section on non-independent variables. I hope this proves useful to someone :) Velocidex (talk) 04:44, 19 March 2008 (UTC)
- I added a tie back to the article topic by mentioning the likelihood function, which should possibly be the main thing being discussed here rather than the density function. Melcombe (talk) 10:45, 19 March 2008 (UTC)
For generality this section really needs something said about mixed discrete-continuous distributions. Melcombe (talk) 10:45, 19 March 2008 (UTC)
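For readers following along: a standard way to write the likelihood without the independence assumption, which I take to be the subject of the new section, is the chain-rule factorisation of the joint density (in LaTeX):

L(\theta \mid x_1, \dots, x_n) = f(x_1 \mid \theta) \prod_{i=2}^{n} f(x_i \mid x_1, \dots, x_{i-1}, \theta)

With independence each conditional factor collapses to f(x_i | θ), recovering the usual product form.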
== Mathematical precision ==
Someone wrote above that this article reads (read) like a 1980s textbook. In my opinion that was the case. For instance, the discussion of the asymptotic properties of maximum likelihood estimation could have been taken straight out of many standard textbooks, but an intelligent person can realise that the authors of those books either don't know what they are talking about or are hiding things from the reader. I added a sentence referring to modern mathematical results on the maximum likelihood estimator (modern in the sense that these results have been known since the 1960s but still have not permeated into standard textbooks). I hope the result still makes sense to the non-expert. Gill110951 (talk) 07:12, 5 May 2008 (UTC)