Talk:Likelihood principle
The likelihood link ( http://www.cimat.mx/reportes/enlinea/D-99-10.html ) given at the end returns a 404 error. -- 20050120 03:15.
Hello. Recently the following statement was added -- "By contrast, a likelihood-ratio test is based on the principle." This is not clear to me -- while forming a likelihood ratio is entirely consistent with the likelihood principle, appealing to the usual logic of null-hypothesis tests (rejecting the null hypothesis if the LR is too small) is not. A LR test appears to be similar to other null-hypothesis tests in that events that didn't happen have an effect on the inference, thus it appears to be inconsistent with the likelihood principle. -- I'm inclined to remove this new assertion; perhaps someone would like to argue in its favor? Happy editing, Wile E. Heresiarch 15:05, 18 Jun 2004 (UTC)
- Being a Bayesian at heart, I don't personally like standard likelihood-ratio tests any more than I like maximum likelihood as a method, but the argument might go like this: the evidence may point more to the null hypothesis or more to the alternative hypothesis. The degree to which the evidence points at one hypothesis rather than another is (thanks to the likelihood principle) expressed in the likelihood ratio. Therefore it makes sense to accept the null hypothesis if the likelihood ratio is "high enough" and to reject it if not. The value of "high enough" is a matter of choice; one approach might be to use 1 as the critical value, but for a Bayesian looking at point hypotheses the figure would best be a combination of the (inverse) ratio of the priors and the relative costs of Type I errors and Type II errors. --Henrygb 22:15, 23 Jun 2004 (UTC)
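To make this concrete, here is a minimal sketch of the kind of test described above, assuming two simple (point) hypotheses a and b; the function name and parameters are illustrative, not from any library. Since the decision depends on the data only through the likelihood ratio, a test of this form respects the likelihood principle.

```python
# A minimal sketch of a likelihood-ratio test with a Bayesian critical
# value; all names here are illustrative, not from any library.
from scipy.stats import binom

def bayes_lr_test(lik_a, lik_b, prior_a, prior_b, cost_I, cost_II):
    """Accept a iff the likelihood ratio for a against b is at least
    kappa = (prior_b / prior_a) * (cost_II / cost_I), the value that
    minimizes posterior expected loss (cost_I: Type I error,
    cost_II: Type II error)."""
    kappa = (prior_b / prior_a) * (cost_II / cost_I)
    return (lik_a / lik_b) >= kappa

# Example: 7 heads in 10 tosses; a: p = 0.5 versus b: p = 0.8.
# With equal priors and costs, kappa = 1, as suggested above.
print(bayes_lr_test(binom.pmf(7, 10, 0.5), binom.pmf(7, 10, 0.8),
                    prior_a=0.5, prior_b=0.5, cost_I=1.0, cost_II=1.0))
```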
- Henry, I went ahead and removed "By contrast, a likelihood-ratio test can be based on the principle." from the article. A standard LR test, as described for example in the likelihood-ratio test article, does involve unrealized events and so it is not consistent with the likelihood principle. It might be possible to construct an unconventional LR test as described above, but that's not what is generally understood by the term, so I think it's beside the point. Regards & happy editing, Wile E. Heresiarch 14:20, 4 Aug 2004 (UTC)
- I think you need to consider what you are saying: "[the likelihood ratio] is the degree to which the observation x supports parameter value or hypothesis a against b. If this ratio is 1, the evidence is indifferent, and if greater or less than 1, the evidence supports a against b or vice versa." But you think that this does not provide any justification for a likelihood ratio test which in effect says: "If the likelihood ratio is less than some value κ, then we can decide to prefer b to a." I find that very odd; I suspect that in fact you object to how frequentists calculate κ but that is not the point about likelihood ratio tests in general. --Henrygb 17:33, 13 Aug 2004 (UTC)
- I'm willing to consider a compromise of the form "The conventional likelihood-ratio test is not consistent with the likelihood principle, although there is an unconventional LR test which is". That would make it necessary to explain just what an unconventional LR test is, which might be worthwhile. Comments? Wile E. Heresiarch 02:17, 15 Aug 2004 (UTC)
- I agree. We used an unconventional LR test in footnote 22 on page 79 of this paper (2004). We had to, because the comparison was between non-nested models (of the same data, of course). Our reference to Edwards should be to page 76 rather than to page 31. Arie ten Cate 13:40, 3 August 2005 (UTC)
I've largely rewritten the article. It still needs work, in particular, it needs some non-Bayesian argument under "Arguments in favor of the likelihood principle". I've tried to clarify the article by separating the general principle from particular applications. It could also use some links to topics of inference in general, maybe Hume, Popper, epistemology etc if we want to get wonky about it. Wile E. Heresiarch 17:51, 2 Jan 2004 (UTC)
I've added the voltmeter story as an argument in favor of the likelihood principle (and forgot to note this in the Edit summary). The story is taken from the 1976 reprint of the first edition of Likelihood. I trust it is in the second edition as well.
Although I find this argument very convincing, I am not sure if I would apply the likelihood principle in clinical trials. Maybe the whole point here is a subtle difference between two types of inferential statistics: "What can we learn from this particular experiment?" (in ordinary scientific research, where the likelihood principle should be applied) and "What would happen if we did it again?" (in commercial product research).
I tried to translate the voltmeter story to the story of Adam, Bill, and Charlotte, by assuming that Adam could have chosen the other stopping rule, with some probability. But then it loses its attractive simplicity, and I decided not to present this.
- Arie ten Cate 18:23, 7 August 2005 (UTC)
The remainder of the talk page here could probably be archived under a suitable title.
In my opinion, it is not true that, if two designs produce proportional likelihood functions, one should make an identical inference about a parameter from the data irrespective of the design which generated the data (the likelihood principle, LP).
The situation is usually illustrated by means of the following well-known example. Consider a sequence of independent Bernoulli trials in which there is a constant probability of success p for each trial. The observation of x successes on n trials could arise in two ways: either by taking n trials yielding x successes, or by sampling until x successes occur, which happens to require n trials. According to the LP, the distinction is irrelevant. In fact, the likelihood is proportional to the same expression in each case, and the inferences about p would be the same.
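For concreteness, here is a small numerical check of this example; a sketch assuming x = 7 successes in n = 10 trials, with a flat grid prior that is purely illustrative.

```python
# Numerical check: the two designs yield proportional likelihoods,
# and under a common prior the posteriors for p coincide.
import numpy as np
from scipy.stats import binom, nbinom

x, n = 7, 10
p_grid = np.linspace(0.01, 0.99, 99)

# Design 1: n fixed in advance, x successes observed.
lik_direct = binom.pmf(x, n, p_grid)          # C(10,7) p^7 (1-p)^3

# Design 2: x fixed in advance, sampling stops at trial n.
# scipy's nbinom counts the failures before the x-th success: n - x = 3.
lik_inverse = nbinom.pmf(n - x, x, p_grid)    # C(9,6) p^7 (1-p)^3

# The ratio is the constant 120/84 at every p, so the two likelihood
# functions are proportional ...
print(lik_direct / lik_inverse)

# ... and, given a common prior, the normalized posteriors coincide.
prior = np.ones_like(p_grid)
post_direct = lik_direct * prior / (lik_direct * prior).sum()
post_inverse = lik_inverse * prior / (lik_inverse * prior).sum()
print(np.allclose(post_direct, post_inverse))  # True
```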
Nevertheless, this point is questionable.
In particular, following the logical approach, the probability of a hypothesis h is conditional upon, or relative to, given evidence (cf. Carnap, 1962, p. 31). Quoting Carnap's own words, "the omission of any reference to evidence is often harmless". That means that probability is conditional on that which is known. Now, apart from other information, the design d is actually known. Therefore, the evidence (e) comprises not only what is known to the statistician before the survey is performed (e*), but also the piece of information about d. Suppose now that i (which stands for information) is our experimental observation and h is one of the competing hypotheses. We can use the premise above to formulate the probability of i correctly as follows:
(1) p(i|h, e*, d)
Notice that this probability is not defined without a reference to d. Thus, the probability of x successes on n Bernoulli trials is different depending on whether n or x is fixed before the experiment is performed. (For example, with n = 10 and x = 7, the probability is C(10,7) p^7 (1-p)^3 when n is fixed, but C(9,6) p^7 (1-p)^3 when x is fixed.) Namely, the design always enters into the inference because of its occurrence in the probability of i.
- So far so good. Note that p(i|h, e*, d) immediately simplifies to p(i|h, e*). Why? Because asserting that p(i|h, e*, d) != p(i|h, e*) is equivalent to asserting that p(d|i, h, e*) != p(d|h, e*) -- that is, knowing the experimental outcome must tell you something about the design. That's not so: I tell you that I tossed a coin 10 times and got 7 heads. Do you have any reason to believe one way or the other that I resolved to toss 10 times exactly, or to toss until getting 7 heads? No, you don't. Therefore p(i|h, e*, d) = p(i|h, e*), and the computation of p(h|i, e*) goes through as usual. Wile E. Heresiarch 17:51, 2 Jan 2004 (UTC)
The simplified manner in which Bayes' formula has been, and still is, presented in statistics (i.e. without specifying the evidence e) has caused rather serious errors of interpretation. As a matter of fact, the correct expression of Bayes' formula is of the form:
(2) p(h|i, e*, d) proportional to p(h|e*, d) p(i|h, e*, d)
in which it is apparent that the prior depends on d. Namely, in general, the prior is influenced by the knowledge available about the design.
Consequently, contrary to a widely held opinion, the likelihood principle is not a direct consequence of Bayes' theorem. In particular, the piece of information about the design is one part of the evidence, and, therefore, it is relevant for the prior.
REFERENCES:
CARNAP, R. (1962). Logical Foundations of Probability. The University of Chicago Press.
DE CRISTOFARO, R. (2002). The Inductive Reasoning in Statistical Inference. Communications in Statistics, Theory and Methods, v. 31, issue 7, pp. 1079-1089.
- I agree with the point that the likelihood principle is not a direct consequence of Bayes' theorem. It is, however, a consequence of the sufficiency and conditionality principles, as proved by Birnbaum. I have added a paragraph pointing out this fact and briefly describing these two principles. Bill Jefferys 22:49, 15 October 2005 (UTC)
- However, I doubt your second point, that knowledge of the design is relevant for the prior. For example, in the coin-tossing example I fail to see how my prior on whether the coin is fair or not would depend on the design. That is, ISTM that in this case p(h|e*, d) = p(h|e*) and the design is irrelevant. I can certainly imagine that one would design an experiment such that it is likely to tell us interesting things about h, and we would certainly use our prior information about h in deciding on the design, but in my view it's obvious that even in this case the prior on h comes first, and the design comes afterwards, and in no way is the prior on h affected by the design that we choose.
- So I'd like to see a concrete example where we would really think that p(h|e*, d) != p(h|e*).
- Am I missing something here? Bill Jefferys 23:10, 15 October 2005 (UTC)
Dear Bill Jefferys, I saw in Wikipedia your comment on my remarks about the likelihood principle. I suggest you read my paper On the Foundations of the Likelihood Principle in the Journal of Statistical Planning and Inference (2004) 126 (2), 401-411, and my communication to the Symposium held in June at Virginia Tech (Foundations of the 'Objective Bayesian Inference'): http://www.error06.econ.vt.edu/ (PDF of Paper).

About your note, I would like to observe that the discussion about the LP is related to the example of an experiment that is designed to elicit a coin's physical probabilities of landing heads and tails. The conclusion is as follows: it does not matter whether the experimenter intended to stop after n tosses of the coin or after r heads appeared in a sample; the inference about φ [the probability of landing heads] is exactly the same in both cases. This conclusion is based on the assumption that the prior for φ is the same in both cases being investigated.

In reality, inverse sampling (with r fixed in advance) always stops when we observe the last head. On the contrary, the last observation of direct sampling (with n fixed in advance) may be a head or not. Namely, inverse sampling favours the chance of landing heads in the set of tosses. This circumstance brings into discussion the assumption of the same prior in the two different kinds of experiment. I think that this is a concrete example where p(h|e*, d) is different from p(h|e*).

Of course, the likelihood principle (LP) holds that the inference should be the same in case the results of the experiments are the same. However, this thesis too appears equally dubious: the fact of considering only a particular set of results does not change the different nature of the experiments and their influence on the prior probabilities.

As a further example, the assignment of an equal probability to each face of a die is based on the assumption that the casting of the die is fair. In the same way, apart from other information, in order to assign the same probability to every admissible hypothesis, the design should be 'fair' or 'impartial', in the sense of ensuring the same support to all hypotheses. On the other hand, a general principle that concerns any inquiry is as follows: we can assign a uniform distribution over a partition of hypotheses where there is no reason to believe one more likely to be true than any other, in the sense of both irrelevance of prior information and impartiality of the method of inquiry. I do not see why the design (or the method of inquiry) should not be relevant in statistics. In reality, it is relevant in all fields of research work.

Best regards, Rodolfo de Cristofaro. See my home page at www.ds.unifi.it
- I fail to see why the fact that inverse binomial sampling always ends in a "heads" whereas binomial sampling may end in "tails" favours the chance of landing heads in the set of tosses under inverse sampling. This seems to me to be a mere unsupported assertion. As it seems to violate Birnbaum's proof of the LP, it would also appear to be false.
- To convince me otherwise, you would have to provide an effective method of calculating the prior under each design (inverse, direct) and a demonstration that this is the right thing to do.
- No one is saying that the design of an experiment is not relevant in statistics. Obviously it is relevant in many situations, e.g., censored or truncated data. But it doesn't seem to be relevant in this case (i.e., inverse versus direct binomial sampling), and you'll need more than you've written here to convince me otherwise.
- In the meantime, thank you for sending me a copy of your paper, which I will read with interest. Bill Jefferys 16:41, 18 September 2006 (UTC)
MY ANSWER
- Birnbaum's proof is not right because, if the LP is false, the property of sufficiency also no longer has exactly the same meaning.
For instance, knowledge of the number of successes x in a Bernoulli process is not 'sufficient' for inference purposes. In fact, information about the design used is ancillary to x (in the same way as n), and it cannot be ignored (not even under the same likelihood).
- The method of calculating the prior is as follows:
the prior for a parameter may be assumed to be proportional to the maximum value of the likelihood, over all possible likelihood functions obtainable from the projected design. This method is consistent with the uniform prior and its transformations.
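One possible reading of this recipe is sketched below; this is an interpretation, not code from the cited papers, and the truncation of the inverse design at 200 trials is an arbitrary numerical cutoff. Under this reading the direct and inverse designs do yield different priors, which is the claim at issue.

```python
# A hypothetical implementation of the recipe above: for each p, take
# the prior proportional to the largest likelihood any data set
# producible by the projected design would assign to it.
import numpy as np
from scipy.stats import binom, nbinom

p_grid = np.linspace(0.01, 0.99, 99)

# Direct sampling: n = 10 fixed, any x in 0..10 could be observed.
prior_direct = np.array([binom.pmf(np.arange(11), 10, p).max()
                         for p in p_grid])

# Inverse sampling: x = 7 fixed, any n >= 7 could be observed
# (k = n - 7 failures, truncated here at k = 193).
prior_inverse = np.array([nbinom.pmf(np.arange(194), 7, p).max()
                          for p in p_grid])

# Normalize on the grid; under this reading the two designs give
# different priors, which is the claim being made above.
prior_direct /= prior_direct.sum()
prior_inverse /= prior_inverse.sum()
print(np.allclose(prior_direct, prior_inverse))  # False
```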
- Regarding your remarks, I would like to note that p(h|e*) is unknown if d is unspecified. On the contrary, p(h|e*, d) is well-determined. This is the difference between them.
Rodolfo de Cristofaro decrist@ds.unifi.it
- Bald claims that contradict everything I know about inference and about the LP. You have not convinced me. Sorry.
- As far as I am concerned, this discussion is at an end. Bill Jefferys 12:16, 20 September 2006 (UTC)
It is a pity you are closed to newness. I can only invite you to read my papers. Rodolfo de Cristofaro 25 September 2006
- No one is closed to newness. I am certainly not. But mere newness is not appropriate for an encyclopedia. It must be backed up by a substantial body of published research. Please read the WikiPedia policy on no original research. The policy is that WikiPedia is to present the scholarly consensus, plus any significant opinions that vary from it, as presented (in cases like this) in the scholarly literature. BUT, the opinions of a single person, even if published in the scholarly literature, do not necessarily represent an opinion that should be included in this encyclopedia.
- As I have said, I will read the paper you sent me, when I have time. But what you are flogging here contradicts everything I know about the LP, published by numerous scholars of excellent reputation, many of whom I know personally. Your singular assertions are insufficient to overcome my objections. Perhaps I will change my mind when I read your paper; but until then, you will have to post your objections without further comment from me. Bill Jefferys 01:07, 26 September 2006 (UTC)
Accept a hypothesis
Hello everyone. Well, two people have reverted my edits about accepting an hypothesis. The LP implies one can accept an hypothesis on the grounds that it cannot readily be improved. Edwards uses the two units of likelihood criterion for this, and surely his opinion should carry some weight. Perhaps this observation should appear somewhere else in the article. Comments? Robinh 08:10, 12 December 2006 (UTC)
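To make Edwards' criterion concrete, here is a minimal sketch reusing the binomial example from elsewhere on this page; the two-units threshold is Edwards' own, while the grid and data are purely illustrative.

```python
# Sketch of the two-units-of-support criterion for a binomial parameter
# (7 heads in 10 tosses); grid and data are for illustration only.
import numpy as np
from scipy.stats import binom

x, n = 7, 10
p_grid = np.linspace(0.001, 0.999, 999)

support = np.log(binom.pmf(x, n, p_grid))   # log-likelihood ("support")
support -= support.max()                    # measured from the maximum

# Values of p within 2 units of support of the maximum cannot readily
# be improved upon, so on Edwards' criterion they remain acceptable.
acceptable = p_grid[support >= -2]
print(acceptable.min(), acceptable.max())   # approximate 2-unit interval
```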
On design not mattering
Just a note to accompany the removal of a misleading rendition of the LP early on this page. Differing experimental designs typically lead to different likelihood functions. The classic example given here, which also appears as a motivating example in Berger's book on the LP, is a rather special case designed to make a counterintuitive point. Consequently it is unwise to state that the LP implies that experimental design 'doesn't matter', even colloquially. It usually does. There's a good discussion emphasizing the relevance of design to Bayesian inference in chapter 7 of Gelman et al. 1995, 'Bayesian Data Analysis'.