Talk:Null hypothesis


References

Seeing as this is a scientific, or at least scholarly, topic, I believe this article needs references. I'm (I hope appropriately) adding the "Not Verified" tag, so hopefully someone will come through and add references so the poor, confused beginning stats students (myself included) can know this is more than the ramblings of a demented mind. Having had experience using the subject matter, even reading this straight from a textbook can make one crazy. Garnet avi 10:41, 5 September 2006 (UTC)

Unsorted Comments

I think the main thing that the article misses is that the null hypothesis is always the hypothesis that there is no difference between two (or I suppose more) groups. Thus the word "null". Generally speaking, when studying something, you are trying to establish a difference between two groups (e.g. that group A, which received medication, did better than group B, which did not). It is statistically convenient (as well as philosophically convenient) to always start from the same premise. 24.45.14.133 04:44, 14 February 2007 (UTC) nullo
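A minimal sketch of this "no difference" null hypothesis in code (Python with scipy; the group sizes, means, and scores below are invented purely for illustration):

  import numpy as np
  from scipy import stats

  rng = np.random.default_rng(0)
  group_a = rng.normal(loc=52, scale=10, size=40)  # hypothetical scores, received medication
  group_b = rng.normal(loc=48, scale=10, size=40)  # hypothetical scores, control group

  # H0: both groups come from populations with the same mean (no difference).
  t_stat, p_value = stats.ttest_ind(group_a, group_b)
  print(f"t = {t_stat:.2f}, p = {p_value:.3f}")
  # A small p-value is evidence against H0, i.e. against "no difference".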



Sorry, this stuff was above the contents, but are not together or titled. I thought they would be more appropriate under the contents, so moved them to be so. Garnet avi 10:41, 5 September 2006 (UTC)


Is this sentence, from the article, correct?

But if the null hypothesis is that sample A is drawn from a population whose mean is no lower than the mean of the population from which sample B is drawn, the alternative hypothesis is that sample A comes from a population with a larger mean than the population from which sample B is drawn, and we will proceed to a one-tailed test.

It seems as if the null hypothesis says that mean(A) >= mean(B). Therefore the alternative hypothesis should be the negation of this, or mean(A) < mean(B). But the text states that the alternative hypothesis is that mean(A) > mean(B). Is this right?

I agree. Fixed. --Bernard Helmstetter 19:54, 8 Jan 2005 (UTC)
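For concreteness, here is a minimal sketch of the corrected one-tailed setup (Python with scipy; the data are invented, and scipy's alternative parameter is just one way to express the direction):

  import numpy as np
  from scipy import stats

  rng = np.random.default_rng(1)
  sample_a = rng.normal(loc=9.5, scale=2.0, size=30)   # invented data
  sample_b = rng.normal(loc=10.5, scale=2.0, size=30)

  # H0: mean(A) >= mean(B); alternative H1: mean(A) < mean(B).
  t_stat, p_value = stats.ttest_ind(sample_a, sample_b, alternative="less")
  print(f"t = {t_stat:.2f}, one-tailed p = {p_value:.3f}")
  # Rejecting H0 supports mean(A) < mean(B) -- the negation of H0 --
  # not mean(A) > mean(B) as the old article text had it.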

The difference between H0: μ1 = μ2 and H0: μ1 - μ2 = 0 is unclear, to say the least. --Bernard Helmstetter 20:02, 8 Jan 2005 (UTC)


This entry is confusing, to say the least. The introduction is somehow split in two sections by the TOC, and paradoxically is too short. The closing sections on controversies and publication bias could be merged as well. I am not attempting a rewrite for I know little about statistics myself - but even so it is evident that the article could be clearer.--Duplode 01:20, 4 April 2006 (UTC)

??

I'm a sophomore in high school; here's my request:

Could someone create a "Null hypothesis for dummies" section? As it is now, this article is very hard to comprehend. -- Somebody

"Null hypothesis for dummies" would be useful. In the examples there are null hypotheses stating that "the value of this real number is the same as the value of that real number". Is there some explanation for why such a hypothesis is reasonable? It seems to me that for a very broad class of probability distributions the null hypothesis has probability of 0 and the opposite probability of 1. The article at the moment says this:

However, concerns regarding the high power of statistical tests to detect differences in large samples have led to suggestions for re-defining the null hypothesis, for example as a hypothesis that an effect falls within a range considered negligible. This is an attempt to address the confusion among non-statisticians between significant and substantial, since large enough samples are likely to be able to indicate differences however minor.

So the more data we have, the more likely it is that the null hypothesis is rejected? This is exactly what should happen if the null hypothesis is always false - the only difference is in how much data we need to prove that. Is this the case in actual use? If so, how does the theory justify drawing conclusions from a false premise? Presumably the theory is "robust enough" when there isn't "too much data", but how exactly does this work? 82.103.214.43 14:58, 11 June 2006 (UTC)
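The questioner's intuition can be checked with a quick simulation (a sketch assuming a small but nonzero true difference; all of the numbers below are invented):

  import numpy as np
  from scipy import stats

  rng = np.random.default_rng(2)
  true_diff = 0.05  # a real but negligible difference between the populations

  for n in (100, 10_000, 1_000_000):
      a = rng.normal(0.0, 1.0, size=n)
      b = rng.normal(true_diff, 1.0, size=n)
      _, p = stats.ttest_ind(a, b)
      print(f"n = {n:>9,}: p = {p:.4f}")
  # Typically p falls as n grows: with enough data, even a trivially small
  # (but nonzero) difference is eventually declared "significant".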

Elisabeth Anscombe

Who the hell is she and why is she quoted here? Any reference?

Misspelling. Elizabeth Anscombe. Flapdragon 22:00, 18 May 2006 (UTC)

Are you sure the author of the quote is Elizabeth Anscombe? Francis Anscombe was a statistician who, among other things, applied statistical methods to agriculture and is a much more plausible source for that quote. As stated above, a source for the quote would be nice.--jdvelasc 21:29, 9 October 2006 (UTC)

Also, see here for details of a significant, although inadvertent, contribution to the philosophy of language by Anscombe. Lindsay658 22:07, 16 May 2007 (UTC)
As I said above, it is very doubtful that Elizabeth Anscombe is the author of that quote. I am taking it out until someone can source it. --Jdvelasc 18:35, 4 September 2007 (UTC)

example conclusion

"For example, if we want to compare the test scores of two random samples of men and women, a null hypothesis would be that the mean score of the male population was the same as the mean score of the female population, and therefore there is no significant statistical difference between them:"

This is wrong: the two samples can have the same mean and still be statistically quite different (e.g. differ in variance). 84.147.219.67 15:56, 26 June 2006 (UTC)

I made some changes: I deleted "and therefore there is no significant statistical difference between them:", because it is redundant and arguably incorrect. I also added a few words to the part about assuming they're drawn from the same population, to say that this means they have the same variance and shape of distribution too. I deleted the equation with μ1 - μ0 = 0 because it was out of context IMO given the sentence that was just before it, and because it is practically the same as the previous equation μ1 = μ0. Sorry I forgot again to put an "edit summary". Coppertwig 00:14, 5 November 2006 (UTC)
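A small sketch of the anonymous editor's point above (Python with scipy; the distributions are invented): two samples with equal population means can still differ, e.g. in variance, which a test on the means will not detect:

  import numpy as np
  from scipy import stats

  rng = np.random.default_rng(3)
  a = rng.normal(loc=100, scale=5, size=500)   # same mean, small spread
  b = rng.normal(loc=100, scale=25, size=500)  # same mean, large spread

  _, p_mean = stats.ttest_ind(a, b, equal_var=False)  # Welch's t-test on the means
  _, p_var = stats.levene(a, b)                       # Levene's test on the variances

  print(f"means:     p = {p_mean:.3f}")   # typically large: no mean difference found
  print(f"variances: p = {p_var:.3g}")    # typically tiny: the spreads clearly differ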

[edit] "File drawer problem"?

What is it, and why does it make a sudden and unexplained appearance near the end of this article? If I hadn't gotten a C- in stats I'd go out and fix it myself. :) --User:Dablaze 13:29, 1 August 2006 (UTC)

The "file drawer problem" is this: suppose a researcher carries out an experiment and does not find any statistically significant difference between two populations. (For example, tests whether a certain substance cures a certain illness and does not find any evidence that it does.) Then, the researcher may consider that this result (or "non-result") is not very interesting, and put all the notes about it into a file drawer and forget about it, instead of publishing it which is what the researcher would have done if the test had found the interesting result that the substance apparently cures the illness.
Not publishing it is a problem for several reasons. One, other researchers may waste time carrying out the same test on a useless substance, and also not publish. Two, it is sometimes possible to find a statistically significant result by combining the results of several studies; this can't easily happen if a study isn't published, since nobody knows about it. Three, if various researchers keep repeating the same experiment and not finding statistically significant results, and then one does the same experiment and by a random fluke (luck) does get a statistically significant result, they might publish that and it would look as if the substance cures the illness, although if you combined the results of all the studies you would see that there is no statistically significant result overall.
It really does make sense if you can guess what "file drawer problem" means. Does it need a few words in the article to explain it? Coppertwig 00:00, 5 November 2006 (UTC)
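Point two above can be made concrete with Fisher's method for combining p-values (a sketch; the five study p-values are invented for illustration):

  from scipy import stats

  # Five hypothetical studies, none individually significant at the 0.05 level:
  study_pvalues = [0.08, 0.11, 0.06, 0.20, 0.09]

  stat, combined_p = stats.combine_pvalues(study_pvalues, method="fisher")
  print(f"chi-squared = {stat:.2f}, combined p = {combined_p:.4f}")
  # Pooled, the studies are significant -- but only if none of them
  # stayed in the file drawer.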

Accept, reject, do not reject Null Hypothesis

After a statistical test (say, determining p-values), one can only reject or not reject the Null Hypothesis. Accepting the alternative hypothesis is wrong because there is always a probability that you are incorrectly accepting or rejecting (alpha and beta; type I and type II error). --70.111.218.254 02:03, 22 November 2006 (UTC)

Actually, it seems that the first paragraph is entirely confusing. One cannot ACCEPT the null hypothesis. One can only REJECT or FAIL TO REJECT it. On the other hand, one can ACCEPT the alternative hypothesis or FAIL TO ACCEPT it. See D. Gujarati: Basic Econometrics, Fourth Edition, 2004, p. 134 --- Argyn

Besides type I and type II error, there's a problem which remains big even when your statistical significance is excellent: that both the Null Hypothesis and the Alternative Hypothesis can be false. I suppose they usually are; they are usually at best oversimplifications (models) of a situation in the real world. That's why the alternative hypothesis is merely "accepted", not "proven" nor "shown" nor "established". However, it can be "shown" or "established", with a certain statistical significance level, that the null hypothesis is false. --Coppertwig 10:29, 15 February 2007 (UTC)
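The "always a probability of error" point is easy to see by simulation (a sketch; the sample sizes and number of trials are arbitrary choices):

  import numpy as np
  from scipy import stats

  rng = np.random.default_rng(4)
  alpha, trials, rejections = 0.05, 2000, 0

  for _ in range(trials):
      a = rng.normal(0.0, 1.0, size=50)
      b = rng.normal(0.0, 1.0, size=50)  # H0 is true by construction
      _, p = stats.ttest_ind(a, b)
      rejections += p < alpha

  print(f"false rejection rate ~ {rejections / trials:.3f} (should be near {alpha})")
  # A true H0 is still rejected about 5% of the time (type I error), which is
  # why test outcomes are phrased as "reject" or "fail to reject".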

Formulation of null hypotheses

This article appears to be a little confused at the moment — I would appreciate a little discussion before I make some changes. In particular...

"if the null hypothesis is that sample A is drawn from a population whose mean is lower than the mean of the population from which sample B is drawn, the alternative hypothesis is that sample A comes from a population with a higher mean than the population from which sample B is drawn, which can be tested with a one-tailed test."

I believe this to be misleading. A null hypothesis is a statement of no effect — by definition it has no directionality. There is a very good reason for this: null hypothesis testing works by first assuming the null hypothesis to be true, and then calculating how often we would expect to see results as extreme as those observed even when the null hypothesis is true. That is, we are trying to find out how often the observed results would be obtained by chance.

It is only possible to do this when we have a well-defined null hypothesis — e.g. when it states that one mean is equal to another mean, or when a mean is equal to a defined value. It would not be possible to calculate our test statistic if our null hypothesis merely said, "Mean one is less than mean two", and indeed this would not be a null hypothesis.
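In symbols (a sketch in standard two-sample notation; the pooled statistic below is one illustrative choice, not the only possibility):

  Under the exact null $H_0\colon \mu_1 = \mu_2$, the pooled two-sample statistic
  \[ T = \frac{\bar{X}_1 - \bar{X}_2}{s_p \sqrt{1/n_1 + 1/n_2}} \]
  has a known $t_{n_1 + n_2 - 2}$ distribution under $H_0$, so the p-value
  \[ p = P\big(|T| \ge |t_{\mathrm{obs}}| \mid H_0\big) \]
  can be computed. A vague statement such as $\mu_1 < \mu_2$ fixes no single
  sampling distribution for $T$, so no p-value follows from it.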

I think the confusion arises in the case of a one-tailed test. Take, for example, an experiment investigating the height of men and women in a class. We might wish to test the hypothesis "that men are taller than women". In this case our hypotheses are as follows:

  • Null: That men and women are of equal height.
  • Experimental: That men are taller (have greater height) than women.

In this case, we have defined our experimental hypothesis in a one-tailed form. The question many people ask is, "But what if women are taller than men? Surely neither of our hypotheses addresses this?". The confusion then lies in whether or not the null hypothesis should incorporate this possibility. To the very best of my knowledge, it should not: the null hypothesis remains a statement of no effect.

The reason for this is that we are looking to see whether there is evidence to support the specific experimental hypothesis that we have postulated. If we find our results to be non-significant, this tells us that we do not have sufficient evidence to accept our specific experimental hypothesis. If it turns out that we're interested in a difference that we find in the other direction, well that suggests that we should have proposed a two-tailed hypothesis in the first place. Indeed, I would argue that it is very rare indeed that a one-tailed hypothesis is appropriate: we are almost always interested in results in the other direction from that predicted.
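To illustrate with the height example (Python with scipy; the heights are invented, and the one-tailed direction is expressed via scipy's alternative parameter):

  import numpy as np
  from scipy import stats

  rng = np.random.default_rng(5)
  men = rng.normal(loc=178, scale=7, size=25)    # hypothetical heights in cm
  women = rng.normal(loc=165, scale=6, size=25)

  # H0: men and women are of equal height (the statement of no effect);
  # experimental hypothesis: men are taller.
  t_stat, p_value = stats.ttest_ind(men, women, alternative="greater")
  print(f"t = {t_stat:.2f}, one-tailed p = {p_value:.4g}")
  # The statistic is computed under H0 (equal means); the chosen direction
  # only decides which tail of the t distribution supplies the p-value.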

Does this sound sensible? If so, then I will modify the article accordingly, and will add some relevant citations! -- Sjb90 16:34, 15 May 2007 (UTC)

OK, I will start making some changes to this later. This has been quite a complicated issue to resolve, and has involved going back to the paper that first defined the term 'null hypothesis'. Full discussion can be found on Lindsay658's talk page. -- Sjb90 11:06, 18 May 2007 (UTC)

Earlier Null hypothesis Discussion

I thought that the following should appear here (originally at [1]) for the ease of others. Lindsay658 (talk) 21:21, 22 February 2008 (UTC)


Hi there,

Over on the null hypothesis talk page, I've been canvassing for opinions on a change that I plan to make regarding the formulation of a null hypothesis. However I've just noticed your excellent edits on Type I and type II errors. In particular, in the null hypothesis section you say:

The consistent application by statisticians of Neyman and Pearson's convention of representing "the hypothesis to be tested" (or "the hypothesis to be nullified") with the expression Ho -- associated with an increasing tendency to incorrectly read the expression's subscript as a zero, rather than an "O" (for "original") -- has led to circumstances where many understand the term "the null hypothesis" as meaning "the nil hypothesis". That is, they incorrectly understand it to mean "there is no phenomenon", and that the results in question have arisen through chance.

Now I know the trouble with stats in empirical science is that everyone is always feeling their way to some extent -- it's an inexact science that tries to bring sharp definition to the real world! But I'm really intrigued to know what you're basing this statement on -- I'm one of those people who has always understood the null hypothesis to be a statement of null effect. I've just dug out my old undergrad notes on this, and that's certainly what I was taught at Cambridge; and it's also what my stats reference (Statistical Methods for Psychology, by David C. Howell) seems to suggest. In addition, whenever I've been an examiner for public exams, the mark scheme has tended to state the definition of a null as being a statement of null effect.

I'm a cognitive psychologist rather than a statistician, so I'm entirely prepared to accept that this may be a common misconception, but was wondering whether you could point me towards some decent reference sources that try to clear this up, if so! —The preceding unsigned comment was added by Sjb90 (talkcontribs) 11:07, 16 May 2007 (UTC).

Sjb90 . . . There are three papers by Neyman and Pearson:
  • Neyman, J. & Pearson, E.S., "On the Use and Interpretation of Certain Test Criteria for Purposes of Statistical Inference, Part I", reprinted at pp.1-66 in Neyman, J. & Pearson, E.S., Joint Statistical Papers, Cambridge University Press, (Cambridge), 1967 (originally published in 1928).
  • Neyman, J. & Pearson, E.S., "The testing of statistical hypotheses in relation to probabilities a priori", reprinted at pp.186-202 in Neyman, J. & Pearson, E.S., Joint Statistical Papers, Cambridge University Press, (Cambridge), 1967 (originally published in 1933).
  • Pearson, E.S. & Neyman, J., "On the Problem of Two Samples", reprinted at pp.99-115 in Neyman, J. & Pearson, E.S., Joint Statistical Papers, Cambridge University Press, (Cambridge), 1967 (originally published in 1930).
Unfortunately, I do not have these papers at hand and, so, I cannot tell you precisely which of these papers was the source of this statement; but I can assure you that the statement was made on the basis of reading all three papers. From memory, I recall that they were quite specific in their written text and in their choice of mathematical symbols to stress that it was O for original (and not 0 for zero). Also, from memory, I am certain that the first use of the notion of a "null" hypothesis comes from:
  • Fisher, R.A., The Design of Experiments, Oliver & Boyd (Edinburgh), 1935.
And, as I recall, Fisher was adamant that whatever it was to be examined was the NULL hypothesis, because it was the hypothesis that was to be NULLIFIED.
I hope that is of some assistance to you.
It seems that it is yet one more case of people citing citations that themselves cite a citation in someone else's work, rather than reading the originals.
The second point to make is that the passage you cite from my contribution was 100% based on the literature (and, in fact, the original articles).
Finally, and this comment is not meant to be a criticism of anyone in particular, simply an observation, I came across something in social science literature that mentioned a "type 2 error" about two years ago. It took me nearly 12 months to track down the source to Neyman and Pearson's papers. I had many conversations with professional mathematicians and statisticians and none of them had any idea where the notion of Type I and type II errors came from and, as a consequence, I would not be at all surprised to find that the majority of mathematicians and statisticians had no idea of the origins and meaning of "null" hypothesis.
I'm not entirely certain, but I have a feeling that Fisher's work -- which I cited as "Fisher (1935, p.19)", and that reference would be accurate -- was an elaboration and extension of the work of Neyman and Pearson (and, as I recall, Fisher completely understood that it was an "O", rather than a zero, in the subscript). Sorry I can't be of any more help. The collection that contains the reprints of Neyman and Pearson's papers and the book by Fisher should be fairly easy for you to find in most university libraries. Lindsay658 22:37, 16 May 2007 (UTC)
Thanks for the references, Lindsay658 -- I'll dig them out, and have a bit of a chat with my more statsy colleagues here, and will let you know what we reckon. I do agree that it's somewhat non-ideal that such a tenet of experimental design is described rather differently in a range of texts!
As a general comment, I think it entirely acceptable for people working in a subject, or writing a subject-specific text book / course to read texts more geared towards their own flavour of science, rather than the originals. After all, science is built upon the principle that we trust much of the work created by our predecessors, until we have evidence to do otherwise, and most of these derived texts tend to be more accessible to the non-statistician. However I agree that, when writing for e.g. Wikipedia, it is certainly useful to differentiate between 'correct' and 'common' usage, particularly when the latter is rather misleading. This is why your contribution intrigued me so -- I look forward to reading around this and getting back to you soon -- many thanks for your swift reply! -- Sjb90 07:39, 17 May 2007 (UTC)


OK, I've now had a read of the references that you mentioned, as well as some others that seemed relevant. Thanks again for giving me these citations -- they were really helpful. This is what I found:
  • First of all, you are quite right to talk of the null hypothesis as the 'original hypothesis' -- that is, the hypothesis that we are trying to nullify. However Neyman & Pearson do in fact use a zero (rather than a letter 'O') as the subscript to denote a null hypothesis. In this way, they show that the null hypothesis is merely the original in a range of possible hypotheses: H0, H1, H2 ... Hi.
  • As you mentioned, Fisher introduced the term null hypothesis, and defines this a number of times in The Design of Experiments. When talking of an experiment to determine whether a taster can successfully discriminate whether milk or tea was added first to a cup, Fisher defines his null hypothesis as "that the judgements given are in no way influenced by the order in which the ingredients have been added ... Every experiment may be said to exist only in order to give the facts a chance of disproving the null hypothesis."
  • Later, Fisher talks about fair testing, namely in ensuring that other possible causes of differentiation (between the cups of tea, in this case) are held fixed or are randomised, to ensure that they are not confounds. By doing this, Fisher explains that every possible cause of differentiation is thus now i) randomised; ii) a consequence of the treatment itself (order of pouring milk & tea), "of which on the null hypothesis there will be none, by definition"; or iii) an effect "supervening by chance".
  • Furthermore, Fisher explains that a null hypothesis may contain "arbitrary elements" -- e.g. in the case where H0 is "that the death-rates of two groups of animal are equal, without specifying what those death-rates actually are. In such cases it is evidently the equality rather than any particular values of the death-rates that the experiment is designed to test, and possibly to disprove."
  • Finally, Fisher emphasises that "the null hypothesis must be exact, that is free from vagueness and ambiguity, because it must supply the basis of the 'problem of distribution,' of which the test of significance is the solution". He gives an example of a hypothesis that can never be a null hypothesis: that a subject can make some discrimination between two different sorts of object. This cannot be a null hypothesis, as it is inexact, and could relate to an infinity of possible exact scenarios.
So, where does that leave us? I propose to make the following slight changes to the Type I and type II errors page and the null hypothesis page.
  • I will tone down the paragraph about original vs. nil hypotheses: the subscript is actually a zero, but it is entirely correct that the hypothesis should not be read as a "nil hypothesis" -- I agree that it is important to emphasise that the null hypothesis is that one that we are trying to nullify.
  • In the null hypothesis article, I will more drastically change the paragraph that suggests that, for a one-tailed test, it is possible to have a null hypothesis "that sample A is drawn from a population whose mean is lower than the mean of the population from which sample B is drawn". As I had previously suspected, this is actively incorrect: such a hypothesis is numerically inexact. The null hypothesis, in the case described, remains "that sample A is drawn from a population with the same mean as sample B".
  • I will tone down my original suggestion slightly: A null hypothesis isn't a "statement of no effect" per se, but in an experiment (where we are manipulating an independent variable), it logically follows that the null hypothesis states that the treatment has no effect. However null hypotheses are equally useful in an observation (where we may be looking to see whether the value of a particular measured variable significantly differs from that of a prediction), and in this case the concept of "no effect" has no meaning.
  • I'll add in the relevant citations, as these really do help to resolve this issue once and for all!
Thanks again for your comments on this. I will hold back on my edits for a little longer, in case you have any further comments that you would like to add!
-- Sjb90 17:33, 17 May 2007 (UTC)
I agree with your changes. As you can see from [[2]], [[3]], [[4]], and [[5]], I really didn't have a lot to work with.

I believe that it might be helpful to make some sort of comment to the effect that when statisticians work -- rather than scientists, that is -- they set up a question that is couched in very particular terms and then try to disprove it (and, if it cannot be disproved, the proposition stands, more or less by default).
The way that statisticians contemplate a "null hypothesis" -- essentially couching one's research question as the polar opposite of what one actually believes to be the case, by contrast with scientists, who generally couch their research question in terms of what they actually believe to be the case -- is counter-intuitive to ordinary people, and is something that someone like you could describe far better than I could; I also believe that it would be extremely informative to the more general reader. All the best in your editing. If you have any queries, please contact me again. Lindsay658 21:49, 17 May 2007 (UTC)
Just a note to say that I have finally had the chance to sit down and word some changes to the Null hypothesis article and the section on Type_I_and_type_II_errors#The_null_hypothesis. Do shout and/or make changes if you think my changes are misleading/confusing! -- Sjb90 11:33, 14 June 2007 (UTC)

Analogy to proof by contradiction

I noticed the request to simplify (above), and had the idea of inserting something into the lede that would relate this idea to a proof by contradiction. For example, "This is similar to the idea of a proof by contradiction, but instead of a definite proof, experimental data is used to show that the null hypothesis is very unlikely to be true.". I'm not entirely sure if that wording is clear enough, or perhaps there's some imprecision; suggestions? —AySz88\^-^ 22:48, 13 March 2008 (UTC)
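A toy illustration of the analogy (a sketch using scipy's binomial test; the coin-flip counts are made up):

  from scipy import stats

  # Assume H0 (the "premise to contradict"): the coin is fair, p = 0.5.
  result = stats.binomtest(k=58, n=60, p=0.5)  # 58 heads in 60 flips
  print(f"p = {result.pvalue:.3g}")
  # A proof by contradiction would derive an impossibility from the premise;
  # here we derive only an extreme improbability, so H0 is rejected as very
  # unlikely rather than disproved outright.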