Null hypothesis
From Wikipedia, the free encyclopedia
In statistics, a null hypothesis is a hypothesis set up to be nullified or refuted in order to support an alternative hypothesis. When used, the null hypothesis is presumed true until statistical evidence, in the form of a hypothesis test, indicates otherwise. In other words, a null hypothesis is often the reverse of what the experimenter actually believes; it is put forward to allow the data to contradict it. The use of the null hypothesis is controversial (see papers linked below).
Introduction
The null hypothesis is generally that which is presumed to be true initially. Hence, we reject it only when we are quite sure that it is false, typically requiring 90%, 95%, or 99% confidence that the data do not support it.
An example
For example, if we want to compare the test scores of two random samples of men and women, a null hypothesis would be that the mean score of the male population was the same as the mean score of the female population:
- H0 : μ1 = μ2
where:
- H0 = the null hypothesis
- μ1 = the mean of population 1, and
- μ2 = the mean of population 2.
Alternatively, the null hypothesis can postulate that the two samples are drawn from the same population, so that the variance and shape of the distributions are equal, as well as the means.
Formulation of the null hypothesis is a vital step in testing statistical significance. Having formulated such a hypothesis, one can establish the probability of observing the obtained data, or data that differ even more from the prediction of the null hypothesis, if the null hypothesis is true. That probability is the p-value, sometimes loosely called the "significance level" of the results; strictly, the significance level is the threshold chosen in advance against which the p-value is compared.
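One way to make this probability concrete is a permutation test, sketched below in Python using only the standard library. It is an illustration rather than a prescribed method: the function name `permutation_p_value`, the number of resamples, and the test scores are all assumptions invented for the example. The null hypothesis used here is the stronger one, that both samples are drawn from the same population, so that group labels can be shuffled freely.

```python
import random
from statistics import mean


def permutation_p_value(sample_a, sample_b, n_resamples=10_000, seed=0):
    """Estimate a two-sided p-value for H0: both samples come from one population."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    observed = abs(mean(sample_a) - mean(sample_b))
    pooled = list(sample_a) + list(sample_b)
    n_a = len(sample_a)
    at_least_as_extreme = 0
    for _ in range(n_resamples):
        rng.shuffle(pooled)  # under H0 the group labels are exchangeable
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:
            at_least_as_extreme += 1
    # Add one to numerator and denominator so the estimate is never exactly 0.
    return (at_least_as_extreme + 1) / (n_resamples + 1)


# Hypothetical test scores, invented for illustration only.
men = [72, 85, 78, 90, 66, 81, 77, 88]
women = [75, 83, 79, 91, 70, 84, 80, 86]
p_value = permutation_p_value(men, women)
print(f"estimated p-value: {p_value:.3f}")
```

The returned value estimates how often a difference in means at least as large as the observed one would arise if the null hypothesis were true.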
When a null hypothesis is formed, it is always in contrast to an implicit alternative hypothesis, which is accepted if the observed data values are sufficiently improbable under the null hypothesis. The precise formulation of the null hypothesis has implications for the alternative. For example, if the null hypothesis is that sample A is drawn from a population with the same mean as sample B, the alternative hypothesis is that they come from populations with different means, which can be tested with a two-tailed test of significance. But if the null hypothesis is that sample A is drawn from a population whose mean is no higher than the mean of the population from which sample B is drawn, the alternative hypothesis is that sample A comes from a population with a higher mean than the population from which sample B is drawn, which can be tested with a one-tailed test.
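The difference between the two kinds of test can be sketched numerically. The Python example below assumes, purely for illustration, that the test statistic follows a standard normal distribution under the null hypothesis (a z-test); the function name `z_test_p_values` and the statistic 1.8 are hypothetical.

```python
from statistics import NormalDist


def z_test_p_values(z):
    """Return (two_tailed, upper_tailed) p-values for a standard normal statistic z."""
    std_normal = NormalDist()  # mean 0, standard deviation 1
    # One-tailed: P(Z >= z), for the alternative "the mean of A is higher".
    upper_tail = 1.0 - std_normal.cdf(z)
    # Two-tailed: P(|Z| >= |z|), for the alternative "the means differ".
    two_tailed = 2.0 * (1.0 - std_normal.cdf(abs(z)))
    return two_tailed, upper_tail


two_tailed, upper_tail = z_test_p_values(1.8)  # 1.8 is an arbitrary illustrative value
print(f"two-tailed p = {two_tailed:.4f}, one-tailed p = {upper_tail:.4f}")
```

With this statistic the one-tailed p-value falls below 0.05 while the two-tailed p-value does not, which is why the direction of the alternative hypothesis should be fixed before the data are examined.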
Limitations
A null hypothesis is only useful if it is possible to calculate the probability of observing a data set with particular parameters under it. In general it is much harder to be precise about how probable the data would be if the alternative hypothesis is true.
If experimental observations contradict the prediction of the null hypothesis, it means that either the null hypothesis is false, or we have observed an event with very low probability. This gives us high confidence in the falsehood of the null hypothesis, which can be improved by increasing the number of trials. However, accepting the alternative hypothesis only commits us to a difference in observed parameters; it does not prove that the theory or principles that predicted such a difference are true, since it is always possible that the difference is due to additional factors not recognised by the theory.
For example, rejection of a null hypothesis (that, say, rates of symptom relief in a sample of patients who received a placebo and a sample who received a medicinal drug will be equal) allows us to make a non-null statement (that the rates differed); it does not prove that the drug relieved the symptoms, though it gives us more confidence in that hypothesis.
The formulation, testing, and rejection of null hypotheses is methodologically consistent with the falsificationist model of scientific discovery formulated by Karl Popper and widely believed to apply to most kinds of empirical research. However, concerns regarding the high power of statistical tests to detect differences in large samples have led to suggestions for re-defining the null hypothesis, for example as a hypothesis that an effect falls within a range considered negligible. This is an attempt to address the confusion among non-statisticians between significant and substantial, since large enough samples are likely to be able to indicate differences however minor.
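The gap between significant and substantial can be illustrated with a small calculation. The Python sketch below assumes a two-sample z-test with a known common standard deviation, which is a simplification chosen for the example; the function name `two_sample_z_p_value` and the difference of 0.01 standard deviations are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def two_sample_z_p_value(mean_diff, sd, n_per_group):
    """Two-tailed p-value for an observed mean difference, assuming a known common sd."""
    standard_error = sd * sqrt(2.0 / n_per_group)
    z = mean_diff / standard_error
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))


# Hypothetical numbers: the same tiny observed difference at growing sample sizes.
for n in (100, 10_000, 1_000_000):
    p = two_sample_z_p_value(mean_diff=0.01, sd=1.0, n_per_group=n)
    print(f"n per group = {n:>9,}: p = {p:.3g}")
```

The observed difference is identical in every row; only the sample size changes, yet the p-value moves from far above any conventional threshold to far below it.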
The theory underlying the idea of a null hypothesis is closely associated with the frequentist theory of probability, in which probabilistic statements can only be made about the relative frequencies of events in arbitrarily large samples. A failure to reject the null hypothesis is meaningful only in relation to an arbitrarily large population from which the observed sample is supposed to be drawn.
Publication bias
In 2002, a group of psychologists launched a new journal dedicated to experimental studies in psychology which support the null hypothesis. The Journal of Articles in Support of the Null Hypothesis (JASNH) was founded to address a scientific publishing bias against such articles. [1] According to the editors,
- "other journals and reviewers have exhibited a bias against articles that did not reject the null hypothesis. We plan to change that by offering an outlet for experiments that do not reach the traditional significance levels (p < 0.05). Thus, reducing the file drawer problem, and reducing the bias in psychological literature. Without such a resource researchers could be wasting their time examining empirical questions that have already been examined. We collect these articles and provide them to the scientific community free of cost."
The "file drawer problem" arises because academics tend not to publish results for which the null hypothesis could not be rejected. That is, they obtained a result that was not statistically significant, so the relationship they were looking for could not be demonstrated. Even though these papers can often be interesting, they tend to end up unpublished, in "file drawers."
Controversy
Null hypothesis testing has always been controversial. Many statisticians have pointed out that rejecting the null hypothesis says little or nothing about the probability that the null is true. Under traditional null hypothesis testing, the null is rejected when P(Data | Null)† is sufficiently small, say below 0.05. However, researchers are really interested in P(Null | Data), which cannot be inferred from a p-value alone. In some cases, P(Null | Data) approaches 1 while P(Data | Null) approaches 0; in other words, we can reject the null when it is virtually certain to be true. For this and other reasons, Gerd Gigerenzer has called null hypothesis testing "mindless statistics", while Jacob Cohen describes it as a ritual conducted to convince ourselves that we have the evidence needed to confirm our theories.
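The distinction between the two conditional probabilities follows from Bayes' theorem, as the Python sketch below illustrates. It rests on assumptions that are not part of the argument above: there are exactly two competing hypotheses, a prior probability is available for the null, and P(Data | Null) is treated as a single likelihood value rather than a tail probability. The function name `posterior_null` and all numbers are hypothetical.

```python
def posterior_null(prior_null, p_data_given_null, p_data_given_alt):
    """P(Null | Data) by Bayes' theorem for two mutually exclusive, exhaustive hypotheses."""
    joint_null = prior_null * p_data_given_null
    joint_alt = (1.0 - prior_null) * p_data_given_alt
    return joint_null / (joint_null + joint_alt)


# Hypothetical inputs: the data are improbable under the null (0.04)
# but far less probable still under the alternative (0.001).
result = posterior_null(prior_null=0.5, p_data_given_null=0.04, p_data_given_alt=0.001)
print(f"P(Null | Data) = {result:.3f}")
```

Here P(Data | Null) is below 0.05, yet the null remains by far the more probable hypothesis once the data are taken into account, because the alternative explains the data even less well.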
Elizabeth Anscombe, a student of Wittgenstein, notes that "Tests of the null hypothesis that there is no difference between certain treatments are often made in the analysis of agricultural or industrial experiments in which alternative methods or processes are compared. Such tests are [...] totally irrelevant. What are needed are estimates of magnitudes of effects, with standard errors."
Bayesian statisticians normally reject the idea of null hypothesis testing. Given a prior probability distribution for one or more parameters, sample evidence can be used to generate an updated posterior distribution. In this framework, but not in the null hypothesis testing framework, it is meaningful to make statements of the general form "the probability that the true value of the parameter is greater than 0 is p".
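A statement of that form can be computed directly in simple cases. The Python sketch below assumes a normal prior on the parameter and normally distributed observations with a known standard deviation, a textbook conjugate model chosen only for illustration; the function name `posterior_prob_positive` and the numbers are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def posterior_prob_positive(prior_mean, prior_sd, sample_mean, sample_sd, n):
    """P(theta > 0 | data) for a normal prior and normal data with known sd."""
    prior_precision = 1.0 / prior_sd**2
    data_precision = n / sample_sd**2
    # Precisions add; the posterior mean is the precision-weighted average.
    post_var = 1.0 / (prior_precision + data_precision)
    post_mean = post_var * (prior_precision * prior_mean + data_precision * sample_mean)
    posterior = NormalDist(mu=post_mean, sigma=sqrt(post_var))
    return 1.0 - posterior.cdf(0.0)


# Hypothetical data: 16 observations with sample mean 0.5 and known sd 1,
# combined with a prior centred on 0 with sd 1.
prob = posterior_prob_positive(prior_mean=0.0, prior_sd=1.0,
                               sample_mean=0.5, sample_sd=1.0, n=16)
print(f"P(theta > 0 | data) = {prob:.3f}")
```

The result is a probability statement about the parameter itself, which depends on the chosen prior as well as on the data.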
†(Read: the probability of observing the particular data given that the null hypothesis is true; see conditional probability.)
References
HyperStat Online - http://davidmlane.com/hyperstat/A29337.html
See also
- Statistical hypothesis testing
- P-value
- Publication bias
- Null Hypothesis - The Journal of Unlikely Science - a satirical science website
- References for arguments for and against null hypothesis significance testing: http://core.ecu.edu/psyc/wuenschk/StatHelp/NHST-SHIT.htm