Yates' correction for continuity
From Wikipedia, the free encyclopedia
In statistics, Yates' correction for continuity (or Yates' chi-square test) is used in certain situations when testing for independence in a contingency table. A chi-square test assumes that the discrete probability of the observed frequencies in the table can be approximated by the chi-squared distribution, which is continuous. This approximation introduces some error, particularly when the counts are small.
To reduce the error in this approximation, Frank Yates, an English statistician, suggested a correction for continuity that adjusts the formula for Pearson's chi-square test by subtracting 0.5 from the absolute difference between each observed value and its expected value in a 2 × 2 contingency table. This reduces the chi-square value obtained and thus increases its p-value, which prevents overestimation of statistical significance for small samples. The correction is chiefly used when at least one cell of the table has an expected frequency less than 5. Unfortunately, Yates' correction may tend to overcorrect, producing an overly conservative result that fails to reject the null hypothesis when it should.
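The corrected statistic is

$$\chi^2_{\text{Yates}} = \sum_{i=1}^{N} \frac{\left(|O_i - E_i| - 0.5\right)^2}{E_i}$$

where: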
- Oi = an observed frequency
- Ei = an expected (theoretical) frequency, asserted by the null hypothesis
- N = number of distinct events
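The correction described above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function names `expected_counts` and `chi_square` and the example counts are assumptions made for this sketch, and the expected frequencies are computed under the usual independence hypothesis (row total × column total / grand total).

```python
def expected_counts(table):
    """Expected cell counts under independence: row total * column total / N."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    return [[r * c / total for c in col_totals] for r in row_totals]


def chi_square(table, correction=0.5):
    """Sum of (|O - E| - correction)^2 / E over all cells.

    correction=0.5 gives the Yates-corrected statistic,
    correction=0.0 gives the uncorrected Pearson statistic.
    """
    expected = expected_counts(table)
    return sum(
        (abs(o - e) - correction) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )


# Hypothetical 2 x 2 table of observed counts (not taken from the article).
observed = [[3, 9], [8, 4]]

pearson = chi_square(observed, correction=0.0)
yates = chi_square(observed)  # the corrected value is the smaller of the two
print(f"Pearson: {pearson:.4f}, Yates: {yates:.4f}")
# prints Pearson: 4.1958, Yates: 2.6853
```

For this hypothetical table every expected count exceeds 5, so the example only illustrates the arithmetic, not a case where the correction is usually recommended.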
As a shortcut, for a 2 × 2 table with the following entries:
| | S | F | |
|---|---|---|---|
| A | a | b | N_A |
| B | c | d | N_B |
| | N_S | N_F | N |
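we can write, with N here denoting the grand total of the table,

$$\chi^2_{\text{Yates}} = \frac{N\left(|ad - bc| - N/2\right)^2}{N_S N_F N_A N_B}$$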
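The 2 × 2 shortcut lends itself to a direct computation from the four cell counts. The sketch below assumes the standard closed form N(|ad − bc| − N/2)² / (N_S N_F N_A N_B) and converts the statistic to a p-value using the fact that a chi-squared variable with one degree of freedom has survival function erfc(√(x/2)). The names `yates_shortcut` and `p_value_1df` and the example counts are hypothetical.

```python
import math


def yates_shortcut(a, b, c, d):
    """Yates-corrected chi-square for the 2 x 2 table [[a, b], [c, d]]."""
    n_a, n_b = a + b, c + d  # row totals
    n_s, n_f = a + c, b + d  # column totals
    n = n_a + n_b            # grand total
    return n * (abs(a * d - b * c) - n / 2) ** 2 / (n_s * n_f * n_a * n_b)


def p_value_1df(chi2):
    """Upper-tail probability of a chi-squared variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(chi2 / 2))


# Hypothetical counts (not taken from the article).
chi2 = yates_shortcut(3, 9, 8, 4)
p = p_value_1df(chi2)
print(f"chi-square = {chi2:.4f}, p = {p:.4f}")
```

The function applies the formula literally; it does not guard against tables where |ad − bc| is smaller than N/2, or where a marginal total is zero.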
Some sources say that this correction should be used when the expected frequency is less than 10[citation needed], while others say that Yates' correction should always be applied[citation needed]. With large sample sizes, however, the correction has little effect on the value of the test statistic, and hence on the p-value obtained.
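The shrinking effect of the correction at large sample sizes can be checked numerically. The sketch below scales one hypothetical 2 × 2 table by factors of 1, 10 and 100 and reports the ratio of the corrected to the uncorrected statistic; the helper names and the counts are assumptions made for illustration, and both statistics use the standard 2 × 2 closed forms.

```python
def pearson_2x2(a, b, c, d):
    """Uncorrected Pearson chi-square for the 2 x 2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + c) * (b + d) * (a + b) * (c + d))


def yates_2x2(a, b, c, d):
    """Yates-corrected chi-square for the same table."""
    n = a + b + c + d
    return n * (abs(a * d - b * c) - n / 2) ** 2 / (
        (a + c) * (b + d) * (a + b) * (c + d)
    )


base = (3, 9, 8, 4)  # hypothetical counts, same proportions at every scale
ratios = {}
for scale in (1, 10, 100):
    cells = [scale * x for x in base]
    ratios[scale] = yates_2x2(*cells) / pearson_2x2(*cells)
    print(f"scale {scale:>3}: corrected / uncorrected = {ratios[scale]:.4f}")
```

With the cell proportions held fixed, the ratio moves toward 1 as the counts grow, which is the sense in which the correction matters little for large samples.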
References
- Yates, F. (1934). "Contingency tables involving small numbers and the χ² test". Journal of the Royal Statistical Society (Supplement) 1: 217–235.