Anderson-Darling test

From Wikipedia, the free encyclopedia

The Anderson-Darling test, named after Theodore Wilbur Anderson, Jr. (1918–?) and Donald A. Darling (?–?), who invented it in 1952,[1] is one of the most powerful statistical tests for detecting most departures from normality. It may be used with small sample sizes (n ≤ 25). Very large samples may lead to rejection of the assumption of normality on the basis of only slight imperfections, but industrial data with sample sizes of 200 and more have passed the Anderson-Darling test.[citation needed]

The Anderson-Darling test assesses whether a sample comes from a specified distribution. The formula for the test statistic A^2, used to assess whether data \{Y_1<\cdots <Y_N\} (note that the data must be put in order) come from a distribution with cumulative distribution function (CDF) F, is

A^2 = -N - S

where

S=\sum_{k=1}^N \frac{2k-1}{N}\left[\ln F(Y_k) + \ln\left(1-F(Y_{N+1-k})\right)\right].

The test statistic can then be compared against the critical values of the theoretical distribution (dependent on which F is used) to determine the P-value.
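As a minimal sketch, the statistic defined above can be computed directly from the ordered sample and a fully specified CDF. The function name below is hypothetical, and the CDF is passed in as a callable that must return values strictly between 0 and 1:

```python
import math

def anderson_darling_A2(y, F):
    """Compute A^2 = -N - S for a sample y against a fully
    specified CDF F (a callable returning values in (0, 1))."""
    y = sorted(y)  # the formula requires ordered data Y_1 < ... < Y_N
    n = len(y)
    s = 0.0
    for k in range(1, n + 1):
        # S = sum_{k=1}^{N} (2k-1)/N [ln F(Y_k) + ln(1 - F(Y_{N+1-k}))]
        s += (2 * k - 1) / n * (math.log(F(y[k - 1]))
                                + math.log(1.0 - F(y[n - k])))
    return -n - s
```

For example, testing against the uniform distribution on [0, 1] amounts to passing the identity, `F = lambda x: x`.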

The Anderson-Darling test for normality is a distance or empirical distribution function (EDF) test. It is based on the idea that, given a hypothesized underlying distribution, the data can be transformed to a uniform distribution. The transformed sample data can then be tested for uniformity with a distance test (Shapiro 1980).

In comparisons of power, Stephens (1974) found A2 to be one of the best EDF statistics for detecting most departures from normality. The only statistic that came close was the Cramér–von Mises W2 statistic.


Procedure

(If testing for normal distribution of the variable X)

1) The data of the variable X to be tested are sorted from low to high.

2) The mean, \bar{X}, and standard deviation, s, are calculated from the sample of X.

3) The values of X are standardized using:

Y_i=\frac{X_i-\bar{X}}{s}

4) Pi is calculated as the value of the standard normal CDF at Yi, i.e. Pi = Φ(Yi).

5) A2 is calculated using:

A^2=-\frac{\sum_{i=1}^n (2i-1)\left[\ln(P_i)+\ln\left(1-P_{n+1-i}\right)\right]}{n}-n.

6) A2 * , an approximate adjustment for sample size, is calculated using:

A^{2*}=A^2\left(1+\frac{0.75}{n}+\frac{2.25}{n^2}\right)

7) If A2 * exceeds 0.752, the hypothesis of normality is rejected at the 5% significance level.
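The seven steps above can be sketched as follows. The function name is hypothetical; `math.erf` is used to evaluate the standard normal CDF, and the sample standard deviation uses the n − 1 divisor:

```python
import math

def ad_normality_test(x):
    """Anderson-Darling test for normality, following steps 1-7.
    Returns (A2_star, reject_at_5_percent)."""
    n = len(x)
    x = sorted(x)                                   # step 1: sort low to high
    mean = sum(x) / n                               # step 2: sample mean
    s = math.sqrt(sum((v - mean) ** 2 for v in x) / (n - 1))  # sample std dev
    y = [(v - mean) / s for v in x]                 # step 3: standardize
    # step 4: P_i = Phi(Y_i), the standard normal CDF
    p = [0.5 * (1.0 + math.erf(v / math.sqrt(2.0))) for v in y]
    # step 5: A^2 from the formula above
    total = sum((2 * i - 1) * (math.log(p[i - 1]) + math.log(1.0 - p[n - i]))
                for i in range(1, n + 1))
    a2 = -total / n - n
    # step 6: approximate small-sample adjustment
    a2_star = a2 * (1.0 + 0.75 / n + 2.25 / n ** 2)
    # step 7: compare against the 5% critical value 0.752
    return a2_star, a2_star > 0.752
```

A sample that closely follows normal quantiles should yield a small A2 * and not be rejected, while a strongly skewed sample should be rejected.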

Note:

1. If s = 0, or if any Pi equals 0 or 1, then A2 cannot be calculated and is undefined.

2. Above, it was assumed that the variable Xi was being tested for normality. Any other theoretical distribution can be assumed by using its CDF. Each theoretical distribution has its own critical values; some examples are the lognormal, exponential, Weibull, extreme value type I, and logistic distributions.
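As an illustration of this note, a fully specified exponential distribution with rate lam can be tested by plugging its CDF, F(x) = 1 − e^(−λx), into the general A2 formula. The function name is hypothetical, and the critical values for this case (not listed here) differ from the 0.752 used for the normal distribution:

```python
import math

def a2_exponential(x, lam):
    """A^2 against a fully specified exponential distribution
    with rate lam, using the general formula A^2 = -N - S."""
    x = sorted(x)
    n = len(x)
    F = lambda v: 1.0 - math.exp(-lam * v)  # exponential CDF
    s = sum((2 * k - 1) / n * (math.log(F(x[k - 1]))
                               + math.log(1.0 - F(x[n - k])))
            for k in range(1, n + 1))
    return -n - s
```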



References

  1. Anderson, T. W.; Darling, D. A. (1952). "Asymptotic theory of certain 'goodness of fit' criteria based on stochastic processes". Annals of Mathematical Statistics 23: 193–212.
  • Stephens, M. A. (1974). "EDF Statistics for Goodness of Fit and Some Comparisons". Journal of the American Statistical Association 69: 730–737.