Anderson-Darling test

From Wikipedia, the free encyclopedia

The Anderson-Darling test is one of the most powerful statistical tests for detecting most departures from normality. It may be used with small sample sizes $\left(n \le 25\right)$. With very large sample sizes, the test may reject the assumption of normality because of only slight imperfections, although industrial data sets with 200 or more observations have easily passed the Anderson-Darling test.

The Anderson-Darling test assesses whether a sample of data comes from a specified distribution. The formula for the test statistic $A$ to assess whether data $\{Y_1<\dots <Y_N\}$ (note that the data must be put in order) come from a distribution with cumulative distribution function $F$ is

$A^2 = -N - S$

where

$S=\sum_{k=1}^N \frac{2k-1}{N}\left[\ln F(Y_k) + \ln\left(1-F(Y_{N+1-k})\right)\right]$

The test statistic can then be compared against the critical values of its theoretical distribution (which depends on which $F$ is used) to determine the P-value.
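The formula above can be sketched in a few lines of Python. This is a minimal illustration, assuming the hypothesized distribution is fully specified (no parameters estimated from the data); the function name `anderson_darling`, the sample values, and the choice of a standard normal $F$ are hypothetical and chosen only for the example.

```python
import math
from statistics import NormalDist

def anderson_darling(sample, cdf):
    """Return A^2 = -N - S for `sample` against a fully specified CDF `cdf`."""
    y = sorted(sample)            # the formula requires ordered data
    n = len(y)
    s = 0.0
    for k in range(1, n + 1):
        f_low = cdf(y[k - 1])     # F(Y_k)
        f_high = cdf(y[n - k])    # F(Y_{N+1-k})
        s += (2 * k - 1) / n * (math.log(f_low) + math.log(1.0 - f_high))
    return -n - s

# Illustrative data only; F is assumed to be the standard normal CDF.
data = [-1.2, -0.5, -0.1, 0.3, 0.4, 0.9, 1.6]
a2 = anderson_darling(data, NormalDist(0.0, 1.0).cdf)
```

The sketch returns only the statistic. Turning it into a P-value still requires critical values appropriate to the chosen $F$, which are not reproduced here.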

The Anderson-Darling Test for Normality is a distance or EDF (empirical distribution function) test. It is based upon the concept that, given a hypothesized underlying distribution, the data can be transformed to a uniform distribution. The transformed sample data can then be tested for uniformity with a distance test (Shapiro 1980).
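The transformation idea can be illustrated with a short Python sketch. It assumes a normal hypothesized distribution with made-up parameters, and it measures the distance from uniformity with the largest vertical gap between the EDF and the uniform CDF (the Kolmogorov-Smirnov distance), which is a simpler distance than the Anderson-Darling statistic and is used here only to show the principle. The helper names `to_uniform` and `max_edf_gap` are hypothetical.

```python
import random
from statistics import NormalDist

def to_uniform(sample, cdf):
    """Map each observation through the hypothesized CDF and sort the result."""
    return sorted(cdf(x) for x in sample)

def max_edf_gap(u):
    """Largest vertical gap between the EDF of sorted `u` and the Uniform(0, 1) CDF."""
    n = len(u)
    return max(max((k + 1) / n - u[k], u[k] - k / n) for k in range(n))

random.seed(0)  # fixed seed so the sketch is repeatable
hypothesized = NormalDist(mu=10.0, sigma=2.0)  # illustrative parameters only
sample = [random.gauss(10.0, 2.0) for _ in range(1000)]

u = to_uniform(sample, hypothesized.cdf)  # approximately uniform if the hypothesis holds
gap = max_edf_gap(u)                      # small when the hypothesized distribution fits
```

When the data really come from the hypothesized distribution, the transformed values spread evenly over the interval from 0 to 1 and the gap shrinks as the sample grows. A poorly fitting hypothesis leaves the transformed values bunched up and the gap large.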

In comparisons of power, Stephens (1974) found $A^2$ to be one of the best EDF statistics for detecting most departures from normality. The only statistic close was the $W^2$ (Shapiro and Wilk) statistic.

Procedure

1) The data is sorted from low to high.

2) The mean, $\bar{X}$, and standard deviation, $s$, are calculated.