Normal probability plot

From Wikipedia, the free encyclopedia

The normal probability plot is a graphical technique for assessing whether or not a data set is approximately normally distributed. The data are plotted against a theoretical normal distribution in such a way that the points should form an approximate straight line. Departures from this straight line indicate departures from normality. The normal probability plot is a special case of the probability plot.

Definition

The normal probability plot is formed by:

Vertical axis: Ordered response values
Horizontal axis: Normal order statistic medians

That is, the ordered observations are plotted against the corresponding normal order statistic medians, which are defined as:

$N(i) = G(U(i))$

where U(i) are the uniform order statistic medians (defined below) and G is the quantile function of the normal distribution. The quantile function is the inverse of the cumulative distribution function, which gives the probability that X is less than or equal to a given value: for a probability p, G(p) is the value x such that P(X ≤ x) = p.
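A minimal illustration of G, assuming Python's standard-library `statistics.NormalDist` as the implementation (the article does not prescribe one): `inv_cdf` plays the role of the quantile function, and applying `cdf` to its result recovers the original probability. The helper name `G` simply mirrors the article's notation.

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, standard deviation 1.
std_normal = NormalDist(mu=0.0, sigma=1.0)


def G(p: float) -> float:
    """Normal quantile function: the x with P(X <= x) = p, for 0 < p < 1."""
    return std_normal.inv_cdf(p)


x = G(0.975)
print(round(x, 2))                  # 1.96
print(round(std_normal.cdf(x), 3))  # 0.975: the CDF undoes the quantile function
```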

For a sample of n observations, the uniform order statistic medians are defined as:

$U(i) = \begin{cases} 1 - 0.5^{1/n} & i = 1\\ \frac{i - 0.3175}{n + 0.365} & i = 2, 3, \ldots, n-1\\ 0.5^{1/n} & i = n\end{cases}$
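The two definitions combine into a short computation of the plotted points. This is a sketch in Python; the function names are hypothetical, and it assumes `statistics.NormalDist().inv_cdf` as the normal quantile function G. Each point pairs N(i) (horizontal axis) with the i-th smallest observation (vertical axis).

```python
from __future__ import annotations

from statistics import NormalDist


def uniform_order_statistic_medians(n: int) -> list[float]:
    """U(i) for i = 1, ..., n, using the piecewise formula above."""
    if n < 2:
        raise ValueError("need at least two observations")
    u_n = 0.5 ** (1.0 / n)
    # Middle cases: (i - 0.3175) / (n + 0.365) for i = 2, ..., n - 1.
    medians = [(i - 0.3175) / (n + 0.365) for i in range(1, n + 1)]
    medians[0] = 1.0 - u_n  # case i = 1
    medians[-1] = u_n       # case i = n
    return medians


def normal_order_statistic_medians(n: int) -> list[float]:
    """N(i) = G(U(i)), where G is the standard normal quantile function."""
    G = NormalDist().inv_cdf
    return [G(u) for u in uniform_order_statistic_medians(n)]


def normal_probability_plot_points(data: list[float]) -> list[tuple[float, float]]:
    """(horizontal, vertical) pairs: (N(i), i-th smallest observation)."""
    ordered = sorted(data)
    return list(zip(normal_order_statistic_medians(len(ordered)), ordered))


for x, y in normal_probability_plot_points([4.1, 2.7, 3.3, 5.0, 3.9]):
    print(f"{x:+.3f}  {y}")
```

Because (i - 0.3175) + (n + 1 - i - 0.3175) = n + 0.365, the medians satisfy U(i) + U(n + 1 - i) = 1, so the N(i) are symmetric about zero, and for odd n the middle point sits at N = 0.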

In addition, a straight line can be fitted to the points and added as a reference line. The further the points deviate from this line, the stronger the indication of departures from normality.
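The article does not specify how the reference line is fitted. The sketch below assumes ordinary least squares of the ordered data on N(i) and returns the residuals as one simple measure of how far the points stray from the line. The names `reference_line` and `normal_order_statistic_medians` are hypothetical.

```python
from __future__ import annotations

from statistics import NormalDist, fmean


def normal_order_statistic_medians(n: int) -> list[float]:
    """N(i) = G(U(i)) for i = 1, ..., n (same formulas as in the definition)."""
    u_n = 0.5 ** (1.0 / n)
    u = [(i - 0.3175) / (n + 0.365) for i in range(1, n + 1)]
    u[0], u[-1] = 1.0 - u_n, u_n
    return [NormalDist().inv_cdf(p) for p in u]


def reference_line(data: list[float]) -> tuple[float, float, list[float]]:
    """Least-squares fit of the ordered data on N(i).

    Returns (intercept, slope, residuals), where the residuals are the
    vertical distances of the plotted points from the fitted line.
    """
    if len(data) < 2:
        raise ValueError("need at least two observations")
    y = sorted(data)
    x = normal_order_statistic_medians(len(y))
    mx, my = fmean(x), fmean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    residuals = [b - (intercept + slope * a) for a, b in zip(x, y)]
    return intercept, slope, residuals
```

For data lying exactly on a line in N(i), every residual is zero; a strongly skewed sample such as `[0.1, 0.2, 0.5, 1.5, 8.0]` leaves its largest observation well above the line.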

Other distributions

Main article: probability plot

Probability plots for distributions other than the normal are computed in exactly the same way: the normal quantile function G is replaced by the quantile function of the desired distribution. A probability plot can therefore be generated for any distribution whose quantile function is available.
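A sketch of that substitution, assuming the quantile function is passed in as an ordinary Python callable. The standard exponential distribution, with quantile function -ln(1 - p), is used here only as an example; the helper names are hypothetical. Passing `statistics.NormalDist().inv_cdf` instead would reproduce the normal probability plot.

```python
from __future__ import annotations

import math
from typing import Callable


def uniform_order_statistic_medians(n: int) -> list[float]:
    """U(i) for i = 1, ..., n, as in the definition of the normal plot."""
    u_n = 0.5 ** (1.0 / n)
    u = [(i - 0.3175) / (n + 0.365) for i in range(1, n + 1)]
    u[0], u[-1] = 1.0 - u_n, u_n
    return u


def probability_plot_points(
    data: list[float], quantile: Callable[[float], float]
) -> list[tuple[float, float]]:
    """(quantile(U(i)), i-th smallest observation) for any quantile function."""
    ordered = sorted(data)
    u = uniform_order_statistic_medians(len(ordered))
    return [(quantile(p), y) for p, y in zip(u, ordered)]


def exponential_quantile(p: float) -> float:
    """Quantile function of the standard exponential distribution: -ln(1 - p)."""
    return -math.log1p(-p)


points = probability_plot_points([0.4, 2.9, 1.1, 0.1, 0.7], exponential_quantile)
```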

One advantage of this method of computing probability plots is that the intercept and slope of the fitted line are estimates of the location and scale parameters of the distribution. This matters less for the normal distribution, whose location and scale are usually estimated by the sample mean and standard deviation, but it can be useful for many other distributions.
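To illustrate the location and scale reading, the sketch below fits a least-squares line on an exponential probability plot of simulated data. It assumes a location-scale model, data = location + scale × Z with Z standard exponential, and ordinary least squares for the fit; the simulation parameters (location 5, scale 2, seed 12345) and helper names are hypothetical choices for the example. The printed estimates should be close to, but not exactly, 5 and 2.

```python
from __future__ import annotations

import math
import random
from statistics import fmean
from typing import Callable


def uniform_order_statistic_medians(n: int) -> list[float]:
    u_n = 0.5 ** (1.0 / n)
    u = [(i - 0.3175) / (n + 0.365) for i in range(1, n + 1)]
    u[0], u[-1] = 1.0 - u_n, u_n
    return u


def location_scale_estimates(
    data: list[float], quantile: Callable[[float], float]
) -> tuple[float, float]:
    """Intercept and slope of the least-squares line through the plot points.

    If data ~ location + scale * Z, where Z has the given standardized
    quantile function, these approximate (location, scale).
    """
    y = sorted(data)
    x = [quantile(p) for p in uniform_order_statistic_medians(len(y))]
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    return my - slope * mx, slope


def exponential_quantile(p: float) -> float:
    """Standard exponential quantile function: -ln(1 - p)."""
    return -math.log1p(-p)


# Simulated exponential sample shifted to start at 5 with scale 2
# (random.expovariate takes the rate, i.e. 1 / scale).
rng = random.Random(12345)
sample = [5.0 + rng.expovariate(0.5) for _ in range(1000)]
loc_hat, scale_hat = location_scale_estimates(sample, exponential_quantile)
print(f"location ~ {loc_hat:.2f}, scale ~ {scale_hat:.2f}")
```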

The correlation coefficient of the points on the normal probability plot can be compared to a table of critical values to provide a formal test of the hypothesis that the data come from a normal distribution.
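The statistic itself is the ordinary Pearson correlation between N(i) and the ordered data; this is often called the probability plot correlation coefficient. The sketch below only computes r. The critical values depend on n and the chosen significance level and must be taken from a published table, which is not reproduced here. The function names are hypothetical.

```python
from __future__ import annotations

import math
from statistics import NormalDist, fmean


def normal_order_statistic_medians(n: int) -> list[float]:
    u_n = 0.5 ** (1.0 / n)
    u = [(i - 0.3175) / (n + 0.365) for i in range(1, n + 1)]
    u[0], u[-1] = 1.0 - u_n, u_n
    return [NormalDist().inv_cdf(p) for p in u]


def plot_correlation(data: list[float]) -> float:
    """Pearson correlation between N(i) and the ordered observations."""
    if len(data) < 3:
        raise ValueError("need at least three observations")
    y = sorted(data)
    x = normal_order_statistic_medians(len(y))
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if syy == 0.0:
        raise ValueError("correlation is undefined for constant data")
    return sxy / math.sqrt(sxx * syy)


r = plot_correlation([0.1, 0.2, 0.5, 1.5, 8.0])
# Normality would be rejected at the chosen level if r falls below the
# tabulated critical value for this sample size.
```

Values of r close to 1 are consistent with normality; the skewed sample above gives a noticeably smaller value.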