Probability plot
From Wikipedia, the free encyclopedia
The probability plot is a graphical technique for assessing whether or not a data set follows a given distribution such as the normal or Weibull, and for visually estimating the location and scale parameters of the chosen distribution. The data are plotted against a theoretical distribution in such a way that the points should form approximately a straight line. Departures from this straight line indicate departures from the specified distribution.
The probability plot correlation coefficient is the correlation coefficient associated with the linear fit to the data in the probability plot; it measures the goodness of fit. The intercept and slope of the fitted line give estimates of the location and scale parameters of the distribution. Probability plots can be generated for several competing distributions to see which provides the best fit: the distribution whose plot yields the highest correlation coefficient is preferred, since its probability plot is the straightest.
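The comparison of competing distributions can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the helper names (`uniform_order_statistic_medians`, `correlation`, `ppcc`), the choice of candidate distributions, and the synthetic sample are all assumptions made for the example. It also assumes Filliben's approximation for the uniform order statistic medians.

```python
import math
from statistics import NormalDist


def uniform_order_statistic_medians(n):
    """Approximate medians of the uniform order statistics (Filliben's formula, assumed here)."""
    last = 0.5 ** (1.0 / n)
    medians = [(i - 0.3175) / (n + 0.365) for i in range(1, n + 1)]
    medians[0] = 1.0 - last   # i = 1
    medians[-1] = last        # i = n
    return medians


def correlation(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def ppcc(data, quantile):
    """Correlation between the ordered data and the order statistic medians."""
    y = sorted(data)
    x = [quantile(p) for p in uniform_order_statistic_medians(len(y))]
    return correlation(x, y)


# Candidate distributions, each given by its standard quantile function.
candidates = {
    "normal": NormalDist().inv_cdf,
    "exponential": lambda p: -math.log(1.0 - p),
    "logistic": lambda p: math.log(p / (1.0 - p)),
}

# Synthetic sample (illustrative only): exact exponential quantiles,
# shifted by 1 and scaled by 3.
medians = uniform_order_statistic_medians(30)
sample = [1.0 - 3.0 * math.log(1.0 - p) for p in medians]

scores = {name: ppcc(sample, q) for name, q in candidates.items()}
best = max(scores, key=scores.get)
print(best, scores)
```

Because the sample was constructed from exponential quantiles, the exponential candidate produces an essentially straight plot and the highest coefficient. With real data the coefficients will usually be closer together, so the ranking should be read together with the plots themselves.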
For distributions with shape parameters (not counting location and scale parameters), the shape parameters must be known in order to generate the probability plot. For distributions with a single shape parameter, the probability plot correlation coefficient plot (PPCC plot) provides an excellent method for estimating the shape parameter.
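As an illustration of the PPCC plot idea, the sketch below scans a grid of candidate shape values for a Weibull distribution and keeps the one that maximizes the probability plot correlation coefficient. The grid, the synthetic sample, and all function names are assumptions made for this example; Filliben's approximation is assumed for the uniform order statistic medians.

```python
import math


def uniform_order_statistic_medians(n):
    """Approximate medians of the uniform order statistics (Filliben's formula, assumed here)."""
    last = 0.5 ** (1.0 / n)
    medians = [(i - 0.3175) / (n + 0.365) for i in range(1, n + 1)]
    medians[0] = 1.0 - last
    medians[-1] = last
    return medians


def correlation(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def weibull_quantile(p, shape):
    """Quantile function of the Weibull distribution with location 0 and scale 1."""
    return (-math.log(1.0 - p)) ** (1.0 / shape)


def ppcc_curve(data, shapes):
    """Return (shape, PPCC) pairs: the values one would draw in a PPCC plot."""
    y = sorted(data)
    u = uniform_order_statistic_medians(len(y))
    return [(c, correlation([weibull_quantile(p, c) for p in u], y)) for c in shapes]


# Synthetic sample (illustrative only): exact Weibull quantiles with shape 2, scale 5.
u = uniform_order_statistic_medians(40)
sample = [5.0 * weibull_quantile(p, 2.0) for p in u]

shapes = [0.5 + 0.25 * k for k in range(15)]   # 0.5, 0.75, ..., 4.0
curve = ppcc_curve(sample, shapes)
best_shape, best_r = max(curve, key=lambda pair: pair[1])
print(best_shape, best_r)
```

Plotting the pairs in `curve` (shape on the horizontal axis, correlation on the vertical axis) gives the PPCC plot; the location of its peak is the shape estimate. A finer grid or a numerical optimizer could replace the coarse scan used here.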
The special case of the normal probability plot is covered separately due to its importance in many statistical applications.
Definition
The probability plot is formed by:
- Vertical axis: Ordered response values
- Horizontal axis: Order statistic medians for the given distribution
The order statistic medians are defined as:
- N(i) = G(U(i))
where U(i) are the uniform order statistic medians (defined below) and G is the quantile function of the desired distribution. The quantile function is the inverse of the cumulative distribution function, which gives the probability that X is less than or equal to a given value. That is, given a probability, the quantile function returns the value at which the cumulative distribution function attains that probability.
The uniform order statistic medians, for a sample of size n, are defined as:
- U(i) = 1 - U(n) for i = 1
- U(i) = (i - 0.3175)/(n + 0.365) for i = 2, 3, ..., n-1
- U(i) = 0.5^(1/n) for i = n
In addition, a straight line can be fitted to the points and added as a reference line. The farther the points deviate from this line, the stronger the indication of a departure from the specified distribution.
This definition implies that a probability plot can be easily generated for any distribution for which the quantile function can be computed. One advantage of this method of computing probability plots is that the intercept and slope estimates of the fitted line are in fact estimates for the location and scale parameters of the distribution. Although this is not too important for the normal distribution (the location and scale are estimated by the mean and standard deviation, respectively), it can be useful for many other distributions.
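The construction described in this section can be sketched as follows for the normal distribution. The code is a minimal illustration under stated assumptions: the uniform order statistic medians are computed with Filliben's approximation, the reference line is an ordinary least-squares fit, and the function names and the synthetic sample are hypothetical choices made for the example.

```python
from statistics import NormalDist


def uniform_order_statistic_medians(n):
    """Approximate medians of the uniform order statistics (Filliben's formula, assumed here)."""
    last = 0.5 ** (1.0 / n)
    medians = [(i - 0.3175) / (n + 0.365) for i in range(1, n + 1)]
    medians[0] = 1.0 - last   # U(1) = 1 - U(n)
    medians[-1] = last        # U(n) = 0.5^(1/n)
    return medians


def least_squares_line(x, y):
    """Return (intercept, slope) of the least-squares line y = intercept + slope * x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def normal_probability_plot(data):
    """Return plot coordinates plus the intercept and slope of the reference line."""
    y = sorted(data)                                # vertical axis: ordered responses
    u = uniform_order_statistic_medians(len(y))
    x = [NormalDist().inv_cdf(p) for p in u]        # horizontal axis: N(i) = G(U(i))
    intercept, slope = least_squares_line(x, y)
    return x, y, intercept, slope


# Synthetic sample (illustrative only): exact normal quantiles with
# location 10 and scale 2, so the fitted line should recover both.
u = uniform_order_statistic_medians(25)
sample = [10.0 + 2.0 * NormalDist().inv_cdf(p) for p in u]

x, y, intercept, slope = normal_probability_plot(sample)
print(intercept, slope)
```

Here the intercept estimates the location parameter and the slope estimates the scale parameter. Replacing `NormalDist().inv_cdf` with the quantile function of another distribution gives the probability plot for that distribution.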
Relation with Q-Q plots
Q-Q plots are similar to probability plots. The difference is that a Q-Q plot uses quantiles of the theoretical distribution on the x-axis, whereas a probability plot uses the expected value of the kth order statistic. The two differ substantially only when n is small.
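The effect of sample size can be illustrated by comparing two sets of plotting positions on the probability scale. Conventions vary between sources, so the choices here are assumptions: the Q-Q positions are taken to be (k - 0.5)/n, and the probability plot positions are the uniform order statistic medians from the Definition section, computed with Filliben's approximation. The function names are hypothetical.

```python
def uniform_order_statistic_medians(n):
    """Approximate medians of the uniform order statistics (Filliben's formula, assumed here)."""
    last = 0.5 ** (1.0 / n)
    medians = [(i - 0.3175) / (n + 0.365) for i in range(1, n + 1)]
    medians[0] = 1.0 - last
    medians[-1] = last
    return medians


def qq_positions(n):
    """One common Q-Q convention (assumed here): probabilities (k - 0.5) / n."""
    return [(k - 0.5) / n for k in range(1, n + 1)]


def max_position_gap(n):
    """Largest absolute difference between the two sets of plotting positions."""
    pairs = zip(uniform_order_statistic_medians(n), qq_positions(n))
    return max(abs(a - b) for a, b in pairs)


for n in (5, 50, 500):
    print(n, max_position_gap(n))
```

Under these assumptions the largest gap shrinks as n grows, which is consistent with the two kinds of plot being nearly indistinguishable for large samples.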
References
- Chambers, John; Cleveland, William; Kleiner, Beat; Tukey, Paul (1983). Graphical Methods for Data Analysis. Wadsworth.
This article incorporates text from a public domain publication of the National Institute of Standards and Technology, a U.S. government agency.