Kolmogorov–Smirnov test

Under the null hypothesis that the sample comes from the hypothesized distribution F(x),

$\sqrt{n}D_n\xrightarrow{n\to\infty}\sup_t |B(F(t))|$

in distribution, where B(t) is the Brownian bridge.

If F is continuous then under the null hypothesis $\sqrt{n}D_n$ converges to the Kolmogorov distribution, which does not depend on F. This result may also be known as the Kolmogorov theorem; see Kolmogorov's theorem for disambiguation.
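The independence from F can be made explicit with a short worked step. When F is continuous, F(t) takes every value in (0, 1) as t ranges over the real line, so substituting u = F(t) in the limit gives

$\sup_t |B(F(t))| = \sup_{0\le u\le 1} |B(u)| = K,$

and the distribution function of K is given by the standard series

$\operatorname{Pr}(K\leq x)=1-2\sum_{k=1}^\infty (-1)^{k-1}e^{-2k^2x^2}.$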

The goodness-of-fit test or the Kolmogorov–Smirnov test is constructed by using the critical values of the Kolmogorov distribution. The null hypothesis is rejected at level $\alpha$ if

$\sqrt{n}D_n>K_\alpha,\,$

where K_α is found from

$\operatorname{Pr}(K\leq K_\alpha)=1-\alpha.\,$

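The asymptotic power of this test is 1: for any fixed alternative distribution, the probability of rejecting the null hypothesis tends to 1 as n → ∞.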
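The rejection rule above can be sketched in a few lines of Python. This is a minimal illustration rather than a reference implementation: the function names (`kolmogorov_cdf`, `kolmogorov_critical_value`, `ks_statistic`, `ks_test`) are hypothetical, the Kolmogorov distribution is evaluated with its alternating series truncated at 100 terms, and K_α is found by bisection. The asymptotic critical value is only an approximation for small n.

```python
import math
import random


def kolmogorov_cdf(x, terms=100):
    """Pr(K <= x) via the series 1 - 2 * sum((-1)^(k-1) * exp(-2 k^2 x^2))."""
    if x <= 0:
        return 0.0
    s = sum((-1) ** (k - 1) * math.exp(-2.0 * k * k * x * x)
            for k in range(1, terms + 1))
    return min(1.0, max(0.0, 1.0 - 2.0 * s))


def kolmogorov_critical_value(alpha, lo=0.2, hi=5.0, iterations=60):
    """Solve Pr(K <= K_alpha) = 1 - alpha by bisection on [lo, hi]."""
    target = 1.0 - alpha
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if kolmogorov_cdf(mid) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def ks_statistic(sample, cdf):
    """D_n = sup_x |F_n(x) - F(x)|, checked on both sides of each jump of F_n."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        f = cdf(x)
        d = max(d, (i + 1) / n - f, f - i / n)
    return d


def ks_test(sample, cdf, alpha=0.05):
    """Return (D_n, reject) using the rule sqrt(n) * D_n > K_alpha."""
    d = ks_statistic(sample, cdf)
    reject = math.sqrt(len(sample)) * d > kolmogorov_critical_value(alpha)
    return d, reject


if __name__ == "__main__":
    rng = random.Random(1)
    data = [rng.random() for _ in range(200)]
    # Hypothesized F: the uniform distribution on [0, 1].
    print(ks_test(data, lambda x: min(1.0, max(0.0, x))))
```

Passing the hypothesized distribution as a plain callable keeps the sketch independent of any particular statistics library.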

Test with estimated parameters

If either the form or the parameters of F(x) are determined from the data X_i, the critical values determined in this way are invalid. In such cases, Monte Carlo or other methods may be required, but tables have been prepared for some cases. Details for the required modifications to the test statistic and for the critical values for the normal distribution and the exponential distribution have been published by Pearson & Hartley (1972, Table 54). Details for these distributions, with the addition of the Gumbel distribution, are also given by Shorack & Wellner (1986, p. 239). The Lilliefors test represents a special case of this for the normal distribution.
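As an illustration of the Monte Carlo route, the sketch below simulates critical values for the case where a normal distribution is fitted by its sample mean and sample standard deviation (the setting of the Lilliefors test). The function names, the number of replications and the seed are assumptions made for this example. It relies on the fact that, for a location-scale family fitted in this way, the null distribution of the statistic does not depend on the true parameters, so standard normal samples suffice.

```python
import math
import random
import statistics


def normal_cdf(x, mu, sigma):
    """Distribution function of N(mu, sigma^2)."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def ks_statistic_fitted_normal(sample):
    """D_n against a normal distribution whose parameters are estimated from the sample."""
    mu = statistics.mean(sample)
    sigma = statistics.stdev(sample)
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        f = normal_cdf(x, mu, sigma)
        d = max(d, (i + 1) / n - f, f - i / n)
    return d


def monte_carlo_critical_value(n, alpha=0.05, replications=2000, seed=0):
    """Empirical (1 - alpha) quantile of D_n under the null, by simulation."""
    rng = random.Random(seed)
    simulated = sorted(
        ks_statistic_fitted_normal([rng.gauss(0.0, 1.0) for _ in range(n)])
        for _ in range(replications)
    )
    index = min(replications - 1, math.ceil((1.0 - alpha) * replications) - 1)
    return simulated[index]


if __name__ == "__main__":
    # Critical value for D_n itself (not sqrt(n) * D_n) at n = 20.
    print(monte_carlo_critical_value(20))
```

The simulated critical value is noticeably smaller than the one obtained from the Kolmogorov distribution for the same n, which is why using the unmodified tables with estimated parameters makes the test too conservative.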

Two-sample Kolmogorov–Smirnov test

The Kolmogorov–Smirnov test may also be used to test whether two underlying one-dimensional probability distributions differ. In this case, the Kolmogorov–Smirnov statistic is

$D_{n,n'}=\sup_x |F_{1,n}(x)-F_{2,n'}(x)|,$

where $F_{1,n}$ and $F_{2,n'}$ are the empirical distribution functions of the first and the second sample respectively.

The null hypothesis is rejected at level $\alpha$ if

$\sqrt{\frac{n n'}{n + n'}}D_{n,n'}>K_\alpha.$

Note that the two-sample test checks whether the two data samples come from the same distribution. This does not specify what that common distribution is (e.g. normal or not normal).
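The two-sample statistic and its rejection rule can be sketched as follows. The function names are hypothetical, and the default `k_alpha=1.358` is the approximate 95% point of the Kolmogorov distribution, so the default corresponds to α = 0.05. Because the critical value is asymptotic, the decision is only approximate for small samples.

```python
import math


def two_sample_ks_statistic(sample1, sample2):
    """D_{n,n'} = sup_x |F_{1,n}(x) - F_{2,n'}(x)| via a merge of the sorted samples."""
    xs, ys = sorted(sample1), sorted(sample2)
    n, m = len(xs), len(ys)
    i = j = 0
    d = 0.0
    while i < n and j < m:
        x = min(xs[i], ys[j])
        # Advance both empirical distribution functions past x (handles ties).
        while i < n and xs[i] <= x:
            i += 1
        while j < m and ys[j] <= x:
            j += 1
        d = max(d, abs(i / n - j / m))
    # Once one sample is exhausted the difference can only shrink.
    return d


def two_sample_ks_reject(sample1, sample2, k_alpha=1.358):
    """Apply the rule sqrt(n n' / (n + n')) * D_{n,n'} > K_alpha."""
    n, m = len(sample1), len(sample2)
    d = two_sample_ks_statistic(sample1, sample2)
    return math.sqrt(n * m / (n + m)) * d > k_alpha


if __name__ == "__main__":
    a = [0.1, 0.5, 0.9, 1.3, 1.7]
    b = [0.2, 0.6, 2.5, 3.0, 3.5]
    print(two_sample_ks_statistic(a, b), two_sample_ks_reject(a, b))
```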

Setting confidence limits for the shape of a distribution function

While the Kolmogorov–Smirnov test is usually used to test whether a given F(x) is the underlying probability distribution of F_n(x), the procedure may be inverted to give confidence limits on F(x) itself. If one chooses a critical value of the test statistic D_α such that P(D_n > D_α) = α, then a band of width ±D_α around F_n(x) will entirely contain F(x) with probability 1 − α.
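A minimal sketch of such a band is given below. The function name is hypothetical, the sample values are assumed to be distinct, and the critical value in the usage example is taken from the asymptotic approximation D_α ≈ K_α / √n with K_0.05 ≈ 1.358, which is an assumption of this example; exact small-sample values of D_α would come from tables. The band is clipped to [0, 1] because F(x) is a probability.

```python
import math


def ks_confidence_band(sample, d_alpha):
    """Return (x, lower, upper) at each observation: F_n(x) -/+ d_alpha, clipped to [0, 1]."""
    xs = sorted(sample)
    n = len(xs)
    band = []
    for i, x in enumerate(xs):
        fn = (i + 1) / n  # value of the empirical distribution function at x
        band.append((x, max(0.0, fn - d_alpha), min(1.0, fn + d_alpha)))
    return band


if __name__ == "__main__":
    data = [2.1, 3.4, 1.9, 5.0, 4.2, 2.8, 3.9, 4.7]
    d_alpha = 1.358 / math.sqrt(len(data))  # asymptotic 95% approximation
    for x, lower, upper in ks_confidence_band(data, d_alpha):
        print(x, round(lower, 3), round(upper, 3))
```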

The Kolmogorov–Smirnov statistic in more than one dimension

The Kolmogorov–Smirnov test statistic needs to be modified if a similar test is to be applied to multivariate data. This is not straightforward because the maximum difference between two joint cumulative distribution functions is not generally the same as the maximum difference of any of the complementary distribution functions. Thus the maximum difference will differ depending on which of $\Pr(x < X \land y < Y)$ or $\Pr(X < x \land Y > y)$ or any of the other two possible arrangements is used. One might require that the result of the test used should not depend on which choice is made.

One approach to generalizing the Kolmogorov–Smirnov statistic to higher dimensions which meets the above concern is to compare the cdfs of the two samples with all possible orderings, and take the largest of the set of resulting K-S statistics. In d dimensions, there are 2^d−1 such orderings. One such variation is due to Peacock (1983) and another to Fasano & Franceschini (1987): see Lopes et al. (2007) for a comparison and computational details. Critical values for the test statistic can be obtained by simulations, but depend on the dependence structure in the joint distribution.
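For two dimensions, the idea of taking the largest difference over all orderings can be sketched as below. This is an illustrative simplification, not a faithful reproduction of either published method: the observed points are used as quadrant origins, all four quadrants around each origin are compared, and the two one-directional maxima are averaged. The function names, the treatment of points lying exactly on a quadrant boundary, and the averaging step are assumptions of this sketch. No critical values are computed, since those would have to come from simulation.

```python
def quadrant_fractions(points, x0, y0):
    """Fraction of points in each of the four quadrants around (x0, y0).

    Points on a boundary are assigned to the "less than or equal" side.
    """
    counts = [0, 0, 0, 0]
    for x, y in points:
        counts[2 * int(x > x0) + int(y > y0)] += 1
    return [c / len(points) for c in counts]


def max_quadrant_difference(origins, sample1, sample2):
    """Largest absolute difference in quadrant fractions over the given origins."""
    d = 0.0
    for x0, y0 in origins:
        f1 = quadrant_fractions(sample1, x0, y0)
        f2 = quadrant_fractions(sample2, x0, y0)
        d = max(d, max(abs(a - b) for a, b in zip(f1, f2)))
    return d


def ks_statistic_2d(sample1, sample2):
    """Average of the maxima obtained with each sample's points as origins."""
    d1 = max_quadrant_difference(sample1, sample1, sample2)
    d2 = max_quadrant_difference(sample2, sample1, sample2)
    return 0.5 * (d1 + d2)


if __name__ == "__main__":
    a = [(0.1, 0.2), (0.4, 0.9), (0.7, 0.3), (0.8, 0.8)]
    b = [(0.2, 0.1), (0.5, 0.5), (0.9, 0.6), (0.3, 0.7)]
    print(ks_statistic_2d(a, b))
```

Restricting the origins to observed points keeps the cost at O(n²) quadrant counts, at the price of not examining every possible origin in the plane.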

Footnotes

^ Stephens, M. A. (1974). "EDF Statistics for Goodness of Fit and Some Comparisons". Journal of the American Statistical Association (American Statistical Association) 69 (347): 730–737. doi:10.2307/2286009. JSTOR 2286009.
^ Marsaglia, G., Tsang, W. W., Wang, J. (2003) "Evaluating Kolmogorov’s Distribution", Journal of Statistical Software, 8 (18), 1–4.
^ Kolmogorov, A. (1933) "Sulla determinazione empirica di una legge di distribuzione" G. Inst. Ital. Attuari, 4, 83
^ Smirnov, N.V. (1948) "Tables for estimating the goodness of fit of empirical distributions", Annals of Mathematical Statistics, 19, 279

References

Eadie, W.T.; D. Drijard, F.E. James, M. Roos and B. Sadoulet (1971). Statistical Methods in Experimental Physics. Amsterdam: North-Holland. pp. 269–271. ISBN 0444101179.
Stuart, Alan; Ord, Keith; Arnold, Steven [F.] (1999). Classical Inference and the Linear Model. Kendall's Advanced Theory of Statistics. 2A (Sixth ed.). London: Arnold. pp. 25.37–25.43. ISBN 0-340-66230-1. MR 1687411.
Corder, G.W., Foreman, D.I. (2009). Nonparametric Statistics for Non-Statisticians: A Step-by-Step Approach. Wiley. ISBN 9780470454619.
Pearson E.S., Hartley, H.O. (Editors) (1972) Biometrika Tables for Statisticians, Volume II. CUP. ISBN 0-521-06937-8.
Shorack, G.R., Wellner, J.A. (1986) Empirical Processes with Applications to Statistics, Wiley. ISBN 0-471-86725-X.
Stephens, M.A. (1979) Test of fit for the logistic distribution based on the empirical distribution function, Biometrika, 66(3), 591-5.
Peacock, J. A. (1983). "Two-dimensional goodness-of-fit testing in astronomy". Monthly Notices of the Royal Astronomical Society 202: 615–627.
Fasano, G., Franceschini, A. (1987) A multidimensional version of the Kolmogorov–Smirnov test. Monthly Notices of the Royal Astronomical Society (ISSN 0035-8711), vol. 225, 155–170.
Lopes, R.H.C., Reid, I., Hobson, P.R. (2007) "The two-dimensional Kolmogorov-Smirnov test". XI International Workshop on Advanced Computing and Analysis Techniques in Physics Research (April 23–27, 2007) Amsterdam, the Netherlands.

