Degrees of freedom (statistics)

From Wikipedia, the free encyclopedia


In statistics, the term degrees of freedom has two distinct senses. In the case of certain families of statistical distributions, it is the name given to one of their parameters. In the context of fitting statistical models, it denotes the number of independent pieces of information available to estimate another piece of information. Intuitively, we begin with a number of degrees of freedom within the data, deduct the degrees of freedom used by the fit, and are left with the degrees of freedom for error. The more 'flexible' our fitting procedure is, the more degrees of freedom we lose.

The precise nature of degrees of freedom is subject to some confusion, even amongst experts, though there are usually agreed-upon methods of defining and calculating them in specific cases.


Residuals

A common way to think of degrees of freedom is as the number of independent pieces of information available to estimate another piece of information. More concretely, the number of degrees of freedom is the number of independent observations in a sample of data that are available to estimate a parameter of the population from which that sample is drawn. For example, if we have two observations, then when calculating the mean we have two independent pieces of information; but when calculating the variance, we have only one, since the two deviations from the mean are equal in magnitude and opposite in sign, so knowing one determines the other.
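The two-observation case above can be checked numerically. This is an illustrative sketch (the data values are invented, not from the article): the two deviations from the mean cancel, so the sample variance rests on a single independent piece of information.

```python
# Two observations (arbitrary values for illustration).
x1, x2 = 3.0, 7.0

mean = (x1 + x2) / 2                  # uses two independent observations
d1, d2 = x1 - mean, x2 - mean         # deviations from the mean
assert d1 == -d2                      # knowing one deviation determines the other

variance = (d1**2 + d2**2) / (2 - 1)  # divide by n - 1 = 1 degree of freedom
print(variance)                       # 8.0
```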

In fitting statistical models to data, the vectors of residuals are constrained to lie in a space of smaller dimension than the number of components in the vector. That smaller dimension is the number of degrees of freedom for error.

Linear regression

Perhaps the simplest example is this. Suppose

X_1, \dots, X_n

are random variables each with expected value μ, and let

\overline{X}_n = \frac{X_1 + \cdots + X_n}{n}

be the "sample mean". Then the quantities

X_i - \overline{X}_n

are residuals that may be considered estimates of the errors X_i − μ. The sum of the residuals (unlike the sum of the errors) is necessarily 0. That means they are constrained to lie in a space of dimension n − 1. If one knows the values of any n − 1 of the residuals, one can thus find the last one. One says that "there are n − 1 degrees of freedom for error."
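The sum-to-zero constraint is easy to verify directly. A minimal sketch (with arbitrary sample data, not from the article): the n residuals satisfy one linear constraint, so the last residual is determined by the first n − 1.

```python
import numpy as np

# Arbitrary sample of n = 5 observations (illustrative data).
x = np.array([2.0, 4.0, 4.0, 6.0, 9.0])
residuals = x - x.mean()

# One linear constraint: the residuals sum to zero,
# leaving n - 1 = 4 degrees of freedom for error.
print(residuals.sum())  # 0.0

# The last residual is recoverable from the first n - 1.
reconstructed_last = -residuals[:-1].sum()
assert np.isclose(reconstructed_last, residuals[-1])
```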

An only slightly less simple example is that of least squares estimation of a and b in the model

Y_i = a + b x_i + \varepsilon_i \quad \text{for}\ i = 1, \dots, n

where the εi, and hence the Yi, are random. Let \widehat{a} and \widehat{b} be the least-squares estimates of a and b. Then the residuals

e_i = y_i - (\widehat{a} + \widehat{b} x_i)

are constrained to lie within the space defined by the two equations

e_1 + \cdots + e_n = 0,
x_1 e_1 + \cdots + x_n e_n = 0.

One says that there are n − 2 degrees of freedom for error.

The capital Y is used in specifying the model, and lower-case y in the definition of the residuals. That is because the former are hypothesized random variables and the latter are data.
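Both residual constraints can be verified numerically after a least-squares fit. A hedged sketch (the data are invented for illustration; `np.polyfit` is used as the fitting routine):

```python
import numpy as np

# Illustrative data roughly following y = 2x (values invented).
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

# Least-squares fit of y = a + b*x; polyfit returns [slope, intercept].
b, a = np.polyfit(x, y, 1)
residuals = y - (a + b * x)

# The two linear constraints from the text:
assert np.isclose(residuals.sum(), 0.0)        # e_1 + ... + e_n = 0
assert np.isclose((x * residuals).sum(), 0.0)  # x_1 e_1 + ... + x_n e_n = 0
# Two constraints leave n - 2 = 3 degrees of freedom for error.
```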

We can generalise this to multiple regression involving p parameters (e.g. an intercept and p − 1 predictors), in which case the cost in degrees of freedom of the fit is p.
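In the multiple-regression case the residuals are orthogonal to every column of the design matrix, giving p linear constraints and hence n − p residual degrees of freedom. A sketch under assumed synthetic data (the design matrix and responses are randomly generated for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 10, 3

# Design matrix: an intercept column plus p - 1 random predictors.
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
y = rng.normal(size=n)

# Least-squares fit and residuals.
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
residuals = y - X @ beta

# p linear constraints: residuals are orthogonal to each column of X.
assert np.allclose(X.T @ residuals, 0.0)
# Residual degrees of freedom: n - p = 7.
```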

Linear smoothers

We can generalise from linear regression to the case of linear smoothers, such as ridge regression and smoothing splines. In these cases, we have that

\hat{y} = Hy,

where \hat{y} is the vector of fitted values at each of the original covariate values from the fitted model, and y is the original vector of responses. We then define the fit's effective degrees of freedom as the trace of the 'hat' matrix H. This definition is consistent with the definition for linear regression: there, H = X(X^T X)^{-1} X^T is the orthogonal projection onto the column space of the design matrix X, and the trace of a projection equals its rank, namely the number of parameters p.
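This consistency can be checked numerically. A hedged sketch with randomly generated data: for ordinary least squares the trace of the hat matrix equals p, while the ridge-regression smoother (shown with an assumed penalty λ = 5) has a smaller trace, i.e. fewer effective degrees of freedom.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 20, 4
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])

# Ordinary least squares: H = X (X^T X)^{-1} X^T, a rank-p projection.
H = X @ np.linalg.inv(X.T @ X) @ X.T
print(round(np.trace(H), 6))  # ≈ 4.0, i.e. p

# Ridge regression: H_ridge = X (X^T X + lam*I)^{-1} X^T.
# Shrinkage reduces the effective degrees of freedom below p.
lam = 5.0
H_ridge = X @ np.linalg.inv(X.T @ X + lam * np.eye(p)) @ X.T
assert np.trace(H_ridge) < np.trace(H)
```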

Parameters in probability distributions

The probability distributions of residuals are often parametrized by these numbers of degrees of freedom. Thus one speaks of a chi-square distribution, a Student's t-distribution, or a Wishart distribution with a specified number of degrees of freedom, or of an F-distribution with specified numbers of degrees of freedom for the numerator and the denominator respectively.

In the familiar uses of these distributions, the number of degrees of freedom takes only integer values. The underlying mathematics in most cases allows for fractional degrees of freedom, which can arise in more sophisticated uses.
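As a concrete illustration of fractional degrees of freedom, assuming SciPy is available, its chi-square implementation accepts non-integer values of the df parameter (a fact about the library, not about this article's examples):

```python
# Hedged sketch: scipy.stats.chi2 (assumed available) accepts
# non-integer degrees of freedom.
from scipy.stats import chi2

frac = chi2(df=2.5)           # a chi-square distribution with 2.5 df
print(round(frac.mean(), 6))  # the mean of a chi-square equals its df: 2.5
```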
