Linear model

From Wikipedia, the free encyclopedia

In statistics, the linear model is the model given by

Y = X \beta + \varepsilon

where Y is an n×1 column vector of random variables, X is an n×p matrix of "known" (i.e., observable and non-random) quantities whose rows correspond to statistical units, β is a p×1 vector of (unobservable) parameters, and ε is an n×1 vector of "errors", which are uncorrelated random variables each with expected value 0 and variance σ². Often one takes the components of the vector of errors to be independent and normally distributed. Having observed the values of X and Y, the statistician must estimate β and σ². Typically the parameters β are estimated by the method of least squares, which by the Gauss-Markov theorem gives the best linear unbiased estimator under the assumptions above, and which in the case of normal errors coincides with the method of maximum likelihood.
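The estimation step described above can be sketched in a few lines of NumPy. This is only an illustration on simulated data: the helper name `fit_ols`, the design matrix, and the "true" parameter values are assumptions made for the example, not part of the model's definition.

```python
import numpy as np

def fit_ols(X, y):
    """Least-squares estimate of beta and an unbiased estimate of sigma^2."""
    n, p = X.shape
    # lstsq minimizes ||y - X beta||^2 without explicitly inverting X'X
    beta_hat = np.linalg.lstsq(X, y, rcond=None)[0]
    residuals = y - X @ beta_hat
    # Dividing by n - p gives the unbiased estimator of sigma^2;
    # the maximum-likelihood estimator under normal errors divides by n
    sigma2_hat = residuals @ residuals / (n - p)
    return beta_hat, sigma2_hat

# Simulated data (assumed values, for illustration only)
rng = np.random.default_rng(0)
n = 200
X = np.column_stack([np.ones(n), rng.normal(size=n)])  # intercept + one regressor
beta_true = np.array([1.0, 2.0])
y = X @ beta_true + rng.normal(scale=0.5, size=n)       # sigma^2 = 0.25

beta_hat, sigma2_hat = fit_ols(X, y)
print("beta_hat:", beta_hat)
print("sigma2_hat:", sigma2_hat)
```

At the solution the residuals are orthogonal to the columns of X, which is the normal-equation characterization of the least-squares estimate.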

If, rather than taking the variance of ε to be σ²I, where I is the n×n identity matrix, one assumes the variance is σ²M, where M is a known positive-definite matrix other than the identity matrix, then one estimates β by the method of "generalized least squares", in which, instead of minimizing the sum of squares of the residuals, one minimizes a different quadratic form in the residuals, namely the one given by the matrix M⁻¹:

{\min_{\beta}}\left(y-X\beta\right)'M^{-1}\left(y-X\beta\right)

This has the effect of "de-correlating" the errors, and leads to the estimator

\widehat{\beta}=\left(X'M^{-1}X\right)^{-1}X'M^{-1}y

which is the best linear unbiased estimator for β. If all of the off-diagonal entries in the matrix M are 0, then one normally estimates β by the method of weighted least squares, with weights proportional to the reciprocals of the diagonal entries.
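As a rough sketch of the generalized and weighted least-squares estimators, the following NumPy code evaluates the closed-form expression and checks it against ordinary least squares on "whitened" data. The function names `fit_gls` and `fit_wls`, the particular matrix M, and the simulated data are assumptions chosen for illustration.

```python
import numpy as np

def fit_gls(X, y, M):
    """Generalized least squares: (X' M^-1 X)^-1 X' M^-1 y."""
    # Solve linear systems with M rather than forming M^-1 explicitly
    Minv_X = np.linalg.solve(M, X)
    Minv_y = np.linalg.solve(M, y)
    return np.linalg.solve(X.T @ Minv_X, X.T @ Minv_y)

def fit_wls(X, y, m_diag):
    """Weighted least squares for diagonal M, with weights 1 / m_diag."""
    w = 1.0 / m_diag
    Xw = X * w[:, None]                  # rows of X scaled by their weights
    return np.linalg.solve(X.T @ Xw, Xw.T @ y)

# Simulated data with correlated errors (assumed values, for illustration)
rng = np.random.default_rng(1)
n = 50
idx = np.arange(n)
X = np.column_stack([np.ones(n), idx.astype(float)])
M = 0.6 ** np.abs(idx[:, None] - idx[None, :])   # assumed known correlation structure
L = np.linalg.cholesky(M)                        # M = L L'
y = X @ np.array([1.0, 0.5]) + L @ rng.normal(size=n)

beta_gls = fit_gls(X, y, M)

# "De-correlating": multiplying by L^-1 turns the problem into ordinary least squares
X_white = np.linalg.solve(L, X)
y_white = np.linalg.solve(L, y)
beta_white = np.linalg.lstsq(X_white, y_white, rcond=None)[0]

# Diagonal M: weighted least squares agrees with the general formula
m_diag = rng.uniform(0.5, 2.0, size=n)
beta_wls = fit_wls(X, y, m_diag)
beta_gls_diag = fit_gls(X, y, np.diag(m_diag))
print(beta_gls, beta_white, beta_wls)
```

The whitening step makes the "de-correlating" interpretation concrete: the transformed errors L⁻¹ε have variance σ²I, so ordinary least squares on the transformed data reproduces the generalized least-squares estimate.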

Ordinary linear regression is a very closely related topic.

Generalizations

Generalized linear models

In generalized linear models, rather than

E(Y) = Xβ,

one has

g(E(Y)) = Xβ,

where g is the "link function". The distribution of Y is also not restricted to being normal.

An example is the Poisson regression model, which states that

Y_i has a Poisson distribution with expected value e^(γ + δx_i).

The link function is the natural logarithm function. Having observed x_i and Y_i for i = 1, ..., n, one can estimate γ and δ by the method of maximum likelihood.
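One way to carry out this maximum-likelihood estimation is Newton's method on the Poisson log-likelihood, sketched below with NumPy. The function name `fit_poisson`, the starting value, the iteration limits, and the simulated data are assumptions made for the example; other optimizers would serve equally well.

```python
import numpy as np

def fit_poisson(x, y, max_iter=25, tol=1e-10):
    """ML estimates (gamma, delta) for E(Y_i) = exp(gamma + delta * x_i)."""
    X = np.column_stack([np.ones_like(x), x])
    # Start from the intercept-only fit: gamma = log(mean of y), delta = 0
    theta = np.array([np.log(y.mean()), 0.0])
    for _ in range(max_iter):
        mu = np.exp(X @ theta)              # fitted means under the current estimate
        score = X.T @ (y - mu)              # gradient of the log-likelihood
        info = X.T @ (X * mu[:, None])      # negative Hessian, X' diag(mu) X
        step = np.linalg.solve(info, score)
        theta = theta + step
        if np.max(np.abs(step)) < tol:
            break
    return theta

# Simulated data (assumed values, for illustration only)
rng = np.random.default_rng(2)
n = 500
x = rng.uniform(-1.0, 1.0, size=n)
gamma_true, delta_true = 0.5, 1.0
y = rng.poisson(np.exp(gamma_true + delta_true * x))

gamma_hat, delta_hat = fit_poisson(x, y)
print("gamma_hat:", gamma_hat, "delta_hat:", delta_hat)
```

At convergence the score vanishes, so the observed counts and the fitted means have the same total and the same x-weighted total.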

General linear model

The general linear model (or multivariate regression model) is a linear model with multiple measurements per object. The measurements for each object may be collected in a vector, so that the response Y becomes a matrix with one row per object.
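A minimal sketch of the multivariate case, assuming the model is written as Y = XB + E with Y an n×m matrix and B a p×m matrix of parameters (the symbols B, E, and m are notation introduced here for illustration). Least squares then amounts to fitting each column of Y separately against the same X.

```python
import numpy as np

# Simulated data (assumed values, for illustration only)
rng = np.random.default_rng(3)
n, p, m = 100, 3, 2                      # objects, regressors, measurements per object
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
B_true = np.array([[1.0, -1.0],
                   [2.0,  0.0],
                   [0.5,  3.0]])
Y = X @ B_true + rng.normal(scale=0.1, size=(n, m))

# lstsq accepts a matrix right-hand side and solves for all columns at once
B_hat = np.linalg.lstsq(X, Y, rcond=None)[0]

# Equivalent: one ordinary regression per column of Y
B_by_column = np.column_stack(
    [np.linalg.lstsq(X, Y[:, j], rcond=None)[0] for j in range(m)]
)
print(B_hat)
```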

See also

  • ANOVA, or analysis of variance, is historically a precursor to the development of linear models. Here the model parameters themselves are not computed; instead, the contributions of the columns of X and their significance are assessed by applying the F test to the ratio of the between-group variance to the error (within-group) variance.
  • Robust regression