Total least squares
Total least squares, also known as errors in variables, rigorous least squares, or orthogonal regression, is a least squares data modeling technique in which observational errors on both dependent and independent variables are taken into account. It can be applied to both linear and non-linear models.
Linear model
In the least squares method of data modeling, the objective function, S,
$$S = \mathbf{r}^\top \mathbf{W} \mathbf{r},$$
is minimized. In linear least squares the model is defined as a linear combination of the parameters, $\boldsymbol\beta$, so the residuals are given by
$$\mathbf{r} = \mathbf{y} - \mathbf{X}\boldsymbol\beta.$$
There are m observations, y, and n parameters, β, with m > n. X is an m×n matrix whose elements are either constants or functions of the independent variables, x. The weight matrix, W, is, ideally, the inverse of the variance-covariance matrix, $\mathbf{M}_y$, of the observations, y. The independent variables are assumed to be error-free. The parameter estimates are found by setting the gradient equations to zero, which results in the normal equations[1]
$$\mathbf{X}^\top\mathbf{W}\mathbf{X}\,\boldsymbol\beta = \mathbf{X}^\top\mathbf{W}\,\mathbf{y}.$$
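For concreteness, a minimal GNU Octave sketch (with hypothetical data) of this ordinary, error-free-x case: the straight-line parameters are obtained directly from the normal equations.

% Minimal sketch (hypothetical data): solving the weighted normal equations
% X' W X beta = X' W y for an ordinary (error-free x) least squares fit.
x = (1:5)';                    % independent variable, assumed error-free
y = [2.1; 3.9; 6.2; 8.1; 9.8]; % observations
X = [ones(size(x)) x];         % design matrix for a straight line y = a + b*x
W = eye(length(y));            % weight matrix; identity for equal, uncorrelated errors
beta = (X' * W * X) \ (X' * W * y);  % parameter estimates [a; b]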
Now, suppose that both x and y are observed subject to error, with variance-covariance matrices $\mathbf{M}_x$ and $\mathbf{M}_y$ respectively. In this case the objective function can be written as
$$S = \mathbf{r}_x^\top\mathbf{M}_x^{-1}\mathbf{r}_x + \mathbf{r}_y^\top\mathbf{M}_y^{-1}\mathbf{r}_y,$$
where $\mathbf{r}_x$ and $\mathbf{r}_y$ are the residuals in x and y respectively. Clearly these residuals cannot be independent of each other; they must be constrained by some kind of relationship. Writing the model function as $f(\mathbf{r}_x, \mathbf{r}_y, \boldsymbol\beta)$, the constraints are expressed by m condition equations[2]
$$\mathbf{F} = \Delta\mathbf{y} - \frac{\partial f}{\partial \mathbf{r}_x}\,\mathbf{r}_x - \frac{\partial f}{\partial \mathbf{r}_y}\,\mathbf{r}_y - \mathbf{X}\,\Delta\boldsymbol\beta = \mathbf{0}.$$
Thus, the problem is to minimize the objective function subject to the m constraints. It is solved by the use of Lagrange multipliers. After some algebraic manipulations,[3] the result is obtained:
$$\mathbf{X}^\top\mathbf{M}^{-1}\mathbf{X}\,\Delta\boldsymbol\beta = \mathbf{X}^\top\mathbf{M}^{-1}\,\Delta\mathbf{y},$$
or alternatively
$$\mathbf{X}^\top\mathbf{M}^{-1}\mathbf{X}\,\boldsymbol\beta = \mathbf{X}^\top\mathbf{M}^{-1}\,\mathbf{y},$$
where M is the variance-covariance matrix relative to both independent and dependent variables,
$$\mathbf{M} = \mathbf{K}_x\mathbf{M}_x\mathbf{K}_x^\top + \mathbf{K}_y\mathbf{M}_y\mathbf{K}_y^\top, \qquad \mathbf{K}_x = -\frac{\partial \mathbf{F}}{\partial \mathbf{r}_x}, \; \mathbf{K}_y = -\frac{\partial \mathbf{F}}{\partial \mathbf{r}_y}.$$
Example
In practice these equations are easy to use. When the data errors are uncorrelated, all matrices M and W are diagonal. Then, take the example of straight line fitting,
$$f(x_i, \boldsymbol\beta) = \alpha + \beta x_i.$$
It is easy to show that, in this case,
$$M_{ii} = \sigma^2_{y,i} + \beta^2 \sigma^2_{x,i},$$
showing how the variance at the ith point is determined by the variances of both independent and dependent variables and by the model being used to fit the data. The expression may be generalized by noting that the parameter β is the slope of the line,
$$M_{ii} = \sigma^2_{y,i} + \left(\frac{dy}{dx}\right)_i^2 \sigma^2_{x,i}.$$
An expression of this type is used in fitting pH titration data where a small error on x translates to a large error on y when the slope is large.
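The following GNU Octave sketch (hypothetical data and tolerances) illustrates one common way of using the expression above: the straight line is refitted with weights 1/M_ii, recomputing M_ii from the current slope until it converges. This is a simplified effective-variance iteration, not the full Lagrange-multiplier treatment.

% Illustrative sketch (hypothetical data): straight-line fit with errors in both
% variables, iteratively reweighting with M_ii = sy^2 + beta^2*sx^2 from above.
x  = [1 2 3 4 5]';            sx = 0.1*ones(5,1);   % observed x and its standard errors
y  = [1.9 4.1 6.0 8.2 9.9]';  sy = 0.2*ones(5,1);   % observed y and its standard errors
X  = [ones(size(x)) x];
b  = X \ y;                   % ordinary least squares start for [alpha; beta]
for iter = 1:20
  w = 1 ./ (sy.^2 + b(2)^2 .* sx.^2);   % inverse of the effective variances M_ii
  W = diag(w);
  b_new = (X' * W * X) \ (X' * W * y);  % refit with the updated weights
  if norm(b_new - b) < 1e-10
    b = b_new;
    break;
  end
  b = b_new;
end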
Computation
The computation of the TLS solution is described in standard texts.[4] Code in the GNU Octave language takes just a few lines, as shown below, where the equation AX = B is solved (note: A is m-by-n and B is m-by-k). For the special case k = 1 (one right-hand-side vector), the solution minimizes the 2-norm of (AX - B)/sqrt(1 + X'X). In general (any k), the solution minimizes the Frobenius norm of (AX - B)/sqrtm(I + X'X), where / represents matrix right division, sqrtm() is the matrix square root and I is the identity matrix.[5]
function X = tls(A, B)
  % Total least squares solution of the overdetermined system A*X = B,
  % where A is m-by-n and B is m-by-k.
  n = size(A, 2);              % number of columns of A
  C = [A B];                   % augmented matrix [A B]
  [U, S, V] = svd(C, 0);       % "economy size" singular value decomposition
  V12 = V(1:n, 1+n:end);       % n-by-k upper-right block of V
  V22 = V(1+n:end, 1+n:end);   % k-by-k lower-right block of V
  X = -V12 / V22;              % TLS solution
end
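A hypothetical usage example, comparing the result with the ordinary least squares solution obtained by matrix left division:

% Hypothetical usage: fit y = a + b*x with noise in both the design matrix and
% the observations, then compare TLS against ordinary least squares.
A = [ones(10,1) (1:10)'] + 0.05*randn(10,2);  % noisy design matrix
B = 2 + 3*(1:10)' + 0.1*randn(10,1);          % noisy observations
X_tls = tls(A, B);    % total least squares estimate of [a; b]
X_ols = A \ B;        % ordinary least squares, for comparison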
Non-linear model
For non-linear systems similar reasoning shows that the normal equations for an iteration cycle can be written as
$$\mathbf{J}^\top\mathbf{M}^{-1}\mathbf{J}\,\Delta\boldsymbol\beta = \mathbf{J}^\top\mathbf{M}^{-1}\,\Delta\mathbf{y},$$
where J is the Jacobian matrix.
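As an illustration, the sketch below (hypothetical exponential model and data) performs a few Gauss-Newton style cycles of these normal equations, building the diagonal M from the effective-variance expression of the Example section; it is a simplified sketch, not a full non-linear TLS implementation.

% Illustrative sketch (hypothetical model y = b1*exp(b2*x) and data):
% Gauss-Newton style cycles of J'*inv(M)*J*dbeta = J'*inv(M)*dy,
% with M_ii = sy^2 + (df/dx)^2*sx^2 as an effective variance.
x  = (0:0.5:3)';  sx = 0.05*ones(size(x));            % observed x and its standard errors
y  = 2*exp(0.7*x) + 0.1*randn(size(x));               % synthetic observations
sy = 0.1*ones(size(x));                               % standard errors on y
beta = [1.5; 0.5];                                    % starting estimate for [b1; b2]
for iter = 1:10
  f    = beta(1)*exp(beta(2)*x);                      % model values
  dfdx = beta(1)*beta(2)*exp(beta(2)*x);              % slope dy/dx at each point
  M    = diag(sy.^2 + dfdx.^2 .* sx.^2);              % effective variances
  J    = [exp(beta(2)*x), beta(1)*x.*exp(beta(2)*x)]; % Jacobian with respect to beta
  dy   = y - f;                                       % residuals
  dbeta = (J' * (M \ J)) \ (J' * (M \ dy));           % parameter shift
  beta  = beta + dbeta;
end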
Geometrical interpretation
When the independent variable is error-free a residual represents the "vertical" distance between the observed data point and the fitted curve (or surface). In total least squares a residual represents the distance between a data point and the fitted curve measured along some direction. In fact, if both variables are measured in the same units and the errors on both variables are the same, then the residual represents the shortest distance between the data point and the fitted curve, that is, the residual vector is perpendicular to the tangent of the curve.
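For instance, for a straight line $y = \alpha + \beta x$ with equal, uncorrelated errors on both variables, the orthogonal residual for the point $(x_i, y_i)$ is the familiar point-to-line distance
$$d_i = \frac{|y_i - \alpha - \beta x_i|}{\sqrt{1 + \beta^2}},$$
and in this case total least squares (orthogonal regression) minimizes the sum of the squared $d_i$.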
A serious difficulty arises if the variables are not measured in the same units. First, consider measuring the distance between a data point and the curve: what are the measurement units for this distance? If we measure distance using Pythagoras' theorem, we would be adding quantities measured in different units, which leads to meaningless results. Secondly, if we rescale one of the variables, e.g. measure in grams rather than kilograms, we end up with different results (a different fitted curve). To avoid this problem of incommensurability it is sometimes suggested that we convert to dimensionless variables; this may be called normalization or standardization. However, there are various ways of doing this, and they lead to fitted models which are not equivalent to each other.
Scale invariant methods
In short, total least squares does not have the property of units-invariance (it is not scale invariant). For a meaningful model we require this property to hold. A way forward is to realise that residuals (distances) measured in different units can be combined if multiplication is used instead of addition. Consider fitting a line: for each data point the product of the vertical and horizontal residuals equals twice the area of the triangle formed by the residual lines and the fitted line. We choose the line which minimizes the sum of these areas. Nobel laureate Paul Samuelson proved that it is the only line which possesses a set of certain desirable properties, including scale invariance and invariance under interchange of variables (Samuelson, 1942).[6] This line has been rediscovered in different disciplines and is variously known as the reduced major axis, the geometric mean functional relationship (Draper and Smith, 1998),[7] least products regression, diagonal regression, the line of organic correlation, and the least areas line. Tofallis (2002)[8] has extended this approach to deal with multiple variables; a sketch of the two-variable case is given below.
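As an illustration, a minimal GNU Octave sketch (with hypothetical data) of the reduced major axis (geometric mean) line for two variables: its slope is the ratio of the standard deviations, signed by the covariance, so rescaling either variable rescales the slope accordingly and leaves the fitted relationship unchanged.

% Minimal sketch (hypothetical data): the reduced major axis / geometric mean
% functional relationship line for two variables.
x = [1 2 3 4 5]';
y = [2.0 4.1 5.9 8.3 9.8]';
slope_sign = sign(sum((x - mean(x)) .* (y - mean(y))));  % sign of the covariance
slope = slope_sign * std(y) / std(x);                    % scale-invariant slope
intercept = mean(y) - slope * mean(x);                   % line passes through the centroid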
References
- ^ An alternative form is $\mathbf{X}^\top\mathbf{W}\mathbf{X}\,\Delta\boldsymbol\beta = \mathbf{X}^\top\mathbf{W}\,\Delta\mathbf{y}$, where $\Delta\boldsymbol\beta$ is the parameter shift from some starting estimate of $\boldsymbol\beta$ and $\Delta\mathbf{y}$ is the difference between y and the value calculated using the starting value of $\boldsymbol\beta$.
- ^ W.E. Deming, Statistical Adjustment of Data, Wiley, 1943
- ^ P. Gans, Data Fitting in the Chemical Sciences, Wiley, 1992
- ^ Gene H. Golub and Charles F. Van Loan (1996). Matrix Computations, 3rd Ed., The Johns Hopkins University Press. p. 596.
- ^ Sabine van Huffel and Joos Vandewalle (1987). The Total Least Squares Problem: Computational Aspects and Analysis. Society for Industrial and Applied Mathematics. ISBN 0898712750. p. 186.
- ^ Samuelson, P. (1942). A note on alternative regressions. Econometrica, 10(1), 80-83.
- ^ Draper, N. R. and Smith, H. (1998). Applied Regression Analysis, 3rd edition, pp. 92-96.
- ^ Tofallis, C. (2002). Model fitting for multiple variables by minimising the geometric mean deviation. Downloadable from: http://papers.ssrn.com/sol3/papers.cfm?abstract_id=1077322
Others
- S. V. Huffel and P. Lemmerling, Total Least Squares and Errors-in-Variables Modeling: Analysis, Algorithms and Applications. Dordrecht, The Netherlands: Kluwer Academic Publishers, 2002.
- S. Jo and S. W. Kim, "Consistent normalized least mean square filtering with noisy data matrix," IEEE Trans. Signal Processing, vol. 53, no. 6, pp. 2112-2123, Jun. 2005.
- R. D. DeGroat and E. M. Dowling, "The data least squares problem and channel equalization," IEEE Trans. Signal Processing, vol. 41, no. 1, pp. 407–411, Jan. 1993.
- T. Abatzoglou and J. Mendel, "Constrained total least squares," in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process. (ICASSP’87), Apr. 1987, vol. 12, pp. 1485–1488.
- P. de Groen "An introduction to total least squares," in Nieuw Archief voor Wiskunde, Vierde serie, deel 14, 1996, pp. 237-253 arxiv.org.