Simple linear regression

From Wikipedia, the free encyclopedia

Simple linear regression is a linear regression in which there is exactly one covariate (predictor variable). It is the single-predictor special case of the general linear regression model; regression with two or more covariates is called multiple regression.

Simple linear regression is used to evaluate the linear relationship between two variables, for example the relationship between muscle strength and lean body mass. Equivalently, it is used to develop an equation with which a dependent variable can be predicted or estimated from an independent variable.

The regression equation is given by

$Y = a + bX + \varepsilon$

where $Y$ is the dependent variable, $a$ is the y-intercept, $b$ is the gradient or slope of the line, $X$ is the independent variable and $\varepsilon$ is a random error term.

The strength of the linear relationship between the dependent and independent variables can be measured using a correlation coefficient, e.g. the Pearson product-moment correlation coefficient.
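As an illustration of the Pearson coefficient mentioned above, here is a minimal Python sketch that computes it directly from its definition. The function name `pearson_r` and the data are hypothetical choices made for this example.

```python
import math

def pearson_r(xs, ys):
    """Pearson product-moment correlation of two equal-length samples."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sums of centered cross-products and centered squares
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    return s_xy / math.sqrt(s_xx * s_yy)

# Hypothetical data lying exactly on the line y = 1 + 2x
r_exact = pearson_r([1, 2, 3, 4], [3, 5, 7, 9])
print(r_exact)
```

A value close to 1 or −1 indicates a strong linear relationship; a value near 0 indicates a weak one.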

1 Estimating the regression line
2 Alternative formulas for the slope coefficient
3 Inference
4 Numerical example

Estimating the regression line

The parameters of the linear regression line, $Y = a + bX$, can be estimated using the method of ordinary least squares. This method finds the line that minimizes the sum of the squares of the regression residuals, $\sum_{i=1}^N \hat{\varepsilon}_{i}^2$. The residual is the difference between the observed value and the predicted value: $\hat{\varepsilon}_{i} = y_{i} - \hat{y}_{i}$, where $\hat{y}_{i} = \hat{a} + \hat{b} x_{i}$.

The minimization problem can be solved using calculus: setting the partial derivatives of the sum of squares with respect to $a$ and $b$ to zero produces the following formulas for the estimates of the regression parameters:

$\hat{b} = \frac {\sum_{i=1}^{N} (x_{i} - \bar{x})(y_{i} - \bar{y}) } {\sum_{i=1}^{N} (x_{i} - \bar{x}) ^2}$

$\hat{a} = \bar{y} - \hat{b} \bar{x}$
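The two formulas above translate almost line for line into code. The following is a minimal Python sketch; the function name `fit_line` and the sample data are hypothetical, chosen only for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least squares estimates (a_hat, b_hat) for y = a + b*x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Numerator and denominator of the slope formula
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    b_hat = s_xy / s_xx
    a_hat = y_bar - b_hat * x_bar
    return a_hat, b_hat

# Hypothetical noise-free data from y = 1 + 2x
a_hat, b_hat = fit_line([0, 1, 2, 3], [1, 3, 5, 7])
print(a_hat, b_hat)
```

The sketch assumes at least two distinct x-values; otherwise the denominator of the slope is zero and the slope is undefined.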

The ordinary least squares fit has the following properties:

The line goes through the point $(\bar{x},\bar{y})$.
The sum of the residuals is equal to zero.
The linear combination of the residuals in which the coefficients are the x-values is equal to zero.
The estimates are unbiased, provided that the error term has mean zero for every value of X.
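The first three properties can be checked numerically on any data set. The sketch below uses hypothetical data and hypothetical variable names. Unbiasedness is a statement about repeated sampling and cannot be verified from a single sample, so it is not checked here.

```python
# Hypothetical sample
xs = [1.0, 2.0, 4.0, 7.0]
ys = [2.0, 3.0, 7.0, 8.0]

n = len(xs)
x_bar = sum(xs) / n
y_bar = sum(ys) / n
b_hat = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
         / sum((x - x_bar) ** 2 for x in xs))
a_hat = y_bar - b_hat * x_bar

residuals = [y - (a_hat + b_hat * x) for x, y in zip(xs, ys)]

# Property 1: the fitted line passes through (x_bar, y_bar)
gap_at_mean = (a_hat + b_hat * x_bar) - y_bar
# Property 2: the residuals sum to zero
residual_sum = sum(residuals)
# Property 3: the x-weighted sum of the residuals is zero
weighted_sum = sum(x * e for x, e in zip(xs, residuals))

# All three are zero up to floating-point rounding
print(gap_at_mean, residual_sum, weighted_sum)
```

The second and third properties are the first-order conditions of the minimization, which is why they hold for every least squares fit that includes an intercept.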

Alternative formulas for the slope coefficient

There are equivalent formulas for $\hat{b}$ that can be more convenient for calculation:

$\hat{b} = \frac {\sum_{i=1}^{N} {(x_{i}y_{i})} - N \bar{x} \bar{y}} {\sum_{i=1}^{N} (x_{i})^2 - N \bar{x}^2} = r \frac {s_y}{s_x}$

Here, $r$ is the sample correlation coefficient of X and Y, $s_x$ is the sample standard deviation of X and $s_y$ is the sample standard deviation of Y.
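To see that the three expressions for the slope agree, they can be evaluated side by side. This is a minimal sketch on hypothetical data; the variable names are illustrative, and `statistics.stdev` from the Python standard library is used for the sample standard deviation (divisor N − 1).

```python
import math
import statistics

# Hypothetical sample
xs = [1.0, 2.0, 4.0, 7.0]
ys = [2.0, 3.0, 7.0, 8.0]
n = len(xs)
x_bar = statistics.mean(xs)
y_bar = statistics.mean(ys)

# Centered sums used by the definitional formula
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
s_xx = sum((x - x_bar) ** 2 for x in xs)
s_yy = sum((y - y_bar) ** 2 for y in ys)
b_centered = s_xy / s_xx

# Raw-sum form: (sum of x*y - N*x_bar*y_bar) / (sum of x^2 - N*x_bar^2)
b_raw = ((sum(x * y for x, y in zip(xs, ys)) - n * x_bar * y_bar)
         / (sum(x * x for x in xs) - n * x_bar ** 2))

# Correlation form: r * s_y / s_x
r = s_xy / math.sqrt(s_xx * s_yy)
b_corr = r * statistics.stdev(ys) / statistics.stdev(xs)

print(b_centered, b_raw, b_corr)
```

In floating-point arithmetic the raw-sum form can lose precision when the means are large relative to the spread of the data, so the centered form is often the safer choice in code.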

Inference

Under the assumption that the error term is normally distributed, the estimate of the slope coefficient is normally distributed with mean equal to $b$, and its standard error is estimated by:

$s_{\hat{b}} = \sqrt { \frac {\sum_{i=1}^N \hat{\varepsilon}_i^2 /(N-2)} {\sum_{i=1}^N (x_i - \bar{x})^2} }.$

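A confidence interval for $b$ can be constructed using a t-distribution with $N-2$ degrees of freedom, where $t_{N-2}^*$ is the critical value for the chosen confidence level (for a 95% interval, the 0.975 quantile):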

$[ \hat{b} - s_{\hat{b}} t_{N-2}^*,\ \hat{b} + s_{\hat{b}} t_{N-2}^* ]$
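The standard error and the interval can be computed together. The following Python sketch is one possible arrangement: the function name `slope_confidence_interval` is hypothetical, and the critical value `t_crit` is passed in by the caller (for instance from a t-table) because the standard library has no t-distribution quantile function. The data are the three points used in the numerical example of this article, with the critical value 12.7062 quoted there.

```python
import math

def slope_confidence_interval(xs, ys, t_crit):
    """Return (b_hat, se_b, (lower, upper)) for the slope of y = a + b*x.

    t_crit is the t critical value with N - 2 degrees of freedom,
    supplied by the caller.
    """
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    b_hat = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / s_xx
    a_hat = y_bar - b_hat * x_bar
    # Sum of squared residuals of the fitted line
    ssr = sum((y - (a_hat + b_hat * x)) ** 2 for x, y in zip(xs, ys))
    se_b = math.sqrt((ssr / (n - 2)) / s_xx)
    return b_hat, se_b, (b_hat - t_crit * se_b, b_hat + t_crit * se_b)

b_hat, se_b, (lower, upper) = slope_confidence_interval(
    [1, 2, 6], [-1, 4, 3], t_crit=12.7062
)
print(round(b_hat, 3), round(se_b, 3), round(lower, 3), round(upper, 3))
```

The sketch requires at least three observations, since the residual variance is divided by N − 2.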

Numerical example

Suppose we have the sample of points {(1,-1),(2,4),(6,3)}. The mean of X is 3 and the mean of Y is 2. The slope coefficient estimate is given by:

$\hat{b} = \frac {(1 - 3)((-1) - 2) + (2 - 3)(4 - 2) + (6 - 3)(3 - 2)} {(1 - 3)^2 + (2 - 3)^2 + (6 - 3)^2 } = 7/14 = 0.5$

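The standard error of the slope estimate is 0.866. With $N - 2 = 1$ degree of freedom, the 0.975 quantile of the t-distribution is 12.7062, so a 95% confidence interval for the slope is given by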

[0.5 − 0.866 × 12.7062, 0.5 + 0.866 × 12.7062] = [−10.504, 11.504].
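The standard error of 0.866 quoted above is stated without its working. Under the formulas from the Inference section, it can be reconstructed as follows: first the intercept estimate, then the fitted values and residuals at the three points, and finally the standard error itself.

$\hat{a} = \bar{y} - \hat{b} \bar{x} = 2 - 0.5 \times 3 = 0.5$

$\hat{y}_i = 0.5 + 0.5 x_i: \quad \hat{y}_1 = 1, \quad \hat{y}_2 = 1.5, \quad \hat{y}_3 = 3.5$

$\hat{\varepsilon}_i = y_i - \hat{y}_i: \quad \hat{\varepsilon}_1 = -2, \quad \hat{\varepsilon}_2 = 2.5, \quad \hat{\varepsilon}_3 = -0.5$

$\sum_{i=1}^{3} \hat{\varepsilon}_i^2 = 4 + 6.25 + 0.25 = 10.5$

$s_{\hat{b}} = \sqrt { \frac {10.5 / (3 - 2)} {14} } = \sqrt{0.75} \approx 0.866$

The interval is very wide because three points leave only one degree of freedom, and the interval contains zero, so these data alone do not establish that the slope differs from zero at the 95% level.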
