Hypothesis of linear regression

In statistics, the linear regression problem can be formalized precisely, although this formalization is seldom used explicitly in practice.

Given the mathematical formalization of the statistical regression problem, let \Theta\subseteq\Gamma be a set of coefficients. The hypothesis of linear regression is:

\exists (\beta^0,\cdots,\beta^p)\in\Theta^{p+1}: \mathbb{E}(Y|X_1,\cdots,X_p)=\beta^0 + \sum_{j=1}^p \beta^j X_j
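
For instance, with a single explanatory variable (p = 1), the hypothesis specializes to the familiar simple linear regression form:

\exists (\beta^0,\beta^1)\in\Theta^{2}: \mathbb{E}(Y|X_1)=\beta^0 + \beta^1 X_1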

and candidate predictors are compared through the expected squared difference:

\forall f,g\in F,\ d(f,g) = \mathbb{E}[(f(X_1,\cdots,X_p)-g(X_1,\cdots,X_p))^2]

We therefore want to minimize \mathbb{E}[(Y-f(X_1,\cdots,X_p))^2]. Since the conditional expectation \mathbb{E}(Y|X_1,\cdots,X_p) is the function that minimizes this quantity, under the linearity hypothesis the optimal choice is

f(X_1,\cdots,X_p)=\mathbb{E}(Y|X_1,\cdots,X_p) = \beta^0 + \sum_{j=1}^p \beta^j X_j
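
That the conditional expectation minimizes the mean squared error follows from a standard decomposition. Writing X as a shorthand for the vector (X_1,\cdots,X_p), for any candidate function f one has

\mathbb{E}[(Y-f(X))^2] = \mathbb{E}[(Y-\mathbb{E}(Y|X))^2] + \mathbb{E}[(\mathbb{E}(Y|X)-f(X))^2]

because the cross term 2\,\mathbb{E}[(Y-\mathbb{E}(Y|X))(\mathbb{E}(Y|X)-f(X))] vanishes by the tower property of conditional expectation. The first term does not depend on f and the second is non-negative, so the minimum is attained when f(X)=\mathbb{E}(Y|X) almost surely.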

Hence, we only need to find \beta^0,\cdots,\beta^p.
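
In practice the coefficients are estimated from a finite sample by minimizing the empirical mean squared error (ordinary least squares). The following is a minimal sketch in Python, assuming NumPy is available; the sample arrays X and y and the function fit_linear_regression are illustrative and not part of the formalization above.

    import numpy as np

    def fit_linear_regression(X, y):
        # Estimate (beta^0, ..., beta^p) by minimizing the empirical
        # mean squared error over the sample (ordinary least squares).
        # X: (n, p) array of explanatory variables, y: (n,) array of responses.
        n = X.shape[0]
        # Prepend a column of ones so that beta^0 plays the role of the intercept.
        design = np.column_stack([np.ones(n), X])
        # Least-squares solution of design @ beta ~ y.
        beta, *_ = np.linalg.lstsq(design, y, rcond=None)
        return beta

    # Illustration with p = 2 and true coefficients (1, 2, -3).
    rng = np.random.default_rng(0)
    X = rng.normal(size=(500, 2))
    y = 1.0 + 2.0 * X[:, 0] - 3.0 * X[:, 1] + rng.normal(scale=0.1, size=500)
    print(fit_linear_regression(X, y))  # approximately [1.0, 2.0, -3.0]

Here np.linalg.lstsq is used rather than explicitly forming and solving the normal equations, which is the numerically more stable choice.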