Mathematical formalization of the statistical regression problem

From Wikipedia, the free encyclopedia

Although a rigorous formalization of the regression problem is not necessary in most cases, the theoretical study of the regression problem requires a more precise mathematical context than the one given in the Regression analysis article.


(\Omega,\mathcal{A}, P) will denote a probability space and (\Gamma, S) will be a measurable space. \Theta\subseteq\Gamma is a set of coefficients.


Very often, \Gamma = \mathbb{R}^n and S=\mathcal{B}_n, the Borel σ-algebra of \mathbb{R}^n, with n\in\mathbb{N}^*.

The dependent variable Y is a random variable, i.e. a measurable function:

Y:(\Omega,\mathcal{A})\rightarrow(\Gamma, S).
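To make "measurable function" concrete, the sketch below uses a hypothetical finite probability space: \Omega = \{0,1,2,3\}, \mathcal{A} the set of all subsets of \Omega, and P uniform. With the power set as σ-algebra, every preimage Y^{-1}(B) is an event, so every function Y:\Omega\rightarrow\mathbb{R} is measurable and P(Y\in B) = P(Y^{-1}(B)) is always defined. The values of Y and the names `preimage` and `prob` are illustrative assumptions.

```python
from fractions import Fraction

# Hypothetical finite probability space: Omega = {0, 1, 2, 3},
# sigma-algebra = all subsets of Omega, P uniform.
OMEGA = [0, 1, 2, 3]
P = {omega: Fraction(1, len(OMEGA)) for omega in OMEGA}

# A random variable Y : Omega -> Gamma = R, stored as a lookup table.
Y = {0: 1.0, 1: 3.0, 2: 3.0, 3: 5.0}

def preimage(rv, predicate):
    """Return rv^{-1}(B) = {omega : rv(omega) in B}, with B given by a predicate."""
    return {omega for omega in OMEGA if predicate(rv[omega])}

def prob(event):
    """P(event), where an event is a subset of Omega."""
    return sum((P[omega] for omega in event), Fraction(0))

# Every preimage is an event here, so P(Y in B) = P(Y^{-1}(B)) is defined.
event = preimage(Y, lambda y: y >= 3.0)
print(sorted(event), prob(event))  # [1, 2, 3] 3/4
```

On an uncountable \Omega, such as the one underlying a continuous distribution, \mathcal{A} is generally smaller than the power set, and measurability is then a genuine restriction on Y.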

This variable will be "explained" using other random variables called "factors".

Let p\in\mathbb{N}^*; p is called the number of factors.

For every i\in \{1,\cdots,p\}, the factor X_i is a random variable X_i:(\Omega,\mathcal{A})\rightarrow(\Gamma, S).
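All factors are defined on the same probability space, so evaluating them at a single outcome \omega gives one point (X_1(\omega),\cdots,X_p(\omega)) of \Gamma^p. The sketch below shows this under illustrative assumptions: a hypothetical finite \Omega, \Gamma = \mathbb{R}, p = 2, and factor values chosen arbitrarily. The helper `factor_vector` is a hypothetical name.

```python
# Hypothetical finite sample space and p = 2 real-valued factors (Gamma = R),
# each stored as a lookup table omega -> X_i(omega).
OMEGA = [0, 1, 2, 3]
X1 = {0: 1, 1: 2, 2: 3, 3: 4}
X2 = {0: 0, 1: 1, 2: 0, 3: 1}
FACTORS = [X1, X2]
p = len(FACTORS)  # number of factors

def factor_vector(omega):
    """Evaluate all factors at the same outcome: omega -> (X_1(omega), ..., X_p(omega)) in Gamma^p."""
    return tuple(X[omega] for X in FACTORS)

for omega in OMEGA:
    print(omega, factor_vector(omega))
```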

Let f:\left\{ \begin{matrix} \Gamma^p\times\Theta&\rightarrow&\Gamma\\ (x_1,\cdots,x_p;\theta)&\mapsto&f(x_1,\cdots,x_p;\theta) \end{matrix} \right. be a function.
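As an illustration only, take \Gamma = \mathbb{R}. Since \Theta\subseteq\Gamma, the coefficient \theta is then a single real number. One hypothetical choice of f is f(x_1,\cdots,x_p;\theta) = \theta\,(x_1+\cdots+x_p). This model is an assumption made for the example and is not part of the formalization. Note that f acts on points x_i of \Gamma, not on the random variables themselves.

```python
from typing import Sequence

# Hypothetical model with Gamma = R and Theta a subset of R (one coefficient):
# f(x_1, ..., x_p; theta) = theta * (x_1 + ... + x_p).
def f(x: Sequence[float], theta: float) -> float:
    """Map a point (x_1, ..., x_p) of Gamma^p and a coefficient theta to a point of Gamma."""
    return theta * sum(x)

print(f((1, 0), 2))  # 2
print(f((4, 1), 2))  # 10
```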

We finally define the error term \varepsilon:=Y-f(X_1,\cdots,X_p;\theta).
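The definition of \varepsilon is pointwise: for each outcome \omega, \varepsilon(\omega) = Y(\omega) - f(X_1(\omega),\cdots,X_p(\omega);\theta). The sketch below builds \varepsilon this way on a hypothetical finite probability space with uniform P and computes its expectation. It assumes \Gamma = \mathbb{R}, which makes the subtraction meaningful, and uses the illustrative model f(x_1,\cdots,x_p;\theta) = \theta\,(x_1+\cdots+x_p). All data values and helper names (`error`, `expectation`) are hypothetical.

```python
from fractions import Fraction

# Hypothetical finite probability space: Omega = {0, 1, 2, 3}, uniform P.
OMEGA = [0, 1, 2, 3]
P = {omega: Fraction(1, 4) for omega in OMEGA}

# Hypothetical dependent variable and p = 2 factors, all real-valued (Gamma = R).
Y = {0: 3, 1: 5, 2: 7, 3: 9}
X1 = {0: 1, 1: 2, 2: 3, 3: 4}
X2 = {0: 0, 1: 1, 2: 0, 3: 1}

def f(x, theta):
    """Hypothetical model f(x_1, ..., x_p; theta) = theta * (x_1 + ... + x_p)."""
    return theta * sum(x)

def error(theta):
    """epsilon = Y - f(X_1, ..., X_p; theta), built pointwise as a function on Omega."""
    return {omega: Y[omega] - f((X1[omega], X2[omega]), theta) for omega in OMEGA}

def expectation(rv):
    """E[rv] = sum over omega of rv(omega) * P({omega})."""
    return sum((rv[omega] * P[omega] for omega in OMEGA), Fraction(0))

eps = error(theta=2)
print(eps)               # {0: 1, 1: -1, 2: 1, 3: -1}
print(expectation(eps))  # 0
```

Changing \theta changes the whole random variable \varepsilon. With \theta = 1 in this toy example, every \varepsilon(\omega) is positive and the expectation becomes 3.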
