Fraction of variance unexplained

From Wikipedia, the free encyclopedia

In statistics, the fraction of variance unexplained (FVU) in the context of a regression task is the fraction of the variance of the regressand Y that cannot be explained, i.e., that is not correctly predicted, by the explanatory variable X.

For a more general definition of explained/unexplained variation/randomness/variance, see the article explained variation.

Formal definition

Given a regression function f(·) that yields, for each y_i with 1 \leq i \leq N, an estimate \widehat{y}_i = f(x_i), we have:

\begin{align}
FVU &= {SS_E \over SS_T} = 1 - {SS_R \over SS_T} \\
 &= 1 - R^2,
\end{align}

where R^2 is the coefficient of determination and

\begin{align}
SS_E &= \sum_{i=1}^N (y_i - \widehat{y}_i)^2, \\
SS_T &= \sum_{i=1}^N (y_i - \bar{y})^2, \\
SS_R &= \sum_{i=1}^N (\widehat{y}_i - \bar{y})^2, \text{ and} \\
\bar{y} &= \frac{1}{N} \sum_{i=1}^N y_i.
\end{align}
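As a rough illustration of these sums of squares, the following Python sketch computes SS_E, SS_T and SS_R for a small made-up data set. The data, the function names, and the helper fit_line are assumptions introduced for this example only. The predictions come from an ordinary least squares line with an intercept, a setting in which SS_T = SS_E + SS_R holds, so the two expressions for FVU agree.

```python
def sums_of_squares(y, y_hat):
    """Return (SS_E, SS_T, SS_R) for observations y and predictions y_hat."""
    y_bar = sum(y) / len(y)
    ss_e = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))
    ss_t = sum((yi - y_bar) ** 2 for yi in y)
    ss_r = sum((fi - y_bar) ** 2 for fi in y_hat)
    return ss_e, ss_t, ss_r


def fit_line(x, y):
    """Least squares fit of y = a + b*x (hypothetical helper for this sketch)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b = s_xy / s_xx
    a = y_bar - b * x_bar
    return a, b


# Made-up data, chosen only so the arithmetic is easy to follow.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]

a, b = fit_line(x, y)
y_hat = [a + b * xi for xi in x]

ss_e, ss_t, ss_r = sums_of_squares(y, y_hat)
fvu_direct = ss_e / ss_t          # FVU = SS_E / SS_T
fvu_via_ssr = 1 - ss_r / ss_t     # FVU = 1 - SS_R / SS_T

# Both values are approximately 0.4 for this data set.
print(fvu_direct, fvu_via_ssr)
```

For these numbers, SS_E is about 2.4, SS_R about 3.6 and SS_T is 6, so roughly 40% of the variance of y is left unexplained by the fitted line.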

Alternatively, the fraction of variance unexplained can be defined as:

FVU = \frac{MSE(f)}{\mathrm{var}[Y]} = \frac{\mathrm{var}[Y - f(X)]}{\mathrm{var}[Y]},

where MSE(f) is the mean squared error of the regression function f(·).
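The alternative definition can be checked numerically with a short Python sketch. The data and predictions below are made up for illustration, and the function names are assumptions of this example. The sketch uses the population variance (dividing by N), so that the ratio matches SS_E / SS_T. The residuals here have zero mean, which is what makes var[Y - f(X)] coincide with the MSE.

```python
from statistics import pvariance


def mse(y, y_hat):
    """Mean squared error of predictions y_hat against observations y."""
    return sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat)) / len(y)


# Made-up observations and illustrative predictions (assumed, not from the article).
y = [2.0, 4.0, 5.0, 4.0, 5.0]
y_hat = [2.8, 3.4, 4.0, 4.6, 5.2]

residuals = [yi - fi for yi, fi in zip(y, y_hat)]

# FVU as MSE(f) / var[Y], using the population variance of y.
fvu_from_mse = mse(y, y_hat) / pvariance(y)

# FVU as var[Y - f(X)] / var[Y]; agrees with the line above because
# the residuals in this example average to zero.
fvu_from_var = pvariance(residuals) / pvariance(y)

# Both values are approximately 0.4.
print(fvu_from_mse, fvu_from_var)
```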

Explanation

The second definition conveys the idea behind FVU most directly. When trying to predict Y, the most naïve regression function is the constant function that predicts the mean of Y, i.e., f(x_i)=\bar{y}. The MSE of this function equals the variance of Y; that is, SS_E = SS_T and SS_R = 0. In this case, none of the variation in Y is accounted for, and the FVU takes its maximum value of 1.

The FVU will also be 1 if the explanatory variable X tells us nothing about Y, in the sense that the predicted values of Y do not covary with Y. As prediction improves and the MSE decreases, the FVU goes down. In the case of perfect prediction, where \widehat{y}_i = y_i, the MSE is 0, SS_E = 0, SS_R = SS_T, and the FVU is 0.
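The two limiting cases described above can be reproduced with a minimal Python sketch. The data set and the function name fvu are assumptions made for this example.

```python
def fvu(y, y_hat):
    """Fraction of variance unexplained, computed as SS_E / SS_T."""
    y_bar = sum(y) / len(y)
    ss_e = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))
    ss_t = sum((yi - y_bar) ** 2 for yi in y)
    return ss_e / ss_t


# Made-up observations (illustrative only).
y = [2.0, 4.0, 5.0, 4.0, 5.0]
y_bar = sum(y) / len(y)

# Naive predictor: always predict the mean of y, so SS_E = SS_T.
fvu_mean = fvu(y, [y_bar] * len(y))

# Perfect predictor: every prediction equals the observation, so SS_E = 0.
fvu_perfect = fvu(y, list(y))

print(fvu_mean)     # 1.0
print(fvu_perfect)  # 0.0
```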

See also