Scoring rule

In decision theory a score function, or scoring rule, is a measure of someone's performance when they are repeatedly making decisions under uncertainty. For example, a TV weather forecaster may give the probability of rain every day. A viewer could note the number of times that a 25% probability was quoted over a ten-year period and compare this with the actual proportion of times that rain fell. If the actual percentage was substantially different from the stated probability, we say that the forecaster is poorly calibrated. A poorly calibrated forecaster might be encouraged to do better by a bonus system. Suppose the forecaster receives a reward u(x,q) when he issues a forecast with an attached rain probability q, where x = 1 if it rains and x = 0 if it does not. Assuming that the forecaster wishes to maximise his expected reward, he will choose a forecast q which maximises

\hat{u}(q \mid p) = p\,u(1,q) + (1-p)\,u(0,q)

where p is his personal probability that rain will fall.
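As a minimal sketch (not part of the article), the expected reward above can be evaluated for any candidate forecast q and maximised by a simple grid search. The function names expected_reward and best_forecast, and the grid search itself, are illustrative assumptions, not a prescribed method.

def expected_reward(u, q, p):
    # u_hat(q | p) = p*u(1, q) + (1 - p)*u(0, q)
    return p * u(1, q) + (1 - p) * u(0, q)

def best_forecast(u, p, grid_size=1001):
    # forecast q, on an evenly spaced grid over [0, 1], that maximises the expected reward
    grid = [i / (grid_size - 1) for i in range(grid_size)]
    return max(grid, key=lambda q: expected_reward(u, q, p))

For an improper rule such as the linear score u(x,q) = 1 - |x - q| (used here only as an illustration), best_forecast returns 0 or 1 rather than p, which is exactly the dishonest reporting that proper rules are designed to discourage.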

Proper score functions

A scoring rule u(x,q) is said to be proper if \hat{u}(q \mid p) is (uniquely) maximised at q = p for every value of 0 \le p \le 1. The use of a proper scoring rule encourages the forecaster to be honest, since his expected payoff is maximised when he reports his personal rain probability p as the prediction q (the sketch after the two examples below checks this numerically for both rules). Two commonly used proper score functions are:

The Brier score, given by

u(x,q) = 1 - (x-q)^2

and the logarithmic score function, given by

u(x,q) = \begin{cases} \log q & \text{if } x = 1 \\ \log (1-q) & \text{if } x = 0 \end{cases}
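The following sketch (an illustrative numerical check under the assumption of a simple grid search, not part of the article) confirms that both rules above are proper: for each true probability p, the expected score is maximised by reporting q = p.

import math

def brier(x, q):
    # Brier score: u(x, q) = 1 - (x - q)^2
    return 1 - (x - q) ** 2

def log_score(x, q):
    # logarithmic score: log q if it rains, log(1 - q) if it does not
    return math.log(q) if x == 1 else math.log(1 - q)

def expected_score(u, q, p):
    # u_hat(q | p) = p*u(1, q) + (1 - p)*u(0, q)
    return p * u(1, q) + (1 - p) * u(0, q)

def argmax_q(u, p, n=999):
    # search an interior grid of (0, 1); this avoids log(0) at the endpoints
    grid = [i / (n + 1) for i in range(1, n + 1)]
    return max(grid, key=lambda q: expected_score(u, q, p))

for p in (0.1, 0.25, 0.7):
    print(p, argmax_q(brier, p), argmax_q(log_score, p))
    # each maximiser coincides with p itself, as properness requires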