Gambling and information theory


Statistical inference might be thought of as gambling theory applied to the world around us. The myriad applications for logarithmic information measures tell us precisely how to take the best guess in the face of partial information[1]. In that sense, information theory might be considered a formal expression of the theory of gambling. It is no surprise, therefore, that information theory has applications to games of chance.

Kelly gambling

Kelly gambling or proportional gambling is an application of information theory to gambling and (with some ethical and legal reservations) investing.

Part of Kelly's insight was to have the gambler maximize the expectation of the logarithm of his capital, rather than the expected profit from each bet. This is important, since in the latter case, one would be led to gamble all he had when presented with a favorable bet, and if he lost, would have no capital with which to place subsequent bets. Kelly realized that it was this logarithm which is additive in sequential bets, and "to which the law of large numbers applies." See Kelly criterion.

The doubling rate in gambling on a horse race is

 W(b,p) = \mathbb{E}[\log_2 S(X)] = \sum_{i=1}^m p_i \log_2(b_i o_i)

where there are m horses, the probability of the ith horse winning is p_i, the proportion of wealth bet on that horse is b_i, and the odds (payoff) are o_i (e.g., o_i = 2 if the ith horse winning pays double the amount bet); S(X) is the wealth relative, i.e. the factor b(X)o(X) by which the bet on the winning horse X multiplies the gambler's capital. This quantity is maximized by proportional (Kelly) gambling:

 b = p

for which

 \max_b W(b,p) = \sum_i p_i \log_2 o_i - H(p)

where H(p) is the information entropy of the distribution p.
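
As a rough, hedged illustration (not part of the original article), the following Python sketch evaluates the doubling rate W(b,p) for a hypothetical three-horse race with made-up probabilities and odds, and compares proportional (Kelly) betting b = p with a uniform allocation and with the closed-form maximum above.

  import math

  def doubling_rate(b, p, o):
      # W(b, p) = sum_i p_i * log2(b_i * o_i)
      return sum(pi * math.log2(bi * oi) for pi, bi, oi in zip(p, b, o))

  # Hypothetical three-horse race (made-up numbers).
  p = [0.5, 0.3, 0.2]      # win probabilities
  o = [2.0, 4.0, 8.0]      # odds: payoff per unit bet on the winning horse

  kelly = doubling_rate(p, p, o)                    # proportional betting b = p
  uniform = doubling_rate([1/3, 1/3, 1/3], p, o)    # naive equal split

  # Closed form for the maximum: sum_i p_i log2(o_i) - H(p)
  entropy = -sum(pi * math.log2(pi) for pi in p)
  closed_form = sum(pi * math.log2(oi) for pi, oi in zip(p, o)) - entropy

  print(kelly, uniform, closed_form)   # ~0.215, ~0.115, ~0.215 bits per race

Kelly betting attains the closed-form maximum (about 0.215 bits per race with these numbers), while the equal split earns a strictly lower doubling rate.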

An important but simple relation exists between the amount of side information a gambler obtains and the expected exponential growth of his capital (Kelly):

 \mathbb{E}[\log K_t] = \log K_0 + \sum_{i=1}^t H_i

for an optimal betting strategy, where K_0 is the initial capital, K_t is the capital after the tth bet, and H_i is the amount of side information obtained concerning the ith bet (in particular, the mutual information relative to the outcome of each bettable event). This equation applies in the absence of any transaction costs or minimum bets. When these constraints apply (as they invariably do in real life), another important gambling concept comes into play: the gambler (or unscrupulous investor) must face a certain probability of ultimate ruin, known as the gambler's ruin scenario. Note that even food, clothing, and shelter can be considered fixed transaction costs and thus contribute to the gambler's probability of ultimate ruin.
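
To make the growth relation concrete, here is a minimal Monte Carlo sketch under assumed numbers (an illustration, not from the article): the gambler's side information is knowledge of the true bias p = 0.6 of a two-outcome event offered at even odds, worth 1 - H(0.6), or about 0.029 bits per bet, and the average of log2 K_t grows at roughly that rate.

  import math, random

  random.seed(0)

  # Hypothetical setup: two outcomes at even odds (o = 2 for both).  Without
  # side information the best doubling rate is 0; knowing the true bias
  # p = 0.6 is worth 1 - H(0.6), about 0.029 bits per bet.
  p = [0.6, 0.4]
  o = [2.0, 2.0]
  rate = sum(pi * math.log2(pi * oi) for pi, oi in zip(p, o))  # doubling rate for b = p

  def avg_log_capital(num_bets, trials=2000):
      # Monte Carlo estimate of E[log2 K_t] under proportional betting, K_0 = 1.
      total = 0.0
      for _ in range(trials):
          log_capital = 0.0
          for _ in range(num_bets):
              winner = 0 if random.random() < p[0] else 1
              log_capital += math.log2(p[winner] * o[winner])
          total += log_capital
      return total / trials

  for t in (100, 500):
      print(t, avg_log_capital(t), rate * t)   # simulated vs. predicted E[log2 K_t]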

This equation was the first application of Shannon's theory of information outside its prevailing paradigm of data communications (Pierce).

Applications for self-information

Figure: Surprisal and evidence in bits, as logarithmic measures of probability and odds respectively.

The logarithmic probability measure self-information or surprisal[2], whose average is information entropy/uncertainty and whose average difference is KL-divergence, has applications to odds-analysis all by itself. Its two primary strengths are that surprisals (i) reduce minuscule probabilities to numbers of manageable size, and (ii) add whenever probabilities multiply.

For example, one might say that "the number of states equals two to the number of bits", i.e. #states = 2^#bits. Here the quantity measured in bits is the logarithmic information measure mentioned above. Hence there are N bits of surprisal in landing all heads on one's first toss of N coins.

The additive nature of surprisals, and one's ability to get a feel for their meaning with a handful of coins, can help one put improbable events (like winning the lottery or having an accident) into context. For example, if one out of 17 million tickets is a winner, then the surprisal of winning from a single random selection is about 24 bits. Tossing 24 coins a few times might give you a feel for the surprisal of getting all heads on the first try.
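
As a hedged illustration of the arithmetic above (the helper name surprisal_bits is simply a label chosen here), the coin and lottery figures can be checked in a couple of lines of Python:

  import math

  def surprisal_bits(probability):
      # Self-information / surprisal in bits: -log2(p)
      return -math.log2(probability)

  print(surprisal_bits(0.5 ** 24))        # all heads on 24 coins -> 24.0 bits
  print(surprisal_bits(1 / 17_000_000))   # one winning ticket in 17 million -> about 24.02 bits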

The additive nature of this measure also comes in handy when weighing alternatives. For example, imagine that the surprisal of harm from a vaccination is 20 bits. If the surprisal of catching a disease without the vaccination is 16 bits, but the surprisal of harm from the disease if you do catch it is 2 bits, then the surprisal of harm from NOT getting the vaccination is only 16 + 2 = 18 bits. Whether or not you decide to get the vaccination (the monetary cost of paying for it, for example, is not included in this discussion), you can in that way at least take responsibility for a decision informed by the fact that not getting the vaccination involves about two bits of additional risk.
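
Written out with the hypothetical 20-, 16-, and 2-bit figures from the paragraph above, the comparison is a one-line addition followed by a conversion back to probabilities:

  harm_with_vaccine = 20      # surprisal of harm from the vaccination, in bits
  catch_disease = 16          # surprisal of catching the disease
  harm_if_caught = 2          # surprisal of harm given that the disease is caught

  harm_without_vaccine = catch_disease + harm_if_caught   # surprisals add: 18 bits

  # Convert back: p = 1 / 2**sbits.  18 bits is about 3.8e-6, versus about
  # 9.5e-7 for 20 bits, so skipping the vaccination is 2 bits (a factor of 4) riskier.
  print(2.0 ** -harm_with_vaccine, 2.0 ** -harm_without_vaccine)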

More generally, one can relate probability p to bits of surprisal sbits as p = 1/2^sbits. As suggested above, this is mainly useful with small probabilities. However, Jaynes pointed out that with true-false assertions one can also define bits of evidence ebits as the surprisal against the assertion minus the surprisal for it. This evidence in bits relates simply to the odds ratio = p/(1-p) = 2^ebits, and has advantages similar to those of self-information itself.
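
A small sketch of those conversions (the function names are ad hoc, not standard library calls): evidence in bits is the base-2 logarithm of the odds, and it inverts back to a probability via odds/(1 + odds).

  import math

  def evidence_bits(p):
      # Evidence in bits for a true/false assertion: log2 of the odds p/(1 - p)
      return math.log2(p / (1 - p))

  def prob_from_evidence(ebits):
      # Invert odds = 2**ebits back to a probability
      odds = 2.0 ** ebits
      return odds / (1 + odds)

  print(evidence_bits(0.8))        # 4:1 odds in favor -> 2.0 bits of evidence
  print(prob_from_evidence(0.0))   # 0 bits of evidence -> p = 0.5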

References

Footnotes

  1. Jaynes, E. T. (1998/2003) Probability Theory: The Logic of Science (Cambridge University Press, New York).
  2. Tribus, Myron (1961) Thermodynamics and Thermostatics: An Introduction to Energy, Information and States of Matter, with Engineering Applications (D. Van Nostrand Company Inc., New York) ASIN: B000ARSH5S.
