Balls into bins

The balls-into-bins problem is a classic problem in probability theory that has many applications in computer science. The problem involves m balls and n boxes (or "bins"). Each time, a single ball is placed into one of the bins. After all balls are in the bins, we look at the number of balls in each bin; we call this number the load on the bin and ask: what is the maximum load on a single bin?

Obviously, it is possible to make the load as small as m/n by putting each ball into the least loaded bin. The interesting case is when the bin is selected at random, or at least partially at random.

Random allocation

When the bin for each ball is selected at random, independent of other choices, the maximum load might be as large as m. However, it is possible to calculate a tighter bound that holds with high probability. A "high probability" is a probability 1-o(1), i.e. the probability tends to 1 when n grows.

For the case m=n, with high probability the maximum load is:[1] [2]

\frac{\log n}{\log \log n}\cdot(1+o(1))

The maximum load can also be calculated for m<>n.[3]

Partially random allocation

Instead of just selecting a random bin for each ball, it is possible to select two or more bins for each ball and then put the ball in the least loaded bin. This is a compromise between a deterministic allocation, in which all bins are checked and the least loaded bin is selected, and a totally random allocation, in which a single bin is selected without checking other bins.

In this case, if mn and d≥2 bins are checked at each step, then with high probability the maximum load is:[4]

\frac{\log \log n}{\log d}\cdot(1+o(1))+\Theta(\frac{m}{n})

In particular, for m=n the maximum load is:

\frac{\log \log n}{\log d}\cdot(1+o(1))+\Theta(1)

which is exponentially less than with totally random allocation.

Infinite stream of balls

Instead of just putting m balls, it is possible to consider an infinite process in which, at each time step, a single ball is added and a single ball is taken, such that the number of balls remains constant. For m=n, after a sufficiently long time, with high probability the maximum load is similar to the finite version, both with random allocation and with partially random allocation.[4]

Applications

Dynamic resource allocation: consider a set of n identical computers. There are n users who need computing services. The users are not coordinated - each users comes on his own and selects which computer to use. Each user would of course like to select the least loaded computer, but this requires to check the load on each computer, which might take a long time. Another option is to select a computer at random; this leads, with high probability, to a maximum load of approximately \frac{\log n}{\log \log n}. A possible compromise is that the user will check only two computers, and use the least loaded of the two. This leads, with high probability, to a much smaller maximum load of approximately \frac{\log \log n}{\log 2}.

Hashing: consider a hash table in which all keys mapped to the same location are stored in a linked list. The efficiency of accessing a key depends on the length of its list. If we use a single hash function which selects locations with uniform probability, with high probability the longest chain has O\left(\frac{\log n}{\log \log n}\right) keys. A possible improvement is to use two hash functions, and put each new key in the shorter of the two lists. In this case, with high probability the longest chain has only O(\log \log n) elements.[5]

Proportional division. [6]

References

  1. Kolchin, Valentin F. (1978). Random allocations. Washington: Winston [usw.] ISBN 978-0470993941.
  2. Kotz, Samuel; Johnson, Norman Lloyd (1977). Urn models and their applications. New York, NY: John Wiley & Sons. ISBN 978-0471446309.
  3. Raab, Martin (1998). ""Balls into Bins" — A Simple and Tight Analysis". Lecture Notes in Computer Science: 159–170. doi:10.1007/3-540-49543-6_13.
  4. 4.0 4.1 Azar, Yossi (1999). "Balanced Allocations". SIAM Journal on Computing 29 (1): 180–200. doi:10.1137/s0097539795288490.
  5. Karp, R. M. (1996). "Efficient PRAM simulation on a distributed memory machine". Algorithmica 16 (4-5): 517–542. doi:10.1007/bf01940878.
  6. Jeff Edmonds and Kirk Pruhs (2006). "Balanced Allocations of Cake". 2006 47th Annual IEEE Symposium on Foundations of Computer Science (FOCS'06): 623. doi:10.1109/focs.2006.17. ISBN 0-7695-2720-5.