Ski rental problem
The ski rental problem is the name given to a class of problems in which there is a choice between continuing to pay a repeating cost or paying a one-time cost which eliminates or reduces the repeating cost.
The problem
Many online problems have a sub-problem called the rent/buy problem. We need to decide whether to stay in the current state and pay a certain amount of cost per time unit, or switch to another state and pay some fixed large cost with no further payment.[1] Ski rental [2][3] is one classical toy example, where the rent/buy is the entire problem. Its basic version is as follows:
You are going skiing for an unknown number of days (you do not know the exact number due to various reasons, e.g., loss of interest, accidents that break your legs, or extremely bad weather). Assume that renting skis costs $1 per day and buying skis costs $10. Every day you have to decide whether to continue renting skis for one more day or buy a pair of skis. If you know in advance how many days you will go skiing, you can decide your minimum cost. For example, if you will be skiing for more than 10 days it will be cheaper to buy skis, while if you will be skiing for fewer than 10 days it will be cheaper to rent. (If you will ski for exactly 10 days you are indifferent.) The question is what to do when you do not know in advance how many days you will ski.
Formally, the problem can be set up as follows. There is a number of days d (unknown to you) that you will ski. We are looking for an algorithm that minimizes the ratio between what you pay using the algorithm (that does not know d) and what you would pay optimally if you knew d in advance. The problem is generally analyzed in the worst case, where the algorithm is fixed and then we look at the worst-case performance of the algorithm over all possible d. In particular, no assumptions are made regarding the distribution of d (and it is easy to see that, with knowledge of the distribution of d, a different analysis as well as different solutions would be preferred).
The break-even algorithm
The break-even algorithm instructs you to rent for 9 days and buy skis on the morning of day 10 if you are still up for skiing.[4] If you have to stop skiing during the first 9 days, it costs the same as what you would pay if you had known the number of days you would go skiing. If you have to stop skiing after day 10, your cost is $19 which is 90% more than what you would pay if you had known the number of days you would go skiing in advance. This is the worst case for the break-even algorithm.
The break-even algorithm is known to be the best deterministic algorithm for this problem.
Can you do better than break-even?
Yes. For example, you can flip a coin. If it comes up head, you buy skis on day 8; otherwise, you buy skis on day 10. This is an instance of a randomized algorithm. It is easy to see that the expected cost is at most 80% more than what you would pay if you had known the number of days you would go skiing, regardless of how many days you ski. In particular, if you ski for 10 days, then your expected cost is 1/2 [7 +10] + 1/2 [9+10] = 18 dollars, only 80% excess instead of 90%.
The best randomized algorithm against an oblivious adversary is to choose some day i at random according to the following distribution p, rent for i-1 days and buy skis on the morning of day i if you are still up for skiing. Karlin et al.[2] first presented this algorithm with distribution where buying skis costs $b and renting costs $1. Its expected cost is at most e/(e-1) 1.58 times what you would pay if you had known the number of days you would go skiing. No randomized algorithm can do better.
Applications
Snoopy caching:[2] several caches share the same memory space that is partitioned into blocks. When a cache writes to a block, caches that share the block spend 1 bus cycle to get updated. These caches can invalidate the block to avoid the cost of updating. But there is a penalty of p bus cycles for invalidating a block from a cache that shortly thereafter needs access to it. We can break the write request sequences for several caches into request sequences for two caches. One cache performs a sequence of write operations to the block. The other cache needs to decide whether to get updated by paying 1 bus cycle per operation or invalidate the block by paying p bus cycles for future read request of itself. The two cache, one block snoopy caching problem is just the ski rental problem.
TCP acknowledgment:[5] A stream of packets arrive at a destination and are required by the TCP protocol to be acknowledged upon arrival. However, we can use a single acknowledgment packet to simultaneously acknowledge multiple outstanding packets, thereby reducing the overhead of the acknowledgments. On the other hand, delaying acknowledgments too much can interfere with the TCP's congestion control mechanisms, and thus we should not allow the latency between a packet's arrival time and the time at which the acknowledgment is sent to increase too much. Karlin et al.[6] described a one-parameter family of inputs, called the basis inputs, and showed that when restricted to these basis inputs, the TCP acknowledgement problem behaves the same as the ski rental problem.
Total completion time scheduling:[1] We wish to schedule jobs with fixed processing times on m identical machines. The processing time of job j is pj. Each job becomes known to the scheduler on its release time rj. The goal is to minimize the sum of completion times over all jobs. A simplified problem is one single machine with the following input: at time 0, a job with processing time 1 arrives; k jobs with processing time 0 arrive at some unknown time. We need to choose a start time for the first job. Waiting incurs a cost of 1 per time unit, yet starting the first job before the later k jobs may incur an extra cost of k in the worst case. This simplified problem may be viewed as a continuous version of the ski rental problem.
Refactoring versus working with a poor design: In software development, engineers have to choose between the friction and risk of errors of working with an overly-complex design and reducing the complexity of the design before making a change. The extra cost of each change with the old design is the "rental" cost, the cost of refactoring is the "buy" cost. "How many times do you work with a poor design before cleaning it up?" is a ski rental problem.
See also
References
- ↑ 1.0 1.1 Steven S. Seiden. A guessing game and randomized online algorithms. Annual ACM Symposium on Theory of Computing, 2000. http://portal.acm.org/citation.cfm?id=335385
- ↑ 2.0 2.1 2.2 A. R. Karlin, M. S. Manasse, L. A. McGeoch, and S. Owicki. Competitive randomized algorithms for non-uniform problems. In Proceedings of the First Annual ACM-SIAM Symposium on Discrete Algorithms, San Francisco, CA, 22–24 January 1990, pp. 301-309. Also in Algorithmica, 11(6): 542-571, 1994. http://theory.lcs.mit.edu/classes/6.895/fall03/handouts/papers/karlin.pdf
- ↑ Claire Mathieu. Brown University. Lecture note: http://www.cs.brown.edu/~claire/Talks/skirental.pdf
- ↑ A. R. Karlin, M. Manasse, L. Rudolph and D. Sleator. Competitive snoopy caching. Algorithmica, 3(1): 79-119, 1988
- ↑ D. R. Dooly, S. A. Goldman and S. D. Scott. TCP dynamic acknowledgment delay: theory and practice. In Proceedings of the Thirtieth Annual ACM Symposium on Theory of Computing (STOC), Dallas, TX, pp. 389-398, 1998.
- ↑ Anna R. Karlin and Claire Kenyon and Dana Randall. Dynamic TCP acknowledgement and other stories about e/(e-1). Thirty-Third Annual ACM Symposium on Theory of Computing (STOC), 2001. Algorithmica. http://www.cs.brown.edu/people/claire/Publis/ACKpaper.ps