Five-minute rule

From Wikipedia, the free encyclopedia

In computer science, the five-minute rule is a rule of thumb for deciding whether a data item should be kept in memory, or stored on disk and read back into memory when required. It was first formulated by Jim Gray and G. F. Putzolu in 1985,[1][2] and was revised in 1997[3] and 2007[4] to reflect changes in the relative cost and performance of memory and persistent storage.

The rule is as follows:

The 5-minute random rule: cache randomly accessed disk pages that are re-used every 5 minutes or less.

Gray also issued a counterpart one-minute rule for sequential access:[5]

The 1-minute rule: cache sequentially accessed disk pages that are re-used every 1 minute or less.

Although the 5-minute rule was invented in the realm of databases, it has also been applied elsewhere, for example, in Network File System cache capacity planning.[6]

The original 5-minute rule was derived from the following cost-benefit computation:[4]

BreakEvenIntervalinSeconds = (PagesPerMBofRAM / AccessesPerSecondPerDisk) × (PricePerDiskDrive / PricePerMBofRAM)

Applying it to 2007 data yields an interval of approximately 90 minutes for magnetic-disk-to-DRAM caching, 15 minutes for SSD-to-DRAM caching and 2¼ hours for disk-to-SSD caching. The disk-to-DRAM interval thus fell somewhat short of the "five-hour rule" that Gray and Putzolu had predicted in 1987 for RAM and disks in 2007.[4]
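The cost-benefit computation above can be sketched in a few lines of Python. This is a minimal illustration, not code from any of the cited papers: the function names are hypothetical, and the input values (128 pages per MB, 64 accesses per second per disk, $2000 per drive, $15 per MB of RAM) are illustrative assumptions chosen only to exercise the formula.

```python
def break_even_interval_seconds(pages_per_mb_ram,
                                accesses_per_second_per_disk,
                                price_per_disk_drive,
                                price_per_mb_ram):
    """Break-even re-use interval, in seconds, from the formula above.

    A page re-used more often than this interval is cheaper to keep in
    RAM; a page re-used less often is cheaper to re-read from disk.
    """
    if accesses_per_second_per_disk <= 0 or price_per_mb_ram <= 0:
        raise ValueError("access rate and RAM price must be positive")
    technology_ratio = pages_per_mb_ram / accesses_per_second_per_disk
    economic_ratio = price_per_disk_drive / price_per_mb_ram
    return technology_ratio * economic_ratio


def should_cache(reuse_interval_seconds, break_even_seconds):
    """Apply the rule: cache pages re-used within the break-even interval."""
    return reuse_interval_seconds <= break_even_seconds


# Illustrative inputs (assumptions, not figures taken from the text above).
interval = break_even_interval_seconds(
    pages_per_mb_ram=128,             # 8 KB pages
    accesses_per_second_per_disk=64,
    price_per_disk_drive=2000.0,      # dollars
    price_per_mb_ram=15.0,            # dollars
)
print(f"break-even interval: {interval:.0f} s ({interval / 60:.1f} min)")
print(should_cache(60, interval))     # page re-used every minute
print(should_cache(3600, interval))   # page re-used every hour
```

With these assumed inputs the interval comes out at a little under four and a half minutes, so a page touched once a minute would be cached and a page touched once an hour would not. The first factor captures technology (page size and disk speed) and the second captures economics (relative prices), which is why the interval shifts as either changes.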

According to calculations by NetApp engineer David Dale as reported in The Register, the figures for disk-to-DRAM caching in 2008 were as follows: "The 50KB page break-even was five minutes, the 4KB one was one hour and the 1KB one was five hours. There needed to be a 50-fold increase in page size to cache for break-even at five minutes." Regarding disk-to-SSD caching in 2010, the same source reported that "A 250KB page break even with SLC was five minutes, but five hours with a 4KB page size. It was five minutes with a 625KB page size with MLC flash and 13 hours with a 4KB MLC page size."[7]
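The dependence on page size follows from the PagesPerMBofRAM term in the formula: smaller pages mean more pages per megabyte and therefore a longer break-even interval. The sketch below scales a known break-even interval to a different page size. It rests on a simplifying assumption that is not stated in the source, namely that prices and the number of accesses per second per disk stay the same regardless of page size; the function name is hypothetical.

```python
def scale_break_even(known_interval_minutes, known_page_kb, target_page_kb):
    """Rescale a break-even interval to another page size.

    Assumes the interval is inversely proportional to page size, i.e.
    that only PagesPerMBofRAM changes and every other term is fixed.
    """
    if known_page_kb <= 0 or target_page_kb <= 0:
        raise ValueError("page sizes must be positive")
    return known_interval_minutes * known_page_kb / target_page_kb


# Start from the quoted 2008 figure: five minutes at a 50 KB page size.
for page_kb in (50, 4, 1):
    minutes = scale_break_even(5, 50, page_kb)
    print(f"{page_kb:>2} KB page: {minutes:g} minutes")
```

Under this assumption the 4 KB figure comes out at 62.5 minutes, close to the quoted one hour, while the 1 KB figure comes out at 250 minutes, somewhat under the quoted five hours. The reported numbers presumably reflect rounding or effects that this simple scaling ignores.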

In 2000, Gray and Shenoy applied a similar calculation for web page caching and concluded that a browser should "cache web pages if there is any chance they will be re-referenced within their lifetime."[8]

References

  1. Gray, Jim; Putzolu, Franco (May 1985), The 5 Minute Rule for Trading Memory for Disc Accesses and the 5 Byte Rule for Trading Memory for CPU Time 
  2. Gray, Jim; Putzolu, Gianfranco R. (1987), "The 5 Minute Rule for Trading Memory for Disk Accesses and The 10 Byte Rule for Trading Memory for CPU Time", Proceedings of the ACM SIGMOD Conference, pp. 395–398, doi:10.1145/38713.38755 
  3. Gray, Jim; Graefe, Goetz (1997), "The Five-Minute Rule Ten Years Later, and Other Computer Storage Rules of Thumb", ACM SIGMOD Record 26 (4): 63–68, doi:10.1145/271074.271094 
  4. Graefe, Goetz (2007), "The five-minute rule twenty years later, and how flash memory changes the rules", DaMoN '07: Proceedings of the 3rd international workshop on Data management on new hardware, pp. 1–9, doi:10.1145/1363189.1363198. Free version in ACM Queue, September 2008.
  5. René J. Chevance (2004). Server Architectures: Multiprocessors, Clusters, Parallel Systems, Web Servers, Storage Solutions. Digital Press. p. 542. ISBN 978-0-08-049229-2. 
  6. Gian-Paolo D. Musumeci; Mike Loukides (2002). System Performance Tuning. O'Reilly Media, Inc. p. 263. ISBN 978-0-596-55204-6. 
  7. http://www.theregister.co.uk/2010/05/19/flash_5_minute_rule/?page=2
  8. Jim Gray, Prashant Shenoy, "Rules of Thumb in Data Engineering", MS-TR-99-100
This article is issued from Wikipedia. The text is available under the Creative Commons Attribution/Share Alike; additional terms may apply for the media files.