Cache stampede

A cache stampede is a type of cascading failure that can occur when massively parallel computing systems with caching mechanisms come under very high load. This behaviour is sometimes also called dog-piling.[1][2]

To understand how cache stampedes occur, consider a web server that uses memcached to cache rendered pages for some period of time to ease system load. Under particularly high load to a single URL, the system remains responsive so long as the resource remains cached, because requests are handled by reading the cached copy. This minimizes the number of times the expensive rendering operation is performed.

Under low load, cache misses result in a single recalculation of the rendering operation. The system will continue as before, with average load being kept very low because of the high cache hit rate.
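The cache-aside pattern described above can be sketched in a few lines. This is a minimal illustration, not memcached's API: an in-process dictionary with per-entry expiry stands in for memcached, and `TTLCache`, `render_page`, `get_page` and the 60-second TTL are hypothetical names and values chosen for the example. A fake clock makes expiry deterministic.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class TTLCache:
    """Hypothetical in-process stand-in for memcached: key -> (value, expiry time)."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._data: Dict[str, Tuple[str, float]] = {}
        self._clock = clock

    def get(self, key: str) -> Optional[str]:
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            # An expired entry behaves exactly like a miss.
            del self._data[key]
            return None
        return value

    def set(self, key: str, value: str, ttl: float) -> None:
        self._data[key] = (value, self._clock() + ttl)


render_count = 0


def render_page(url: str) -> str:
    """Hypothetical expensive rendering operation."""
    global render_count
    render_count += 1
    time.sleep(0.01)  # simulate expensive work
    return f"<html>{url}</html>"


def get_page(cache: TTLCache, url: str, ttl: float = 60.0) -> str:
    """Serve from cache; on a miss, render once and store the result."""
    page = cache.get(url)
    if page is None:
        page = render_page(url)
        cache.set(url, page, ttl)
    return page


now = [0.0]  # fake clock, in seconds
cache = TTLCache(clock=lambda: now[0])

for _ in range(1000):       # 1000 sequential requests: one miss, 999 hits
    get_page(cache, "/home")
print(render_count)          # 1

now[0] = 61.0                # advance past the TTL
get_page(cache, "/home")     # the single low-load miss triggers one re-render
print(render_count)          # 2
```

Because the requests here arrive one after another, each expiry costs exactly one render, which is the low-load behaviour the text describes.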

However, under very heavy load, when the cached version of that page expires, there may be sufficient concurrency in the server farm that multiple threads of execution all attempt to render the content of that page simultaneously. None of the concurrent servers knows that the others are performing the same rendering at the same time. If the load is high enough, this alone may bring about congestion collapse of the system by exhausting shared resources. Congestion collapse prevents the page from ever being completely re-rendered and re-cached, because every attempt to do so times out. The cache hit rate therefore drops to zero, and the system remains in congestion collapse, continually trying to regenerate the resource, for as long as the very heavy load persists.
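The duplicated work can be reproduced with a small simulation. This sketch makes simplifying assumptions: threads in one Python process stand in for the servers in the farm, a plain dictionary stands in for memcached, and `render_page`, `handle_request` and `N_REQUESTS` are hypothetical names. A barrier releases all requests at the moment the entry is missing, so every thread sees the miss before any render finishes.

```python
import threading
import time
from typing import Dict

N_REQUESTS = 50                  # hypothetical number of concurrent requests
cache: Dict[str, str] = {}       # starts empty, as if "/home" has just expired
cache_lock = threading.Lock()    # guards the dict only, not the rendering
render_count = 0
count_lock = threading.Lock()


def render_page(url: str) -> str:
    """Hypothetical expensive rendering operation."""
    global render_count
    with count_lock:
        render_count += 1
    time.sleep(0.1)  # simulate expensive work that ties up shared resources
    return f"<html>{url}</html>"


def handle_request(url: str, start: threading.Barrier) -> None:
    start.wait()  # release every thread at once, like a burst right after expiry
    with cache_lock:
        page = cache.get(url)
    if page is None:
        # Each thread observes the miss on its own; nothing tells it that
        # other threads are already rendering the same page.
        page = render_page(url)
        with cache_lock:
            cache[url] = page


start = threading.Barrier(N_REQUESTS)
threads = [threading.Thread(target=handle_request, args=("/home", start))
           for _ in range(N_REQUESTS)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(f"{render_count} renders for 1 page")
```

The exact count depends on thread scheduling, but it is typically close to `N_REQUESTS` rather than 1: a single expiry turns into many simultaneous renders of the same page.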

References

  1. Galbraith, Patrick (2009), Developing Web Applications with Apache, MySQL, memcached, and Perl, John Wiley & Sons, p. 353, ISBN 9780470538326.
  2. Allspaw, John; Robbins, Jesse (2010), Web Operations: Keeping the Data On Time, O'Reilly Media, pp. 128–132, ISBN 9781449394158.
