Bufferbloat

Bufferbloat is a phenomenon in packet-switched networks, in which excess buffering of packets causes high latency and packet delay variation (also known as jitter), as well as reducing the overall network throughput. When a router device is configured to use excessively large buffers, even very high-speed networks can become practically unusable for many interactive applications like voice calls, chat, and even web surfing.

The bufferbloat phenomenon was first described as far back as 1985,[1] and it gradually became more widely recognized as an issue. It gained more widespread attention starting in 2009.[2][3][4]

Overly large buffers have been placed in some models of equipment by their manufacturers. In such equipment, bufferbloat occurs when a network link becomes congested, causing packets to be queued for too long in those buffers. In a first-in first-out queuing system, overly large buffers result in longer queues and higher latency, but they do not improve network throughput and may even reduce goodput to zero in extreme cases.

Buffering

Bufferbloat as an issue is caused mainly by router and switch manufacturers making incorrect assumptions and buffering packets for too long in cases where they should be dropped, in an attempt to keep a congested link as busy as possible.

The general rule of thumb for network equipment manufacturers was to provide buffers large enough to accommodate 250 ms (or more) worth of traffic passing through a device. Under this rule, a 1 Gbit/s Ethernet interface in a router would require a buffer of roughly 32 MB.[5] Buffers sized this way can defeat TCP's congestion-avoidance algorithms, causing problems such as high and variable latency, and can choke network bottlenecks for all other flows as the buffer fills with the packets of one TCP stream and other packets are then dropped.[6] The buffers then take some time to drain before the TCP connection ramps back up to speed and floods the buffers again.[7]
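The arithmetic behind this rule of thumb can be sketched as follows. The function names are illustrative only, and the 10 Mbit/s figure is an assumed example of a slower bottleneck, not a value from the cited sources.

```python
def buffer_bytes(link_bits_per_s: float, delay_s: float) -> float:
    """Bytes needed to hold delay_s seconds of traffic at the given line rate."""
    return link_bits_per_s * delay_s / 8


def drain_time_s(buffer_size_bytes: float, link_bits_per_s: float) -> float:
    """Seconds needed to empty a full buffer over a link of the given rate."""
    return buffer_size_bytes * 8 / link_bits_per_s


if __name__ == "__main__":
    # 250 ms of traffic on a 1 Gbit/s interface (the rule of thumb above).
    size = buffer_bytes(1e9, 0.250)
    print(f"buffer for 250 ms at 1 Gbit/s: {size / 1e6:.2f} MB")

    # Assumed example: the same amount of buffering in front of a
    # 10 Mbit/s bottleneck takes far longer to drain.
    print(f"drain time at 10 Mbit/s: {drain_time_s(size, 10e6):.1f} s")
```

The first result is about 31 MB, in line with the roughly 32 MB quoted above. The second shows why a buffer that is harmless on a fast link can add seconds of delay when the same amount of data queues in front of a slow one.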

A bloated buffer has an effect only when it is actually used. In other words, oversized buffers are damaging only when the link they buffer for becomes a bottleneck. When the current bottleneck on the route to or from another host is not contended, the ping utility provided by most operating systems is enough to tell whether it is bloated. First, the other host should be pinged continuously. Then a download lasting several seconds should be started from it and stopped a few times. By design, the TCP congestion avoidance algorithm rapidly fills up the bottleneck on the route. If downloading (or, respectively, uploading) correlates with a direct and significant increase in the round-trip time reported by ping, this indicates that the buffer of the current bottleneck in the download (respectively upload) direction is bloated. Since the increase in round-trip time is caused by the buffer at the bottleneck, the maximum increase gives a rough estimate of its size in milliseconds.[8]
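The estimate described above can be sketched in a few lines. This is an illustration only: the function name, the sample round-trip times and the 8 Mbit/s bottleneck rate are all made-up assumptions, and in practice the samples would be collected from ping while the link is idle and while a download is running.

```python
from statistics import median


def estimate_bloat(idle_rtts_ms, loaded_rtts_ms, bottleneck_bits_per_s=None):
    """Estimate buffering at the bottleneck from ping samples.

    idle_rtts_ms:   round-trip times measured with no transfer running
    loaded_rtts_ms: round-trip times measured during a download or upload
    """
    baseline = median(idle_rtts_ms)
    # The largest RTT increase under load approximates the buffer size in ms.
    increase_ms = max(max(loaded_rtts_ms) - baseline, 0.0)
    result = {"baseline_ms": baseline, "max_increase_ms": increase_ms}
    if bottleneck_bits_per_s is not None:
        # Convert the delay into bytes, assuming the bottleneck rate is known.
        result["buffer_bytes"] = bottleneck_bits_per_s * (increase_ms / 1000) / 8
    return result


if __name__ == "__main__":
    idle = [20, 21, 19, 20]            # hypothetical samples, in ms
    loaded = [25, 180, 420, 390, 60]   # hypothetical samples, in ms
    print(estimate_bloat(idle, loaded, bottleneck_bits_per_s=8e6))
```

With these invented samples the round-trip time rises by about 400 ms under load, which at an assumed 8 Mbit/s corresponds to roughly 400 kB of queued data.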

In the previous example, using an advanced traceroute tool (for example, MTR) instead of simple pinging will not only demonstrate the existence of a bloated buffer on the bottleneck but also pinpoint its location in the network. Traceroute achieves this by displaying the route (path) and measuring transit delays of packets across the network. The history of the route is recorded as the round-trip times of the packets received from each successive host (remote node) in the route.[9]

Mechanism

See also: TCP window size and TCP slow start

The TCP congestion avoidance algorithm relies on packet drops to determine the available bandwidth. It speeds up the data transfer until packets start to drop, then slows down the transmission rate. Ideally, it keeps speeding up and slowing down until it settles at an equilibrium close to the speed of the link. For this to work, however, packet drops must occur in a timely manner so that the algorithm can select a suitable transfer speed. With a large buffer that has been filled, packets still arrive at their destination, but with higher latency. Because no packet is dropped, TCP does not slow down once the uplink has been saturated, and it fills the buffer further. Newly arriving packets are dropped only when the buffer is completely full. TCP may even decide that the path of the connection has changed and resume its more aggressive search for a new operating point.[10]
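A toy simulation can illustrate the effect of buffer size on this feedback loop. It is a deliberately simplified sketch, not a model of real TCP: the sender raises its rate by one packet per tick and halves it on a drop, the loss signal is assumed to reach the sender immediately, and all names and parameter values are illustrative assumptions.

```python
def simulate(buffer_packets, capacity=10, ticks=2000):
    """Run a toy AIMD sender through a tail-drop FIFO buffer.

    Returns (link utilisation, worst queuing delay in ticks).
    """
    rate = 1.0        # sender rate, packets per tick
    queue = 0.0       # packets waiting in the buffer
    delivered = 0.0
    max_delay = 0.0
    for _ in range(ticks):
        queue += rate
        dropped = queue > buffer_packets
        if dropped:
            queue = float(buffer_packets)   # tail drop: the excess is discarded
        sent = min(queue, capacity)         # the link drains at a fixed rate
        queue -= sent
        delivered += sent
        # A packet arriving now would wait this long behind the queue.
        max_delay = max(max_delay, queue / capacity)
        # Additive increase, multiplicative decrease.
        rate = rate / 2 if dropped else rate + 1
    return delivered / (capacity * ticks), max_delay


if __name__ == "__main__":
    for size in (20, 200, 2000):
        utilisation, delay = simulate(size)
        print(f"buffer={size:5d} packets  utilisation={utilisation:.3f}  "
              f"worst delay={delay:.1f} ticks")
```

In this sketch the link stays almost fully utilised for every buffer size, while the worst queuing delay grows roughly in proportion to the buffer. Because the loss signal is delivered instantly here, the model understates the real problem, in which the signal itself is delayed by the full queue.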

Packets are queued within a network buffer before being transmitted; in problematic situations, packets are dropped only if the buffer is full. On older routers, buffers were fairly small, so they filled quickly and packets began to drop shortly after the link became saturated; TCP could then adjust, and the issue would not become apparent. On newer routers, buffers have become large enough to hold several megabytes of data, which can take seconds to drain. This causes the TCP algorithm that shares bandwidth on a link to react very slowly, because its behavior depends on packets actually being dropped when the transmission channel becomes saturated.

The problem also affects other protocols. All packets passing through a simple buffer implemented as a single queue experience the same delay, so the latency of any connection that passes through a filled buffer is affected. Available channel bandwidth can also go unused: when simultaneous transmissions compete for space in an already full buffer, data awaiting delivery to slow destinations can clog the buffer and keep traffic from reaching fast destinations. This also reduces the interactivity of applications using other network protocols, including UDP or any other datagram protocol used in latency-sensitive applications like VoIP and games.[11] In extreme cases, bufferbloat may cause failures in essential protocols such as DNS.

Impact on applications

Any type of service that requires consistently low latency or jitter-free transmission (whether at low or high bandwidth) can be severely affected, or even rendered unusable, by the effects of bufferbloat. Examples are voice calls (Voice over IP), networked gaming, video chat programs, and other interactive applications such as instant messaging and remote login. Latency has been identified as more important than raw bandwidth for many years.[12]

When bufferbloat is present and the network is under load, even normal web page loads can take many seconds to complete, and simple DNS queries can fail due to timeouts.[13]

Diagnostic tools

The ICSI Netalyzr[14] is an on-line tool that can be used for checking networks for the presence of bufferbloat, together with checking for many other common configuration problems.[15] The CeroWrt project also provides an easy procedure for determining whether a connection has excess buffering that will slow it down.[16]

Mitigations

The problem may be mitigated by reducing the buffer size on the OS[13] and network hardware; however, this is not configurable on most home routers, broadband equipment and switches, nor even feasible in today's broadband and wireless systems.[13] Some other mitigation approaches are also available:

Network scheduler

Main article: Network scheduler

The network scheduler is an arbiter program that manages the sequence of network packets. It has been used successfully to significantly mitigate the bufferbloat phenomenon when employing the CoDel or the Fair Queue CoDel queuing discipline, because these algorithms drop packets at the head of the queue, so the sender learns of congestion sooner than it would with a drop at the tail.
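The idea of dropping at the head of the queue can be sketched as follows. This is a heavily simplified illustration loosely modelled on the published CoDel pseudocode, not the actual algorithm or any real scheduler's implementation: it omits, among other things, the reuse of the drop count between dropping episodes and the check that exempts a nearly empty queue. The class name and the `drops` counter are hypothetical, and the 5 ms target and 100 ms interval are CoDel's commonly cited defaults.

```python
import math
from collections import deque

TARGET = 0.005     # acceptable standing queue delay, in seconds
INTERVAL = 0.100   # how long delay must stay above TARGET before dropping


class SimplifiedCoDel:
    """Toy head-drop queue driven by packet sojourn time (illustrative only)."""

    def __init__(self):
        self.queue = deque()           # entries are (enqueue_time, packet)
        self.first_above_time = None   # deadline set once delay exceeds TARGET
        self.dropping = False
        self.drop_next = 0.0
        self.count = 0
        self.drops = 0                 # statistics for the example below

    def enqueue(self, packet, now):
        self.queue.append((now, packet))

    def _pop_and_check(self, now):
        """Pop the head packet and report whether it may be dropped."""
        if not self.queue:
            self.first_above_time = None
            return None, False
        enqueued_at, packet = self.queue.popleft()
        if now - enqueued_at < TARGET:
            self.first_above_time = None      # delay is fine again
            return packet, False
        if self.first_above_time is None:
            self.first_above_time = now + INTERVAL
            return packet, False
        return packet, now >= self.first_above_time

    def dequeue(self, now):
        packet, ok_to_drop = self._pop_and_check(now)
        if self.dropping:
            if not ok_to_drop:
                self.dropping = False
            while self.dropping and now >= self.drop_next:
                self.drops += 1               # discard the head packet
                self.count += 1
                packet, ok_to_drop = self._pop_and_check(now)
                if not ok_to_drop:
                    self.dropping = False
                else:
                    # Drop progressively more often while delay stays high.
                    self.drop_next += INTERVAL / math.sqrt(self.count)
        elif ok_to_drop:
            self.drops += 1                   # first drop of this episode
            packet, _ = self._pop_and_check(now)
            self.dropping = True
            self.count = 1
            self.drop_next = now + INTERVAL
        return packet


if __name__ == "__main__":
    q = SimplifiedCoDel()
    for i in range(1000):                     # a burst that creates a standing queue
        q.enqueue(i, now=0.0)
    for i in range(1, 301):                   # drain one packet per millisecond
        q.dequeue(now=i * 0.001)
    print("packets dropped at the head:", q.drops)
```

Because the decision is based on how long the head packet has waited rather than on how full the buffer is, drops begin once the queue has added persistent delay, regardless of how large the underlying buffer happens to be.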

There are several other queuing disciplines available for active queue management, used in general for traffic shaping, but none of them fundamentally changes the situation: although HTTP and VoIP traffic may be buffered independently, each buffer is still independently susceptible to bufferbloat. In practice, though, this may help mitigate the problem,[13] for example as a result of one large buffer being split into multiple smaller buffers, or through isolation of bloated queues combined with prioritisation.

References

  1. "On Packet Switches With Infinite Storage". 1985-12-31.
  2. Brough Turner (2009-10-25). "Has AT&T Wireless data congestion been self-inflicted?". Brough Turner blog. Retrieved 2012-02-28.
  3. "The criminal mastermind: bufferbloat! « jg's Ramblings". Gettys.wordpress.com. 2010-12-03. Retrieved 2011-07-05.
  4. Iljitsch van Beijnum (2011-01-07). "Understanding bufferbloat and the network buffer arms race". Ars Technica. Retrieved 2011-11-12.
  5. Guido Appenzeller; Isaac Keslassy; Nick McKeown (2004). "Sizing Router Buffers" (PDF). ACM SIGCOMM. ACM. Retrieved 2013-10-15.
  6. Gettys, Jim (May–June 2011). "Bufferbloat: Dark Buffers in the Internet". IEEE Internet Computing 15 (3). IEEE. pp. 95–96. doi:10.1109/MIC.2011.56. Retrieved 2012-02-20.
  7. Nichols, Kathleen; Jacobson, Van (2012-05-06). "Controlling Queue Delay". ACM Queue. ACM Publishing. Retrieved 2013-09-27.
  8. Clunis, Andrew (2013-01-22). "Bufferbloat demystified". Retrieved 2013-09-27.
  9. "traceroute(8) - Linux man page". die.net. Retrieved 2013-09-27.
  10. Jacobson, Van; Karels, MJ (1988). "Congestion avoidance and control". ACM SIGCOMM Computer Communication Review 18 (4).
  11. "Technical Introduction to Bufferbloat". bufferbloat.net. Retrieved 2013-09-27.
  12. "It's the Latency, Stupid". Rescomp.stanford.edu. Retrieved 2011-07-05.
  13. Gettys, Jim; Nichols, Kathleen (January 2012). "Bufferbloat: Dark Buffers in the Internet". Communications of the ACM 55 (1). ACM. pp. 57–65. doi:10.1145/2063176.2063196. Retrieved 2012-02-28.
  14. "ICSI Netalyzr". berkeley.edu. Retrieved 30 January 2015.
  15. Gettys, Jim. "Diagnosing Bufferbloat". gettys.wordpress.com. Retrieved 2012-03-03.
  16. "Cerowrt: Quick Test for Bufferbloat". bufferbloat.net. Retrieved 30 January 2015.
  17. "DOCSIS "Upstream Buffer Control" feature". CableLabs. pp. 554–556. Retrieved 2012-08-09.
