Congestive collapse

From Wikipedia, the free encyclopedia

Congestive collapse (or congestion collapse) is a condition a packet-switched computer network can get into when congestion in the network is bad enough that almost no useful communication is happening.

When a network is in such a condition, it has settled (under overload) into a stable state where traffic demand is high but little useful throughput is available, and there are high levels of packet delay and loss (caused by routers discarding packets because their output queues are too full).

History

Congestion collapse was identified as a possible problem as far back as 1984 (RFC 896, dated 6 January). It was first observed on the early Internet in October 1986, when the NSFnet phase-I backbone dropped three orders of magnitude from its capacity of 32 kbit/s to 40 bit/s, and the collapse continued until end nodes started implementing Van Jacobson's congestion control between 1987 and 1988.

Cause

Analysis showed that the 1986 collapse was caused by details of the protocols used. When more packets were sent than could be handled by intermediate routers, the intermediate routers discarded many packets, expecting the end points of the network to retransmit the information. However, early TCP implementations had poor retransmission behavior. When this packet loss occurred, the end points sent extra packets that repeated the information lost, doubling the data rate sent: exactly the opposite of what should be done during congestion. This pushed the entire network into a 'congestion collapse' where most packets were lost and the resultant throughput was negligible.
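The amplification described above can be illustrated with a toy model (a sketch only; the function name and the assumption that every lost packet is resent once per round alongside new traffic are illustrative, not a model of any specific TCP implementation):

```python
def naive_retransmit_load(offered, capacity, rounds=5):
    """Toy model of the 1986 failure mode: lost packets are naively
    resent on top of new traffic, so offered load keeps growing while
    useful throughput stays capped at the link capacity."""
    loads = [offered]
    for _ in range(rounds):
        lost = max(0, loads[-1] - capacity)   # packets the router dropped
        loads.append(offered + lost)          # retransmits add to new data
    return loads

# Offered load 10 on a capacity-5 link: load climbs round after round,
# so an ever-larger fraction of packets carries repeated data.
print(naive_retransmit_load(10, 5))
```

As long as offered load exceeds capacity, each round's retransmissions inflate the next round's load, which is why backing off, rather than resending immediately, is essential.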

Congestion collapse generally occurs at choke points in the network, where the total incoming bandwidth to a node exceeds the outgoing bandwidth. Connection points between a local area network and a wide area network are the most likely choke points. A DSL modem is the most common small-network example, with between 10 and 1000 Mbit/s of incoming bandwidth from the local network and at most 8 Mbit/s of outgoing bandwidth.

Avoidance

Main article: Congestion Control

The prevention of congestion collapse requires two major components: a mechanism in routers to reorder or drop packets under overload, and end-to-end flow control mechanisms designed into the end points which respond to congestion and behave appropriately.

The correct end-point behaviour is usually still to retransmit dropped information, but to progressively slow the rate at which it is retransmitted. Provided all end points do this, the congestion lifts, the network returns to good use, and the end points each get a fair share of the available bandwidth. Other strategies such as slow start ensure that new connections don't overwhelm the router before congestion detection can kick in.
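Progressively slowing retransmission is commonly done with exponential backoff. A minimal sketch, assuming a simple doubling retransmission timeout with an upper bound (the function name and parameter values are illustrative, not taken from any particular TCP stack):

```python
def backoff_intervals(base_rto=1.0, factor=2.0, max_rto=64.0, attempts=7):
    """Yield successive retransmission timeouts (in seconds). Each retry
    waits progressively longer, so a congested network receives less and
    less repeat traffic from this end point."""
    rto = base_rto
    for _ in range(attempts):
        yield rto
        rto = min(rto * factor, max_rto)   # double, but cap the wait

# Each retry waits twice as long as the last, up to the cap.
print(list(backoff_intervals()))
```

Because every end point's repeat traffic thins out geometrically, the aggregate load on the congested router falls until queues drain.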

The most common router mechanisms used to prevent congestive collapse are fair queueing in its various forms, and random early detection (RED), where packets are randomly dropped before congestion collapse actually occurs, prompting the end points to slow transmission gradually. Fair queueing is most useful in routers at choke points with a small number of connections passing through them. Larger routers must rely on RED.
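The core of RED is a drop probability that rises linearly between two queue-length thresholds. A minimal sketch (threshold and probability values are illustrative defaults, not from any specific router implementation; real RED also smooths the queue length with a moving average):

```python
import random

def red_drop(avg_queue, min_th=5, max_th=15, max_p=0.1):
    """Random Early Detection sketch: below min_th, never drop; at or
    above max_th, always drop; in between, drop with probability rising
    linearly from 0 to max_p as the average queue length grows."""
    if avg_queue < min_th:
        return False
    if avg_queue >= max_th:
        return True
    p = max_p * (avg_queue - min_th) / (max_th - min_th)
    return random.random() < p
```

Because drops begin before the queue is full, only a few randomly chosen senders are signalled at first, so the network slows gracefully instead of synchronously.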

Some end-to-end protocols are better behaved under congested conditions than others. TCP is perhaps the best behaved. The first TCP implementations to handle congestion well were developed in 1984[citation needed], but it was not until Van Jacobson's inclusion of an open source solution in Berkeley UNIX ("BSD") in 1988 that good TCP implementations became widespread.

UDP does not, in itself, have any congestion control mechanism at all. Protocols built atop UDP must handle congestion in their own way. Protocols atop UDP which transmit at a fixed rate, independent of congestion, can be troublesome. Real-time streaming protocols, including many Voice over IP protocols, have this property. Thus, special measures, such as quality-of-service routing, must be taken to keep packets from being dropped from streams.

In general, congestion in pure datagram networks must be kept at the periphery of the network, where the mechanisms described above can handle it. Congestion in the Internet backbone is very difficult to deal with. Fortunately, cheap fiber-optic lines have reduced costs in the Internet backbone. The backbone can thus be provisioned with enough bandwidth to (usually) keep congestion at the periphery.

Side effects of congestive collapse avoidance

The protocols that avoid congestive collapse are based on the idea that essentially all data loss on the Internet is caused by congestion. This is true in nearly all cases; transmission errors are rare on today's fiber-based Internet. However, this assumption causes WiFi networks to have poor throughput in some cases, since wireless links are susceptible to data loss from interference. TCP connections running over WiFi see this data loss, wrongly conclude that congestion is occurring, and erroneously reduce the data rate sent.

The slow start mechanism performs badly for short-lived connections. Web browsers historically created many independent short-lived connections to the web server, opening and closing a connection for each HTML file and each image independently. This meant that most connections never left the slow start regime, resulting in poor responsiveness. Modern browsers either open multiple connections simultaneously or, better, reuse one connection for all the files on a particular web server.
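The interaction between transfer size and slow start can be sketched as follows (a simplified model: one segment window at start, doubling per round trip up to an illustrative `ssthresh`, ignoring losses and real TCP details):

```python
def slow_start_rounds(total_segments, initial_cwnd=1, ssthresh=64):
    """Count round trips needed to deliver total_segments when the
    congestion window doubles each round (slow start) until ssthresh,
    then grows by one segment per round (congestion avoidance)."""
    cwnd, sent, rounds = initial_cwnd, 0, 0
    while sent < total_segments:
        sent += cwnd
        rounds += 1
        cwnd = cwnd * 2 if cwnd < ssthresh else cwnd + 1
    return rounds

# A short 10-segment transfer needs 4 round trips and never grows its
# window past 8 segments, so it never reaches the link's full rate.
print(slow_start_rounds(10))
```

Under this model a short transfer finishes while the window is still tiny, which is why opening a fresh connection per file wastes most of each connection on ramp-up.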

References

  • RFC 2914, "Congestion Control Principles", Sally Floyd, September 2000
  • RFC 896, "Congestion Control in IP/TCP", John Nagle, 6 January 1984
  • "Introduction to Congestion Avoidance and Control", Van Jacobson and Michael J. Karels, November 1988
