Congestive collapse
From Wikipedia, the free encyclopedia
Congestive collapse (or congestion collapse) is a condition that a packet-switched computer network can reach when congestion is so severe that almost no useful communication takes place.
To put it another way, congestion collapse is an extended period of congestion during which the network performs little or no useful work. When a network is in such a condition, it has settled (under overload) into a stable state where traffic demand is high but little useful throughput is available, and there are high levels of packet delay and loss (caused by routers discarding packets because their output queues are too long).
Congestion collapse was first observed on the early Internet, when a network suddenly began running orders of magnitude slower than it should have been capable of. Analysis showed that the cause lay in details of the protocols used. When more packets were sent than the intermediate routers could handle, the routers discarded packets, expecting the end points of the network to retransmit the information. Early TCP implementations had very bad retransmission behavior: when this packet loss occurred, the end points sent extra packets that repeated the lost information, doubling the data rate sent, which is exactly the opposite of what should be done. This pushed the entire network into a 'congestive collapse' in which most packets were lost and the resulting throughput was negligible.
Congestion collapse generally occurs at choke points in the network, where the incoming bandwidth to a node exceeds the outgoing bandwidth. Connection points between a local area network and a wide area network are the most likely choke points. A DSL modem is the most common small-network example, with between 10 and 1000 Mbit/s of incoming bandwidth and at most 8 Mbit/s of outgoing bandwidth.
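The effect of a choke point can be illustrated with a minimal discrete-time sketch. The function name, the per-tick rates and the buffer size below are all hypothetical and chosen only for illustration; real routers are considerably more complex.

```python
def simulate_choke_point(arrival_rate, service_rate, buffer_size, ticks):
    """Toy model of one output queue at a choke point.

    All quantities are hypothetical and measured in packets per tick.
    Returns (delivered, dropped, final_queue_length).
    """
    queue = 0
    delivered = 0
    dropped = 0
    for _ in range(ticks):
        # Packets arrive; whatever does not fit in the buffer is dropped.
        space = buffer_size - queue
        accepted = min(arrival_rate, space)
        dropped += arrival_rate - accepted
        queue += accepted
        # The outgoing link drains at most service_rate packets per tick.
        sent = min(queue, service_rate)
        queue -= sent
        delivered += sent
    return delivered, dropped, queue


# Incoming rate above the outgoing rate: the buffer fills, then packets are lost.
print(simulate_choke_point(arrival_rate=10, service_rate=8, buffer_size=20, ticks=100))
# Incoming rate below the outgoing rate: nothing is queued or lost.
print(simulate_choke_point(arrival_rate=5, service_rate=8, buffer_size=20, ticks=100))
```

In this model, once the buffer is full every tick discards the difference between the incoming and outgoing rates, however large the buffer is.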
The prevention of congestion collapse requires two major components: a mechanism in routers to reorder or drop packets under overload, and end-to-end flow control mechanisms designed into the end points that respond to congestion and behave appropriately.
The correct end point behaviour is usually still to repeat dropped information, but to progressively slow the rate at which it is repeated. Provided all end points do this, the congestion lifts, the network is put to good use, and the end points all get a fair share of the available bandwidth. Other strategies such as 'slow start' ensure that new connections do not overwhelm the router before congestion detection can kick in.
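The end point behaviour described above can be sketched roughly as follows. This is a simplified illustration loosely modelled on classic TCP congestion control, not the behaviour of any particular implementation; the function names, the window unit (segments) and the 60-second cap are assumptions made for the example.

```python
def next_cwnd(cwnd, ssthresh, loss_detected):
    """Return (cwnd, ssthresh) after one round trip, in segments.

    Simplified sketch: real TCP treats timeouts and duplicate
    acknowledgements differently.
    """
    if loss_detected:
        # Back off sharply: remember half the old window, restart from one segment.
        ssthresh = max(cwnd // 2, 2)
        return 1, ssthresh
    if cwnd < ssthresh:
        # Slow start: double each round trip, but never past the threshold.
        return min(cwnd * 2, ssthresh), ssthresh
    # Congestion avoidance: probe gently, one extra segment per round trip.
    return cwnd + 1, ssthresh


def backoff_timeouts(initial_rto, retries, max_rto=60.0):
    """Retransmission timeouts (seconds) that double after each consecutive loss."""
    return [min(initial_rto * 2 ** i, max_rto) for i in range(retries)]


cwnd, ssthresh = 1, 8
for rtt in range(10):
    loss = (rtt == 6)  # hypothetical: a single loss is detected in round 6
    print(rtt, cwnd, ssthresh)
    cwnd, ssthresh = next_cwnd(cwnd, ssthresh, loss)

print(backoff_timeouts(1.0, 5))
```

The first function slows the sending rate when loss is seen; the second spaces out repeated retransmissions of the same data, so that a persistently overloaded path receives less and less retransmitted traffic.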
The most common router mechanisms used to prevent congestive collapse are fair queueing in its various forms and random early detection (RED), in which packets are randomly dropped before congestion collapse actually occurs, prompting the end points to slow transmission more progressively. Fair queueing is most useful in routers at choke points with a small number of connections passing through them. Larger routers must rely on RED.
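The core idea of RED can be sketched as a drop probability that rises with the average queue length. The code below is a simplified illustration: the threshold names (`min_th`, `max_th`), the maximum probability `max_p` and the averaging weight are assumed parameters for the example, and refinements found in full RED descriptions are left out.

```python
import random


def update_average(avg_queue, current_queue, weight=0.002):
    """Exponentially weighted moving average of the queue length."""
    return (1 - weight) * avg_queue + weight * current_queue


def red_drop_probability(avg_queue, min_th, max_th, max_p):
    """Drop probability as a function of the average queue length."""
    if avg_queue < min_th:
        return 0.0  # queue is short: never drop
    if avg_queue >= max_th:
        return 1.0  # queue is too long: drop every arriving packet
    # In between, the probability rises linearly from 0 to max_p.
    return max_p * (avg_queue - min_th) / (max_th - min_th)


def should_drop(avg_queue, min_th, max_th, max_p, rng=random):
    """Randomly decide whether to drop an arriving packet."""
    return rng.random() < red_drop_probability(avg_queue, min_th, max_th, max_p)


for avg in (2, 5, 10, 14, 15):
    print(avg, red_drop_probability(avg, min_th=5, max_th=15, max_p=0.1))
```

Because drops begin while the queue still has room, some senders are told to slow down early, rather than all of them losing packets at once when the buffer overflows.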
Some end-to-end protocols are better behaved under congested conditions than others. TCP is perhaps the best behaved. The first TCP implementations to handle congestion well were developed in 1984, but it was not until Van Jacobson's inclusion of an open-source solution in Berkeley UNIX ("BSD") in 1988 that good TCP implementations became widespread.
UDP does not, in itself, have any congestion control mechanism at all. Protocols built atop UDP must handle congestion in their own way. Protocols atop UDP that transmit at a fixed rate, independent of congestion, can be troublesome. Real-time streaming protocols, including many Voice over IP protocols, have this property. Thus, special measures, such as quality-of-service routing, must be taken to keep packets from being dropped from such streams.
In general, congestion in pure datagram networks must be kept out at the periphery of the network, where the mechanisms described above can handle it. Congestion in the Internet backbone is very difficult to deal with. Fortunately, cheap fiber-optic lines have reduced costs in the Internet backbone. The backbone can thus be provisioned with enough bandwidth to (usually) keep congestion at the periphery.
Side effects of congestive collapse avoidance
The protocols that avoid congestive collapse are based on the idea that essentially all data loss on the Internet is caused by congestion. This is true in nearly all cases; errors during transmission are rare on today's fiber-based Internet. However, this assumption causes WiFi networks to have poor throughput in some cases, since wireless networks are susceptible to data loss. TCP connections running over WiFi see the data loss, tend to conclude that congestion is occurring when it is not, and erroneously reduce the data rate sent.
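A toy simulation can illustrate why loss that is unrelated to congestion hurts throughput. The model below is an assumption-laden sketch: the function name, the per-segment loss rate, the window cap and the fixed random seed are all hypothetical, and the sender is a bare additive-increase, halve-on-loss loop rather than a real TCP.

```python
import random


def average_window(loss_rate, rounds=200, cap=64, seed=1):
    """Average sending window (segments) of a toy sender that halves its
    window whenever any segment in a round is lost.

    Each segment is lost independently with probability loss_rate,
    whether or not the network is actually congested.
    """
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    window = 1
    total = 0
    for _ in range(rounds):
        total += window
        lost = any(rng.random() < loss_rate for _ in range(window))
        if lost:
            window = max(window // 2, 1)   # sender assumes congestion
        else:
            window = min(window + 1, cap)  # additive increase up to the cap
    return total / rounds


print(average_window(0.0))   # loss-free link
print(average_window(0.02))  # link losing about 2% of segments to radio errors
```

In this model the lossy link never suffers from congestion, yet the sender keeps its window far below the cap because it treats every loss as a congestion signal.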
Slow start performs badly for short-lived connections. Unfortunately, web browsers historically created many independent short-lived connections to the web server, opening and closing a connection for each HTML file and each picture file. This meant that most connections never left the slow start regime, and poor responsiveness was the result. Modern browsers either open multiple connections simultaneously or, better, reuse one connection for all the files that are on a particular web server.
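The cost of staying in slow start can be estimated with a small calculation. The sketch below assumes an idealized sender whose window starts at one segment and doubles every round trip, with no loss, no threshold and no connection setup cost; the function name and the segment counts are hypothetical.

```python
def round_trips_to_send(segments, initial_window=1):
    """Round trips needed to send `segments` when the window doubles
    every round trip (idealized slow start)."""
    window = initial_window
    sent = 0
    rtts = 0
    while sent < segments:
        sent += window   # one window of segments per round trip
        window *= 2      # slow start doubles the window
        rtts += 1
    return rtts


# Ten files of 10 segments each, fetched one after another on fresh connections:
print(10 * round_trips_to_send(10))
# The same 100 segments sent back to back on one reused connection:
print(round_trips_to_send(100))
```

Under these assumptions each fresh connection pays the slow start ramp again, while a reused connection pays it only once. Connection setup, which is ignored here, would widen the gap further.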