TCP global synchronization

TCP global synchronization in computer networks can happen to TCP/IP flows during periods of congestion because each sender will reduce their transmission rate at the same time when packet loss occurs.

Routers on the Internet normally have packet queues, to allow them to hold packets when the network is busy, rather than discarding them.

Because routers have limited resources, the size of these queues is also limited. The simplest technique to limit queue size is known as tail drop. The queue is allowed to fill to its maximum size, and then any new packets are simply discarded, until there is space in the queue again.

This causes problems when used on TCP/IP routers handling multiple TCP streams, especially when bursty traffic is present. While the network is stable, the queue is constantly full, and there are no problems except that the full queue results in high latency. However, the introduction of a sudden burst of traffic may cause large numbers of established, steady streams to lose packets simultaneously.

TCP has automatic recovery from dropped packets, which it interprets as congestion on the network (which is usually correct). The sender reduces its sending rate for a certain amount of time, and then tries to find out if the network is no longer congested by increasing the rate again subject to a ramp-up. This is known as the slow-start algorithm.

Almost all the senders will use the same time delay before increasing their rates. When these delays expire, at the same time, all the senders will send additional packets, the router queue will again overflow, more packets will be dropped, the senders will all back off for a fixed delay... ad infinitum.

This pattern of each sender decreasing and increasing transmission rates at the same time as other senders is referred to as "global synchronization" and leads to inefficient use of bandwidth, due to the large numbers of dropped packets, which must be retransmitted, and because the senders have a reduced sending rate, compared to the stable state, while they are backed-off, following each loss.

This problem has been the subject of much research. The consensus appears to be that the tail drop algorithm is the leading cause of the problem, and other queue size management algorithms such as Random Early Detection (RED) and Weighted RED will reduce the likelihood of global synchronization, as well as keeping queue sizes down in the face of heavy load and bursty traffic.

See also

References

External links