TCP congestion-avoidance algorithm
Transmission Control Protocol (TCP) uses a network congestion-avoidance algorithm that includes various aspects of an additive increase/multiplicative decrease (AIMD) scheme, with other schemes such as slow-start to achieve congestion avoidance.
The TCP congestion-avoidance algorithm is the primary basis for congestion control in the Internet.[1][2][3][4][5]
Naming history
Two such variations are those offered by TCP Tahoe and Reno. The two algorithms were retrospectively named after the 4.3BSD operating system in which each first appeared (which were themselves named after Lake Tahoe and the nearby city of Reno, Nevada). The "Tahoe" algorithm first appeared in 4.3BSD-Tahoe (which was made to support the CCI Power 6/32 “Tahoe” minicomputer), and was made available to non-AT&T licensees as part of the “4.3BSD Networking Release 1”; this ensured its wide distribution and implementation. Improvements, described below, were made in 4.3BSD-Reno and subsequently released to the public as "Networking Release 2" and later 4.4BSD-Lite. The "TCP Foo" names for the algorithms appear to have originated in a 1996 paper by Kevin Fall and Sally Floyd.[6]
TCP Tahoe and Reno
To avoid congestion collapse, TCP uses a multi-faceted congestion-control strategy. For each connection, TCP maintains a congestion window, limiting the total number of unacknowledged packets that may be in transit end-to-end. This is somewhat analogous to TCP's sliding window used for flow control. TCP uses a mechanism called slow start[7] to increase the congestion window after a connection is initialized and after a timeout. It starts with a window of two times the maximum segment size (MSS). Although the initial rate is low, the rate of increase is very rapid: for every packet acknowledged, the congestion window increases by 1 MSS so that the congestion window effectively doubles for every round-trip time (RTT). When the congestion window exceeds the ssthresh threshold, the algorithm enters a new state, called congestion avoidance. In some implementations (e.g., Linux), the initial ssthresh is large, and so the first slow start usually ends after a loss. However, ssthresh is updated at the end of each slow start, and will often affect subsequent slow starts triggered by timeouts.
Congestion avoidance: As long as non-duplicate ACKs are received, the congestion window is additively increased by one MSS every round trip time. When a packet is lost, the likelihood of duplicate ACKs being received is very high (it's possible though unlikely that the stream just underwent extreme packet reordering, which would also prompt duplicate ACKs). The behavior of Tahoe and Reno differ in how they detect and react to packet loss:
- Tahoe: Common Tahoe implementations detect congestion only by setting a timer for receiving a related ACK. Tahoe sets the slow start threshold to half of the current congestion window, reduces the congestion window to 1 MSS, and resets to slow-start state.[8]
- Reno: If three duplicate ACKs are received (i.e., four ACKs acknowledging the same packet, which are not piggybacked on data, and do not change the receiver's advertised window), Reno will halve the congestion window (instead of setting it to 1 MSS like Tahoe), set the slow start threshold equal to the new congestion window, perform a fast retransmit, and enter a phase called Fast Recovery. If an ACK times out, slow start is used as it is with Tahoe.[8]
The two main differences between Tahoe and Reno are:
- Tahoe only uses a timeout for detecting congestion, while Reno uses timeout and Fast-Retransmit
- Tahoe sets the congestion window to 1 after packet loss, while Reno sets it to half of the latest congestion window.
Fast Recovery (Reno only): In this state, TCP retransmits the missing packet that was signaled by three duplicate ACKs, and waits for an acknowledgment of the entire transmit window before returning to congestion avoidance. If there is no acknowledgment, TCP Reno experiences a timeout and enters the slow-start state.
Both algorithms reduce congestion window to 1 MSS on a timeout event.
TCP Vegas
Until the mid-1990s, all of TCP's set timeouts and measured round-trip delays were based upon only the last transmitted packet in the transmit buffer. University of Arizona researchers Larry Peterson and Lawrence Brakmo introduced TCP Vegas, named after the largest Nevada city, in which timeouts were set and round-trip delays were measured for every packet in the transmit buffer. In addition, TCP Vegas uses additive increases in the congestion window. This variant was not widely deployed outside Peterson's laboratory. In a comparison study of various TCP congestion control algorithms, TCP Vegas appeared to be the smoothest followed by TCP CUBIC.[9]
TCP Vegas was deployed as the default congestion control method for DD-WRT firmware v24 SP2.[10]
TCP New Reno
TCP New Reno, defined by RFC 6582 (which obsoletes previous definitions in RFC 3782 and RFC 2582), improves retransmission during the fast-recovery phase of TCP Reno. During fast recovery, for every duplicate ACK that is returned to TCP New Reno, a new unsent packet from the end of the congestion window is sent, to keep the transmit window full. For every ACK that makes partial progress in the sequence space, the sender assumes that the ACK points to a new hole, and the next packet beyond the ACKed sequence number is sent.
Because the timeout timer is reset whenever there is progress in the transmit buffer, this allows New Reno to fill large holes, or multiple holes, in the sequence space – much like TCP SACK. Because New Reno can send new packets at the end of the congestion window during fast recovery, high throughput is maintained during the hole-filling process, even when there are multiple holes, of multiple packets each. When TCP enters fast recovery it records the highest outstanding unacknowledged packet sequence number. When this sequence number is acknowledged, TCP returns to the congestion avoidance state.
A problem occurs with New Reno when there are no packet losses but instead, packets are reordered by more than 3 packet sequence numbers. When this happens, New Reno mistakenly enters fast recovery, but when the reordered packet is delivered, ACK sequence-number progress occurs and from there until the end of fast recovery, every bit of sequence-number progress produces a duplicate and needless retransmission that is immediately ACKed.
New Reno performs as well as SACK at low packet error rates, and substantially outperforms Reno at high error rates.
TCP Hybla
TCP Hybla[11] aims to eliminate penalization of TCP connections that incorporate a high-latency terrestrial or satellite radio link, due to their longer round trip times. It stems from an analytical evaluation of the congestion window dynamics, which suggests the necessary modifications to remove the performance dependence on RTT.
TCP BIC
Binary Increase Congestion control is an implementation of TCP with an optimized congestion control algorithm for high speed networks with high latency (called LFN, long fat networks, in RFC 1072[12]). BIC is used by default in Linux kernels 2.6.8 through 2.6.18.
TCP CUBIC
CUBIC is a less aggressive and more systematic derivative of BIC, in which the window is a cubic function of time since the last congestion event, with the inflection point set to the window prior to the event. CUBIC is used by default in Linux kernels from version 2.6.19 to 3.1.
Agile-SD TCP
Agile-SD is a Linux-based Congestion Control Algorithm (CCA) which is designed for the real Linux kernel. It is a receiver-side algorithm employs a loss-based approach using a novel mechanism, called Agility Factor (AF). It has been proposed by Mohamed A. Alrshah et. al. to increase the bandwidth utilization over high-speed and short-distance networks (low-BDP networks) such as local area networks or fiber-optic network, especially when the applied buffer size is small. It has been evaluated by comparing its performance to Compound-TCP (the default CCA in MS Windows) and CUBIC (the default of Linux) using NS-2 simulator. It improves the total performance up to 55% in term of average throughput.
TCP Westwood+
Westwood+ is a sender-side only modification of the TCP Reno protocol stack that optimizes the performance of TCP congestion control over both wireline and wireless networks. TCP Westwood+ is based on end-to-end bandwidth estimation to set congestion window and slow start threshold after a congestion episode, that is, after three duplicate acknowledgments or a timeout. The bandwidth is estimated by properly low-pass filtering the rate of returning acknowledgment packets. The rationale of this strategy is simple: in contrast with TCP Reno, which blindly halves the congestion window after three duplicate ACKs, TCP Westwood+ adaptively sets a slow start threshold and a congestion window which takes into account the bandwidth used at the time congestion is experienced. TCP Westwood+ significantly increases throughput over wireless links and fairness compared to TCP Reno/New Reno in wired networks.
Compound TCP
Compound TCP is a Microsoft implementation of TCP which maintains two different congestion windows simultaneously, with the goal of achieving good performance on LFNs while not impairing fairness. It has been widely deployed in Windows versions since Microsoft Windows Vista and Windows Server 2008 and has been ported to older Microsoft Windows versions as well as Linux.
TCP Proportional Rate Reduction
TCP Proportional Rate Reduction (PRR)[13] is an algorithm designed to improve the accuracy of data sent during recovery. The algorithm ensures that the window size after recovery is as close as possible to the slow-start threshold. In tests performed by Google, PRR resulted in a 3–10% reduction in average latency and recovery timeouts reduced by 5%.[14] PRR is used by default in Linux kernels since version 3.2.[15]
Other TCP congestion avoidance algorithms
- FAST TCP
- H-TCP
- Data Center TCP
- High Speed TCP
- HSTCP-LP[16]
- TCP-Illinois
- TCP-Ghergani
- TCP-LP[16]
- TCP SACK
- Scalable TCP[17]
- TCP Veno
- Westwood
- XCP[18]
- YeAH-TCP[19]
- TCP-FIT[20]
- Congestion Avoidance with Normalized Interval of Time (CANIT)[21]
- Non-linear neural network congestion control based on genetic algorithm for TCP/IP networks[22]
TCP New Reno was the most commonly implemented algorithm, SACK support is very common and is an extension to Reno/New Reno. Most others are competing proposals which still need evaluation. Starting with 2.6.8 the Linux kernel switched the default implementation from New Reno to BIC. The default implementation was again changed to CUBIC in the 2.6.19 version. FreeBSD uses New Reno as the default algorithm. However, it supports a number of other choices.[23]
When the per-flow product of bandwidth and latency increases, regardless of the queuing scheme, TCP becomes inefficient and prone to instability. This becomes increasingly important as the Internet evolves to incorporate very high-bandwidth optical links.
TCP Interactive (iTCP)[24] allows applications to subscribe to TCP events and respond accordingly enabling various functional extensions to TCP from outside TCP layer. Most TCP congestion schemes work internally. iTCP additionally enables advanced applications to directly participate in congestion control such as to control the source generation rate.
Zeta-TCP detects the congestions from both the latency and loss rate measures, and applies different congestion window backoff strategies based on the likelihood of the congestions to maximize the goodput. It also has a couple of other improvements to accurately detect the packet losses, avoiding retransmission timeout retransmission; and accelerate/control the inbound (download) traffic.
Usage
- BIC is used by default in Linux kernels 2.6.8 through 2.6.18. (August 2004 – September 2006)
- CUBIC is used by default in Linux kernels from version 2.6.19 to 3.1. (November 2006 – October 2011)
- PRR is used by default in Linux kernels since version 3.2. (January 2012)
See also
- Development of TCP
- Avoidance of network congestion
- Explicit Congestion Notification (ECN)
- Bufferbloat
References
- ↑ Van Jacobson, Michael J. Karels. Congestion Avoidance and Control (1988). Proceedings of the Sigcomm '88 Symposium, vol.18(4): pp.314–329. Stanford, CA. August 1988. This paper originated many of the congestion avoidance algorithms used in TCP/IP.
- ↑ RFC 2001 – TCP Slow Start, Congestion Avoidance, Fast Retransmit, and Fast Recovery Algorithms
- ↑ RFC 5681 – TCP Congestion Control
- ↑ RFC 3390 – TCP Increasing TCP's Initial Window
- ↑ TCP Congestion Avoidance Explained via a Sequence Diagram
- ↑ Fall, Kevin; Sally Floyd (July 1996). "Simulation-based Comparisons of Tahoe, Reno and SACK TCP" (PostScript). Computer Communications Review.
- ↑ Jacobson, Van; Karels, Michael (1988). "Congestion Avoidance and Control" (PDF). ACM SIGCOMM Computer Communication Review 25 (1): 157–187. doi:10.1145/205447.205462.
- 1 2 Kurose & Ross 2008, p. 284.
- ↑ "Performance Analysis of TCP Congestion Control Algorithms" (PDF). Retrieved 26 March 2012.
- ↑ "DD-WRT changelog". Retrieved 2 January 2012.
- ↑ http://hybla.deis.unibo.it/
- ↑
- ↑ "RFC 6937 - Proportional Rate Reduction for TCP". Retrieved 6 June 2014.
- ↑ Corbet, Jonathan. "LPC: Making the net go faster". Retrieved 6 June 2014.
- ↑ "Linux 3.2 - Linux Kernel Newbies". Retrieved 6 June 2014.
- 1 2 "Rice Networks Group".
- ↑ http://www.deneholme.net/tom/scalable/
- ↑ "XCP @ ISI".
- ↑ http://wil.cs.caltech.edu/pfldnet2007/paper/YeAH_TCP.pdf
- ↑ http://media.cs.tsinghua.edu.cn/~multimedia/tcp-fit/
- ↑ "An analytical study of CANIT algorithm in TCP protocol". doi:10.1145/605521.605530.
- ↑ "Error:".
- ↑ "Summary of Five New TCP Congestion Control Algorithms Project".
- ↑ "iTCP - Interactive Transport Protocol - Medianet Lab, Kent State University".
Sources
- Kurose, James; Ross, Keith (2008). Computer Networking – A Top-Down Approach (4th ed.). Addison Wesley. ISBN 978-0-13-607967-5.
- Afanasyev, A.; N. Tilley; P. Reiher; L. Kleinrock (2010). "Host-to-host congestion control for TCP" (PDF). IEEE Communication Surveys and Tutorials 12 (3).
External links
- Congestion avoidance simulation
- Linux kernel support for defining a per-route/destination congestion control algorithm (merged in Linux kernel 4.0)